Salary
💰 $90,000 - $110,000 per year
Tech Stack
AWS, Azure, Cloud, Google Cloud Platform, Oracle, Postgres, Splunk, SQL
About the role
- The Team: Service Management is a global team responsible for providing technical application support and cloud infrastructure management for the Markets Group within Enterprise Solutions.
We operate in a highly collaborative environment, working closely with both internal stakeholders and our customer base.
Our team embraces a sharing and learning culture, with a strong focus on continuous improvement and optimizing processes to deliver the highest level of service.
Handle all support requests, including incident, problem, and change management, as well as business continuity activities, ensuring seamless, high-quality delivery of services to end users.
Provide second-line, client-facing technical support for issues escalated by first-line support teams. Leverage strong technical skills, business knowledge, and AWS troubleshooting expertise to diagnose and resolve issues efficiently. Partner with development teams for third-line escalations when required.
Collaborate with product and delivery teams to ensure the Service Management team is fully prepared for new releases and actively engaged in the early design and architecture review of cloud-native enhancements.
Drive initiatives and lead continuous improvement processes around proactive AWS infrastructure health monitoring, application performance tracking, and incident prevention strategies to enhance overall system stability and reliability.
Utilize AWS-native tools (e.g., CloudWatch, CloudTrail, and Cost Explorer) and third-party monitoring solutions to strengthen observability, detect anomalies, and implement automated responses.
Apply AI/ML-driven techniques to detect anomalies, enable predictive alerting, and enhance support operations.
Requirements
- University degree in Computer Science or Engineering.
Minimum of 3 years of direct experience in Site Reliability Engineering or DevOps roles, covering high availability and incident response in AWS, Azure, or GCP.
Proficiency with cloud computing environments (AWS, GCP, or Azure).
Good understanding of Application Support processes.
Ideally familiar with monitoring tools such as Splunk, CloudWatch, and Dotcom.
Expertise in Oracle PL/SQL and PostgreSQL, including advanced SQL techniques, query optimization, and experience with complex database systems.
Experience leading post-mortem analyses and implementing preventive measures to avoid recurrence of incidents.
Excellent problem-solving skills and the capacity to lead effectively under pressure during incident response and outage management.
Ideally, experience working in the finance industry and/or with S&P Global products.