The role encompasses second-line technical application support and cloud infrastructure management for the Markets group of Enterprise Solutions.
Act as a strategic technology partner to Architecture, Engineering, Business Systems, and Global Service Delivery (L1/L2/L3), ensuring enterprise-grade, resilient, and scalable IT services aligned to business outcomes.
Establish and lead a collaborative service excellence culture, driving standardized, repeatable, and cost-efficient operational processes with a strong focus on quality, reliability, and continuous improvement.
Own and govern the Major Incident Management lifecycle, from fault detection and triage through resolution, executive communication, post-incident reviews, and sustainable root-cause remediation.
Lead service performance reviews with business and technology stakeholders, identifying systemic improvement opportunities, operational risks, and reliability enhancements.
Provide overall accountability for people leadership, including talent strategy, recruitment, onboarding, performance management, career development, and succession planning for Service Management and SRE teams.
Define and evolve enterprise-level observability and reliability frameworks, covering metrics, logs, traces, SLIs/SLOs, and error budgets across hybrid and cloud platforms.
Own Disaster Recovery, resiliency strategy, and operational readiness, ensuring regular testing, executive assurance, and continuous enhancement of recovery capabilities.
Serve as a senior technical leader and mentor, guiding SREs, DevOps, and engineering teams while driving adoption of best practices across reliability engineering and operations.

Requirements

Provide end-to-end ownership of Incident, Problem, Change, and Business Continuity processes, ensuring predictable, high-quality service delivery to internal and external customers.
Operate as the primary escalation authority for complex, high-impact production issues, coordinating across engineering, cloud, security, and vendor teams.
Partner closely with Product, Architecture, and Delivery teams to ensure operational readiness for releases, embedding reliability, supportability, and resilience early in the design lifecycle.
Drive continuous improvement initiatives across monitoring, alerting, reporting, automation, and operational maturity.
Embed AI/ML-driven operations (AIOps) to enhance anomaly detection, predictive alerting, intelligent noise reduction, and proactive incident prevention.
Influence and support technology governance, risk management, compliance, and audit activities related to service reliability.
Ensure 24x7 proactive monitoring and management of business-critical platforms, restoring service rapidly and minimizing customer impact.
Define and enforce incident severity models, ensuring accurate impact assessment, prioritization, and stakeholder communication.
Maintain end-to-end ownership of incidents, including those requiring third-line engineering or formal change execution.
Provide clear, consistent, and executive-level communication during incidents, outages, and service degradation.
Oversee application support spanning infrastructure, data remediation, user queries, education, and deep-dive incident investigations.
Drive observability across events, alerts, batch jobs, capacity planning, and performance KPIs, translating insights into actionable change.
Collaborate with functional and technical teams to ensure future deliverables (functional and non-functional) are operationally viable.
Champion knowledge management, ensuring high-quality runbooks, SOPs, and operational documentation in Confluence.
Deliver against SLA, OLA, and SLO commitments, with transparent reporting and corrective actions.
Leverage AIOps and reliability analytics to identify trends, systemic risks, and optimization opportunities at scale.

Benefits

Health & Wellness: Health care coverage designed for the mind and body.
Flexible Downtime: Generous time off helps keep you energized for your time on.
Continuous Learning: Access a wealth of resources to grow your career and learn valuable new skills.
Invest in Your Future: Secure your financial future through competitive pay, retirement planning, a continuing education program with a company-matched student loan contribution, and financial wellness programs.
Family-Friendly Perks: It’s not just about you. S&P Global has perks for your partners and little ones, too, with some best-in-class benefits for families.
Beyond the Basics: From retail discounts to referral incentive awards—small perks can make a big difference.

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

Cloud Infrastructure Management
Incident Management
Problem Management
Change Management
Business Continuity
AIOps
Service Reliability Engineering (SRE)
Disaster Recovery
Operational Readiness
Monitoring and Alerting

Soft Skills

Leadership
Collaboration
Communication
Continuous Improvement
Strategic Partnership
Talent Management
Executive Communication
Problem Solving
Stakeholder Management
Mentoring