S&P Global

Lead, IT Service Operations

full-time

Location Type: Hybrid

Location: Dallas, North Carolina, Texas, United States

Salary

💰 $140,000 - $150,000 per year

About the role

  • The role encompasses second-line technical application support and cloud infrastructure management for the Markets group of Enterprise Solutions.
  • Act as a strategic technology partner to Architecture, Engineering, Business Systems, and Global Service Delivery (L1/L2/L3), ensuring enterprise-grade, resilient, and scalable IT services aligned to business outcomes.
  • Establish and lead a collaborative service excellence culture, driving standardized, repeatable, and cost-efficient operational processes with a strong focus on quality, reliability, and continuous improvement.
  • Own and govern the Major Incident Management lifecycle, from fault detection and triage through resolution, executive communication, post-incident reviews, and sustainable root-cause remediation.
  • Lead service performance reviews with business and technology stakeholders, identifying systemic improvement opportunities, operational risks, and reliability enhancements.
  • Provide overall accountability for people leadership, including talent strategy, recruitment, onboarding, performance management, career development, and succession planning for Service Management and SRE teams.
  • Define and evolve enterprise-level observability and reliability frameworks, covering metrics, logs, traces, SLIs/SLOs, and error budgets across hybrid and cloud platforms.
  • Own Disaster Recovery, resiliency strategy, and operational readiness, ensuring regular testing, executive assurance, and continuous enhancement of recovery capabilities.
  • Serve as a senior technical leader and mentor, guiding SREs, DevOps, and engineering teams while driving adoption of best practices across reliability engineering and operations.

Requirements

  • Provide end-to-end ownership of Incident, Problem, Change, and Business Continuity processes, ensuring predictable, high-quality service delivery to internal and external customers.
  • Operate as the primary escalation authority for complex, high-impact production issues, coordinating across engineering, cloud, security, and vendor teams.
  • Partner closely with Product, Architecture, and Delivery teams to ensure operational readiness for releases, embedding reliability, supportability, and resilience early in the design lifecycle.
  • Drive continuous improvement initiatives across monitoring, alerting, reporting, automation, and operational maturity.
  • Embed AI/ML-driven operations (AIOps) to enhance anomaly detection, predictive alerting, intelligent noise reduction, and proactive incident prevention.
  • Influence and support technology governance, risk management, compliance, and audit activities related to service reliability.
  • Ensure 24x7 proactive monitoring and management of business-critical platforms, restoring service rapidly and minimizing customer impact.
  • Define and enforce incident severity models, ensuring accurate impact assessment, prioritization, and stakeholder communication.
  • Maintain end-to-end ownership of incidents, including those requiring third-line engineering or formal change execution.
  • Provide clear, consistent, and executive-level communication during incidents, outages, and service degradation.
  • Oversee application support spanning infrastructure, data remediation, user queries, education, and deep-dive incident investigations.
  • Drive observability across events, alerts, batch jobs, capacity planning, and performance KPIs, translating insights into actionable change.
  • Collaborate with functional and technical teams to ensure future deliverables (functional and non-functional) are operationally viable.
  • Champion knowledge management, ensuring high-quality runbooks, SOPs, and operational documentation in Confluence.
  • Deliver against SLA, OLA, and SLO commitments, with transparent reporting and corrective actions.
  • Leverage AIOps and reliability analytics to identify trends, systemic risks, and optimization opportunities at scale.

Benefits

  • Health & Wellness: Health care coverage designed for the mind and body.
  • Flexible Downtime: Generous time off helps keep you energized for your time on.
  • Continuous Learning: Access a wealth of resources to grow your career and learn valuable new skills.
  • Invest in Your Future: Secure your financial future through competitive pay, retirement planning, a continuing education program with a company-matched student loan contribution, and financial wellness programs.
  • Family Friendly Perks: It’s not just about you. S&P Global has perks for your partners and little ones, too, with some best-in-class benefits for families.
  • Beyond the Basics: From retail discounts to referral incentive awards—small perks can make a big difference.

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
Cloud Infrastructure Management, Incident Management, Problem Management, Change Management, Business Continuity, AIOps, Site Reliability Engineering (SRE), Disaster Recovery, Operational Readiness, Monitoring and Alerting

Soft Skills
Leadership, Collaboration, Communication, Continuous Improvement, Strategic Partnership, Talent Management, Executive Communication, Problem Solving, Stakeholder Management, Mentoring