
Lead, IT Service Operations
S&P Global
Full-time
Location Type: Hybrid
Location: Dallas • North Carolina • Texas • United States
Salary
💰 $140,000 - $150,000 per year
About the role
- The role encompasses second-line technical application support and cloud infrastructure management for the Markets group of Enterprise Solutions.
- Act as a strategic technology partner to Architecture, Engineering, Business Systems, and Global Service Delivery (L1/L2/L3), ensuring enterprise-grade, resilient, and scalable IT services aligned to business outcomes.
- Establish and lead a collaborative service excellence culture, driving standardized, repeatable, and cost-efficient operational processes with a strong focus on quality, reliability, and continuous improvement.
- Own and govern the Major Incident Management lifecycle, from fault detection and triage through resolution, executive communication, post-incident reviews, and sustainable root-cause remediation.
- Lead service performance reviews with business and technology stakeholders, identifying systemic improvement opportunities, operational risks, and reliability enhancements.
- Provide overall accountability for people leadership, including talent strategy, recruitment, onboarding, performance management, career development, and succession planning for Service Management and SRE teams.
- Define and evolve enterprise-level observability and reliability frameworks, covering metrics, logs, traces, SLIs/SLOs, and error budgets across hybrid and cloud platforms.
- Own Disaster Recovery, resiliency strategy, and operational readiness, ensuring regular testing, executive assurance, and continuous enhancement of recovery capabilities.
- Serve as a senior technical leader and mentor, guiding SREs, DevOps, and engineering teams while driving adoption of best practices across reliability engineering and operations.
Requirements
- Provide end-to-end ownership of Incident, Problem, Change, and Business Continuity processes, ensuring predictable, high-quality service delivery to internal and external customers.
- Operate as the primary escalation authority for complex, high-impact production issues, coordinating across engineering, cloud, security, and vendor teams.
- Partner closely with Product, Architecture, and Delivery teams to ensure operational readiness for releases, embedding reliability, supportability, and resilience early in the design lifecycle.
- Drive continuous improvement initiatives across monitoring, alerting, reporting, automation, and operational maturity.
- Embed AI/ML-driven operations (AIOps) to enhance anomaly detection, predictive alerting, intelligent noise reduction, and proactive incident prevention.
- Influence and support technology governance, risk management, compliance, and audit activities related to service reliability.
- Ensure 24x7 proactive monitoring and management of business-critical platforms, restoring service rapidly and minimizing customer impact.
- Define and enforce incident severity models, ensuring accurate impact assessment, prioritization, and stakeholder communication.
- Maintain end-to-end ownership of incidents, including those requiring third-line engineering or formal change execution.
- Provide clear, consistent, and executive-level communication during incidents, outages, and service degradation.
- Oversee application support spanning infrastructure, data remediation, user queries, education, and deep-dive incident investigations.
- Drive observability across events, alerts, batch jobs, capacity planning, and performance KPIs, translating insights into actionable change.
- Collaborate with functional and technical teams to ensure future deliverables (functional and non-functional) are operationally viable.
- Champion knowledge management, ensuring high-quality runbooks, SOPs, and operational documentation in Confluence.
- Deliver against SLA, OLA, and SLO commitments, with transparent reporting and corrective actions.
- Leverage AIOps and reliability analytics to identify trends, systemic risks, and optimization opportunities at scale.
Benefits
- Health & Wellness: Health care coverage designed for the mind and body.
- Flexible Downtime: Generous time off helps keep you energized for your time on.
- Continuous Learning: Access a wealth of resources to grow your career and learn valuable new skills.
- Invest in Your Future: Secure your financial future through competitive pay, retirement planning, a continuing education program with a company-matched student loan contribution, and financial wellness programs.
- Family Friendly Perks: It’s not just about you. S&P Global has perks for your partners and little ones, too, with some best-in-class benefits for families.
- Beyond the Basics: From retail discounts to referral incentive awards—small perks can make a big difference.
Applicant Tracking System Keywords
Hard Skills & Tools
Cloud Infrastructure Management, Incident Management, Problem Management, Change Management, Business Continuity, AIOps, Service Reliability Engineering (SRE), Disaster Recovery, Operational Readiness, Monitoring and Alerting
Soft Skills
Leadership, Collaboration, Communication, Continuous Improvement, Strategic Partnership, Talent Management, Executive Communication, Problem Solving, Stakeholder Management, Mentoring