Northern Trust

Senior Principal Infrastructure Services – SRE Practice

As Senior Principal, Infrastructure Services at Northern Trust, you will guide the adoption of SRE principles and improve system reliability, based in Pune or Bangalore. The role oversees diverse engineering initiatives to ensure operational excellence.

Posted 6/17/2026 • Full-time • Pune • 🇮🇳 India • Senior

Tech Stack

Tools & technologies
Cloud, Distributed Systems, Go, Java, Python, Ruby

About the role

Key responsibilities & impact
  • Lead the design and evolution of highly reliable, scalable, and performant distributed systems, applying SRE principles across infrastructure and application layers.
  • Partner with engineering and architecture teams to influence system design decisions that improve resilience, fault tolerance, and operational simplicity.
  • Define and promote reliability patterns, architectural best practices, and non-functional requirements aligned with business criticality.
  • Drive an automation-first approach by designing and developing tools, scripts, and platforms that reduce manual effort, operational toil, and human error.
  • Embed reliability engineering into the software delivery lifecycle through CI/CD integration, safe deployments, and repeatable operational workflows.
  • Establish clear operational metrics and service health indicators to ensure transparency and accountability.
  • Participate in and lead incident response for production systems, ensuring timely mitigation and minimal customer or business impact.
  • Conduct and drive blameless post-incident reviews, focusing on identifying systemic causes rather than individual faults.
  • Implement long-term corrective actions to prevent recurrence and measurably improve system reliability.
  • Architect and implement end-to-end observability across systems using metrics, logs, and traces to enable rapid diagnosis and proactive issue detection.
  • Define and manage Service Level Indicators (SLIs), Service Level Objectives (SLOs), and error budgets to balance reliability with feature velocity.
  • Build and maintain actionable dashboards and alerts that provide real-time insights into system health, performance, and risk.
  • Identify reliability gaps through data analysis, failure reviews, and resilience testing, driving targeted improvement initiatives.
  • Lead efforts such as capacity planning, load testing, chaos engineering, and fault injection to validate system behavior under stress.
  • Create and maintain clear, accurate, and actionable documentation including system architectures, runbooks, operational standards, and incident playbooks.
  • Work closely with product, development, platform, security, and operations teams to embed SRE principles into roadmap planning and delivery.
  • Act as a trusted advisor, translating reliability data and operational risk into business-relevant insights for technical and non-technical stakeholders.
  • Manage and prioritize multiple reliability-focused initiatives, balancing short-term operational needs with long-term system health.

Requirements

What you’ll need
  • Bachelor’s degree in Computer Science, Engineering, or a related discipline, or equivalent practical experience demonstrating advanced technical and leadership capabilities.
  • 15+ years of progressive experience in systems engineering with a strong emphasis on site reliability, large-scale systems operations, and software engineering in complex enterprise or cloud environments.
  • 7+ years of experience in a technical leadership role (Team Lead or Hands-on Technical Manager), with a proven track record of driving cross-functional initiatives and delivering complex projects to successful completion.
  • Strong proficiency in one or more modern programming languages such as Python, Go, Java, Ruby, or equivalent, with a software-engineering mindset applied to operational challenges.
  • Demonstrated experience operating and supporting systems across hybrid environments, including both on-premises infrastructure and public/private cloud platforms.
  • Hands-on experience with containerization and container orchestration technologies, enabling scalable, resilient, and repeatable deployments.
  • Proven ability to design and implement observability solutions, including metrics, logs, traces, dashboards, and alerts that provide actionable insights into system health and performance.
  • Deep understanding of distributed systems, networking fundamentals, failure modes, and modern software architectures, with the ability to reason about complex system behaviors under load or failure conditions.
  • Exceptional problem-solving skills with the ability to diagnose, mitigate, and permanently resolve complex, high-impact technical issues.
  • Strong customer and stakeholder orientation, with excellent communication skills and the ability to articulate complex reliability strategies clearly and persuasively to both technical and non-technical audiences.
  • Prior experience designing and delivering Infrastructure as Code (IaC) through automated CI/CD pipelines, ensuring consistency, scalability, and reliability of infrastructure changes.
  • Demonstrated success in mentoring, coaching, and developing high-performing technical teams, fostering a culture of engineering excellence, ownership, and continuous improvement.
  • Hands-on expertise in implementing automated remediation and corrective actions driven by observability signals and reliability metrics.
  • Practical experience working within Agile and DevOps environments, collaborating closely with product and engineering teams to balance reliability, velocity, and innovation.

Benefits

Comp & perks
  • Health insurance
  • 401(k) matching
  • Flexible work hours
  • Paid time off
  • Remote work options

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
Python, Go, Java, Ruby, containerization, container orchestration, Infrastructure as Code (IaC), CI/CD, observability solutions, distributed systems
Soft Skills
problem-solving, communication, mentoring, coaching, leadership, stakeholder orientation, cross-functional collaboration, engineering excellence, continuous improvement, translating technical data