PNC

Software Engineering Manager – Site Reliability Center

Software Engineering Manager leading Site Reliability Engineering initiatives for PNC, focusing on operational excellence and team development.

Posted 6/26/2026 • full-time • Pittsburgh • Alabama, Arizona, Colorado, Pennsylvania, Texas • 🇺🇸 United States • Mid-Level, Senior • 💰 $100,100 - $204,490 per year

Tech Stack

Tools & technologies
Cassandra, Cloud, ElasticSearch, ETL, Kafka, Linux, MongoDB, Oracle, Redis, SQL

About the role

Key responsibilities & impact
  • Manage SRE and related teams; lead, coach, and develop a team of SRE engineers; set clear goals, drive accountability, and foster a culture of ownership and excellence; partner with cross-functional stakeholders to align technology and business objectives; support talent development, performance management, and succession planning; encourage innovation, continuous learning, and DevOps/SRE best practices.
  • Lead incident management & remediation; manage and actively participate in end-to-end incident response for major (P1/P2) incidents; guide real-time triage, diagnostics, and troubleshooting across application, infrastructure, and network layers; ensure rapid execution of remediation actions and service restoration; provide clear, timely communication to stakeholders during incidents; oversee post-incident analysis, reporting, and documentation to drive improvements.
  • Provide technical leadership in production support; serve as an escalation point for complex production issues; guide troubleshooting across applications, infrastructure (Linux/Windows), databases (Oracle, SQL), middleware, and integrations; ensure efficient log, metric, and system analysis; oversee batch/ETL monitoring and recovery processes; foster strong collaboration across engineering, infrastructure, and vendor teams.
  • Drive problem management & root cause resolution; lead root cause analysis (RCA) efforts for major and recurring incidents; ensure ownership and resolution of problem records; drive permanent fixes and systemic improvements to eliminate repeat issues; identify trends and patterns to reduce risk and improve stability; partner with engineering teams to resolve code defects and system gaps; promote knowledge sharing via runbooks, knowledge articles, and error catalogs.
  • Oversee change management & release execution; ensure safe and compliant execution of production changes and releases; validate change readiness, testing, rollback strategies, and risk assessments; represent the team in CAB reviews, providing technical risk evaluation; oversee post-implementation reviews (CPIR) and ensure follow-through; drive improvements in change success rate and reductions in production defects.
  • Advance monitoring, alerting & observability; lead efforts to build and optimize monitoring, dashboards, and alerting frameworks; champion use of tools such as Dynatrace, BigPanda, Logscale, and enterprise platforms; improve signal-to-noise ratio through alert tuning; enable proactive issue detection before customer impact; strengthen event management and observability practices.
  • Champion resiliency, stability & availability; lead efforts to ensure high availability of critical systems; oversee disaster recovery, failover, and continuity testing; identify and eliminate single points of failure; drive improvements in MTTR, uptime, and service reliability.
  • Enable scalability & performance optimization; guide capacity planning and performance tuning strategies; ensure systems scale effectively under peak demand; partner with development teams for performance-driven design improvements; optimize system configurations to improve efficiency and throughput.
  • Lead a 24x7 production support model; manage team participation in a 24x7 on-call rotation; oversee engagement in incident bridges, war rooms, and escalations; support pod-based operating models aligned to key applications; ensure seamless handoffs and global support continuity.
  • Drive automation & operational efficiency; identify and prioritize opportunities to reduce manual effort through automation; implement automation across incident remediation, monitoring and alerting, and deployment and validation; promote standardized runbooks and automation frameworks; improve operational metrics and reduce toil.
  • Ensure governance, risk & compliance; maintain adherence to enterprise policies and regulatory standards; support audits, vulnerability remediation, and risk controls; ensure accurate documentation and operational procedures; champion security, access management, and data governance practices.

Requirements

What you’ll need
  • 5+ years of related experience and 3+ years of management experience.
  • Strong experience in Site Reliability Engineering, Production Support, or DevOps.
  • Proven ability to lead teams in high-availability, enterprise environments.
  • Deep understanding of incident, problem, and change management frameworks.
  • Hands-on knowledge of monitoring tools, cloud/infrastructure platforms, and automation.
  • Experience improving system reliability, observability, and operational maturity.
  • Strong communication skills with the ability to lead during high-pressure situations.
  • Experience with infrastructure (Linux/Windows, OCP) and databases (Oracle, SQL, MongoDB, Cassandra), along with working knowledge of Elasticsearch, Redis, MQ, and Kafka, is a plus.

Benefits

Comp & perks
  • medical/prescription drug coverage (with a Health Savings Account feature)
  • dental and vision options
  • employee and spouse/child life insurance
  • short and long-term disability protection
  • 401(k) with PNC match
  • pension and stock purchase plans
  • dependent care reimbursement account
  • back-up child/elder care
  • adoption, surrogacy, and doula reimbursement
  • educational assistance, including select programs fully paid
  • a robust wellness program with financial incentives

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
Site Reliability Engineering, Production Support, DevOps, Incident Management, Problem Management, Change Management, Monitoring, Automation, Performance Optimization, Capacity Planning
Soft Skills
Leadership, Coaching, Communication, Collaboration, Accountability, Innovation, Continuous Learning, Crisis Management, Talent Development, Performance Management