Wells Fargo

Lead Systems Operations Engineer

Lead complex, broad impact initiatives including provision of high-level systems consultation for the technology teams.

Posted 5/22/2026 • Full-time • Chandler • Arizona, North Carolina, Texas • 🇺🇸 United States • Senior

Tech Stack

Tools & technologies
Grafana, Linux, OpenShift, Prometheus, Python, Redis, Splunk

About the role

Key responsibilities & impact
  • Lead complex, broad impact initiatives including provision of high-level systems consultation for the technology teams
  • Lead day-to-day platform (Redis, OpenShift) operations, including cluster maintenance, upgrades, performance monitoring, and troubleshooting
  • Improve operations practices to meet new incident SLAs, and improve practices during incident and problem management
  • Serve as an operational lead during incidents, driving rapid diagnosis, resolution, root‑cause analysis, and long‑term corrective actions
  • Develop or enhance automation (Python, Bash, GitOps workflows, or AI-assisted tools), and build AI agents, MCP servers, and MCP tools and skills that eliminate manual effort and streamline run processes
  • Lead platform lifecycle activities, including new cluster builds, configuration, onboarding, upgrades, and cluster decommissioning, ensuring consistency, reliability, and compliance across environments
  • Partner with engineering, SRE, security, and development teams to implement repeatable operational patterns, guardrails, and platform readiness standards
  • Ensure platform operations follow organizational policies, security standards, audit controls, and regulatory requirements
  • Identify operational gaps, recurring issues, or inefficiencies and lead initiatives to enhance reliability, resiliency, and operational maturity

Requirements

What you’ll need
  • 5+ years of Systems Engineering experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education
  • 5+ years of hands-on experience in Python for platform operations automation
  • 5+ years of designing and building complex observability solutions using industry-standard toolsets and/or custom-built solutions
  • Strong proficiency in writing production-quality Python code using Python libraries and client integrations
  • Ability to develop automation solutions, including remediation procedures & workflows and operational tools using Python
  • 3+ years of experience managing complex, enterprise-scale applications in production environments
  • Extensive experience with configuration and monitoring tools such as Grafana, Splunk, and Prometheus
  • Deep platform expertise, including cluster build-outs, CI/CD pipeline integration, troubleshooting, debugging, remediation, patching, upgrades, and root cause analysis (RCA)
  • 2+ years of hands-on Linux system administration experience

Benefits

Comp & perks
  • Participation in on-call rotations

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
Python, Bash, GitOps, AI-assisted tools, MCP server, observability solutions, production-quality code, Linux system administration, troubleshooting, root cause analysis
Soft Skills
leadership, problem management, communication, collaboration, operational maturity