Support the operational stability of GenAI platforms used across the enterprise
Triage issues and participate in incident resolution across platform services
Maintain observability tooling to ensure visibility into system performance and reliability
Collaborate with infrastructure teams (e.g., Google Cloud Platform support) to resolve platform-level issues
Conduct diagnostics and contribute to root cause analysis for platform incidents
Support internal GenAI-facing platforms, including Agent Space, ensuring operational stability and performance
Contribute to automation, runbooks and service documentation to improve operational efficiency
Partner with infrastructure and cloud teams to ensure continuity of service
Requirements
4+ years of Software Engineering experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education
4+ years of experience in platform operations, SRE, or infrastructure engineering
2+ years of experience with observability tools (e.g., Prometheus, Grafana, Splunk)
2+ years of experience in incident management and diagnostics in production environments
1+ year of experience supporting internal platforms or services used by engineering or ML teams
1+ year of experience collaborating across geographically distributed teams
Desired: 2+ years of experience working with cloud infrastructure platforms, preferably Google Cloud Platform
Desired: Experience with infrastructure-as-code tools (e.g., Terraform, Ansible)
Desired: Experience with container platforms
Desired: Experience supporting Generative AI environments
Position is not eligible for Visa Sponsorship
Relocation assistance is not available for this position
Must adhere to Wells Fargo risk, compliance, and hiring policies (including prohibition on unauthorized third-party recordings and honest representation of experience)
Benefits
Position offers a hybrid work schedule
Accommodation for applicants with disabilities available upon request
Wells Fargo is an equal opportunity employer
Drug-free workplace (Drug and Alcohol Policy)
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
Software Engineering, platform operations, SRE, infrastructure engineering, observability tools, incident management, diagnostics, infrastructure-as-code, container platforms, Generative AI