Principal Site Reliability Engineer
Walmart
Principal Site Reliability Engineer at Walmart leading the design and implementation of reliability programs. Focused on system performance, scalability, and disaster recovery for complex site environments.
Posted 5/27/2026 • full-time • Bentonville • California • 🇺🇸 United States • Lead • 💰 $110,000 - $220,000 per year
Tech Stack
Tools & technologies
- Cloud
- Docker
- JavaScript
- Python
About the role
Key responsibilities & impact
- Design and develop reliability programs tailored to complex site environments, ensuring alignment with business goals and site safety engineering.
- Lead and facilitate reliability testing and chaos experiments to validate application resiliency and system performance.
- Analyze system architecture and performance to optimize scalability, disaster recovery, and operational efficiency.
- Develop and implement monitoring strategies, establishing metrics and alerts to maintain system availability and reliability.
- Guide root cause analysis efforts to identify and resolve defects, enhancing application stability and preventing incidents.
- Drive infrastructure automation and telemetry integration to support continuous delivery and operational excellence.
- Mentor team members on tools, coding standards, and reliability best practices.
Requirements
What you’ll need
- Option 1: Bachelor's degree in computer science, computer engineering, computer information systems, software engineering, or related area and 5 years’ experience in site reliability engineering, site and system administration, infrastructure management, or related area.
- Option 2: 7 years’ experience in site reliability engineering, site and system administration, infrastructure management, or related area.
- Extensive experience in site reliability engineering with strong expertise in system monitoring, root cause analysis, and reliability analysis.
- Proficiency in designing scalable, modular, and extensible software architectures aligned with business and technical requirements.
- In-depth knowledge of disaster recovery planning, execution, and contingency procedures for complex site environments.
- Skilled in cloud computing platforms and containerization technologies such as Docker.
- Strong coding skills in languages like JavaScript and Python, with automation experience in CI/CD pipelines.
- Proven capability to analyze system performance and implement telemetry for continuous improvement.
Benefits
Comp & perks
- Health benefits include medical, vision and dental coverage.
- Financial benefits include 401(k), stock purchase and company-paid life insurance.
- Paid time off benefits include PTO (including sick leave), parental leave, family care leave, bereavement, jury duty, and voting.
- Other benefits include short-term and long-term disability, company discounts, Military Leave Pay, adoption and surrogacy expense reimbursement, and more.
- Live Better U is a Walmart-paid education benefit program covering tuition, books, and fees for educational programs up to a bachelor's degree.
ATS Keywords
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
- site reliability engineering
- system monitoring
- root cause analysis
- reliability analysis
- disaster recovery planning
- JavaScript
- Python
- CI/CD pipelines
- cloud computing
- containerization
Soft Skills
- mentoring
- leadership
- communication
- problem-solving
- collaboration
Certifications
- Bachelor's degree in computer science
- Bachelor's degree in computer engineering
- Bachelor's degree in computer information systems
- Bachelor's degree in software engineering