Senior Site Reliability Engineer, Infrastructure Foundations

Wikimedia Foundation

Senior Site Reliability Engineer at the Wikimedia Foundation, supporting the platform behind Wikipedia. The role focuses on operational tasks, collaboration, and continual improvement of infrastructure reliability.

Posted 5/13/2026
Full-time
Senior
$113,082 - $175,725 per year
Remote, United States: Arizona, California, Colorado, Connecticut, District of Columbia, Florida, Idaho, Illinois, Iowa, Maryland, Massachusetts, Minnesota, Missouri, Montana, New Jersey, New Mexico, New York, North Carolina, Ohio, Oklahoma, Oregon, Pennsylvania, Rhode Island, Tennessee, Texas, Utah, Vermont, Virginia, Washington, West Virginia, Wisconsin, Wyoming

Tech Stack

Tools & technologies

Ansible, Go, Kubernetes, Linux, Puppet, Python, Ruby

About the role

Key responsibilities & impact

Performing day-to-day operational/DevOps tasks on Wikimedia’s public-facing infrastructure (deployment, maintenance, configuration, troubleshooting)
Implementing and utilizing configuration management and deployment tools (Puppet, Kubernetes)
Leading continuous improvement by automating the installation, configuration, and maintenance of services on our platform
Working closely with product teams to help them bring scalable functionality to our users, assisting in the architectural design of new services and making them operate at scale
Participating in a 24/7 on-call rotation shared across the broader SRE team. This includes taking part in incident response, diagnosis and follow-up on system outages or alerts across Wikimedia’s production infrastructure.
Collaborating with a global, cross-functional team in an asynchronous communication environment
Mentoring peers in your areas of technical and operational strength
Ability and willingness to travel 1-2 times a year for in-person events and team meetings

Requirements

What you’ll need

6+ years of experience in an SRE/Operations/DevOps role as part of a team
Experience with shell and any scripting languages used in an SRE context (Python, Go, Bash, Ruby; we primarily use Python) and configuration management tools (Puppet, Ansible; we use Puppet)
Experience designing and managing infrastructure security for large fleets of diverse services
Experience with technical response during security incidents
Experience with package management on Linux systems (we use Debian)
Strong Linux system-level troubleshooting skills
History of automating tasks and processes, identifying process gaps, and finding automation opportunities
Strong English language skills (verbal and written) and the ability to work both independently and as an effective part of a globally distributed team working across multiple time zones
Experience leading and participating in incident response and post-incident review rituals, with the goal of conducting root cause analysis and implementing preventive measures

Benefits

Comp & perks

Competitive salary
Health insurance
Flexible working hours
Professional development opportunities

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

DevOps, SRE, Python, Go, Bash, Ruby, Puppet, Ansible, Linux, infrastructure security

Soft Skills

leadership, mentoring, communication, collaboration, independence, problem-solving, incident response, root cause analysis, continuous improvement, asynchronous communication