Senior Site Reliability Engineer
PlayOn! Sports
A Senior Site Reliability Engineer role focused on building tools and automation for system reliability at PlayOn, collaborating with DevOps and engineering teams to enhance performance and scalability.
Tech Stack
Tools & technologies: AWS, Azure, Cloud, Distributed Systems, Docker, Go, Google Cloud Platform, Grafana, Java, Kubernetes, Linux, Prometheus, Python, Terraform
About the role
Key responsibilities & impact
- Contribute to system observability by implementing and improving metrics, alerting, and dashboards for better insight and faster recovery.
- Develop automation, tooling, and monitoring solutions to support high service availability.
- Partner with application and quality engineering teams to implement best practices in reliability, release automation, and testing.
- Drive operational excellence through proactive incident prevention, blameless postmortems, and capacity planning.
- Participate in on-call rotations to support critical services and ensure rapid response to incidents.
Requirements
What you’ll need
- Solid experience in Python, especially for automation, tooling, and data-driven operational tasks.
- Proficiency in at least one of Java, C++, or Go.
- Strong understanding of Linux systems, cloud infrastructure (AWS, GCP, or Azure), and modern deployment practices (Docker, Kubernetes, Terraform).
- Experience with CI/CD pipelines, version control, and automated testing frameworks.
- Experience with observability tools (e.g., Prometheus, Grafana, ELK, Datadog) and log/metric analysis for diagnosing issues.
- Proven experience facilitating and documenting Critical User Journeys and translating them into actionable SLAs/SLOs for automation.
- Demonstrated ability to collaborate with cross-functional teams and communicate clearly in high-impact situations.
- A problem-solver who approaches reliability as a shared responsibility across engineering.
- Familiarity with AI-augmented development tools (Claude, Codex) as part of a modern engineering workflow.
**Nice to Have**
- Experience writing or maintaining end-to-end or integration tests for distributed systems.
- Background in performance testing, capacity planning, or chaos engineering.
- Contributions to internal developer tooling or reliability-focused frameworks.
- Exposure to security, compliance, or change management processes in production environments.
- Relevant certifications.
Benefits
Comp & perks
- Multiple medical insurance plans to choose from
- Dental, vision, life, and disability insurance
- Employee Emergency Fund
- Company equity (stock options)
- Open PTO policy
- 401(k) plan with company match
- Hybrid/flexible work environment
ATS Keywords
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
Python, Java, C++, Go, Linux systems, AWS, GCP, Azure, Docker, Kubernetes
Soft Skills
collaboration, communication, problem-solving, operational excellence, incident prevention, blameless postmortems, capacity planning, facilitating, documenting, translating