PlayOn! Sports

Senior Site Reliability Engineer

Senior Site Reliability Engineer focused on building tools and automation for system reliability at PlayOn, collaborating with DevOps and engineering teams to enhance performance and scalability.

Posted 5/11/2026 • Full-time • Remote • 🇺🇸 United States • Senior

Tech Stack

Tools & technologies
AWS, Azure, Cloud, Distributed Systems, Docker, Go, Google Cloud Platform, Grafana, Java, Kubernetes, Linux, Prometheus, Python, Terraform

About the role

Key responsibilities & impact
  • Contribute to system observability by implementing and improving metrics, alerting, and dashboards for better insight and faster recovery.
  • Develop automation, tooling, and monitoring solutions to support high service availability.
  • Partner with application and quality engineering teams to implement best practices in reliability, release automation, and testing.
  • Drive operational excellence through proactive incident prevention, blameless postmortems, and capacity planning.
  • Participate in on-call rotations to support critical services and ensure rapid response to incidents.

Requirements

What you’ll need
  • Solid experience in Python, especially for automation, tooling, and data-driven operational tasks.
  • Proficiency in at least one of Java, C++, or Go.
  • Strong understanding of Linux systems, cloud infrastructure (AWS, GCP, or Azure), and modern deployment practices (Docker, Kubernetes, Terraform).
  • Experience with CI/CD pipelines, version control, and automated testing frameworks.
  • Experience with observability tools (e.g., Prometheus, Grafana, ELK, Datadog) and log/metric analysis for diagnosing issues.
  • Proven experience facilitating and documenting Critical User Journeys and translating them into actionable SLAs/SLOs for automation.
  • Demonstrated ability to collaborate with cross-functional teams and communicate clearly in high-impact situations.
  • A problem-solver who approaches reliability as a shared responsibility across engineering.
  • Familiarity with AI-augmented development tools (Claude, Codex) as part of a modern engineering workflow.
Nice to have
  • Experience writing or maintaining end-to-end or integration tests for distributed systems.
  • Background in performance testing, capacity planning, or chaos engineering.
  • Contributions to internal developer tooling or reliability-focused frameworks.
  • Exposure to security, compliance, or change management processes in production environments.
  • Relevant certifications.

Benefits

Comp & perks
  • Multiple medical insurance plans to choose from
  • Dental, vision, life, and disability insurance
  • Employee Emergency Fund
  • Company equity (stock options)
  • Open PTO policy
  • 401(k) plan with company match
  • Hybrid/flexible work environment

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
Python, Java, C++, Go, Linux systems, AWS, GCP, Azure, Docker, Kubernetes
Soft Skills
collaboration, communication, problem-solving, operational excellence, incident prevention, blameless postmortems, capacity planning, facilitating, documenting, translating