Site Reliability Engineer
Stack AV
Site Reliability Engineer managing large-scale autonomous systems development and infrastructure performance, collaborating across teams to enhance the reliability, scalability, and automation of compute platforms.
Tech Stack
Tools & technologies
Cloud, Kubernetes, Linux, Open Source, Prometheus, TCP/IP
About the role
Key responsibilities & impact
- Instrument the systems that schedule and execute large-scale batch workloads across Kubernetes clusters.
- Diagnose and triage job failures for customers.
- Collaborate with teams across the company to understand workload requirements and improve platform capabilities.
- Scale the reliability and velocity of our systems and processes through increased automation.
- Document actions to build a comprehensive library of runbooks, which will act as a knowledge base and foundation for automation.
- Participate in an on-call rotation to uphold the SLOs and SLAs of production services.
- Contribute to platform tooling, automation, and CI/CD workflows.
Requirements
What you’ll need
- Fundamental understanding of Linux operating system internals, TCP/IP networking, and storage subsystems.
- Strong experience with Kubernetes and container orchestration in production-grade environments.
- Understanding of engineering design limitations and ability to provide guidance to teams to scale their services to achieve desired performance within budget.
- Strong experience implementing and debugging cloud-native and open-source tools such as Kubernetes, etcd, Prometheus, and OpenTelemetry.
- Strong communication skills and the ability to work effectively in a diverse and distributed team.
Benefits
Comp & perks
- We are proud to be an equal opportunity workplace.
- We believe that diverse teams produce the best ideas and outcomes.
- We are committed to building a culture of inclusion, entrepreneurship, and innovation across gender, race, age, sexual orientation, religion, disability, and identity.
ATS Keywords
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
Kubernetes, Linux, TCP/IP networking, cloud native tools, container orchestration, automation, CI/CD workflows, etcd, Prometheus, OpenTelemetry
Soft Skills
strong communication skills, collaboration, problem-solving, guidance, teamwork