
Senior Site Reliability Engineer
CertifID
full-time
Location Type: Remote
Location: Texas • United States
About the role
- Own and improve the reliability, availability, and performance of production systems while defining and operationalizing SLIs/SLOs and error budgets.
- Design and implement autonomous and semi-autonomous AI agents for monitoring distributed systems and applications. Build agents capable of consuming multi-source observability data (metrics, logs, traces, etc.).
- Participate in and help lead an on-call rotation, serving as an escalation point for major incidents and facilitating blameless postmortems.
- Build automated workflows to eliminate manual work and design/maintain Infrastructure-as-Code with Terraform.
- Improve metrics, logs, traces, and alerting using tools like Datadog or Prometheus to reduce noise and increase signal.
- Partner with application teams to implement reliability best practices and mentor junior engineers to foster a culture of knowledge sharing.
Requirements
- 5+ years in SRE, DevOps, Platform Engineering, or Infrastructure Engineering.
- Proven experience supporting production SaaS systems in Azure (preferred), AWS, or GCP.
- Strong Linux, networking, and distributed systems troubleshooting skills.
- Strong experience with containers and orchestration (Kubernetes/EKS/AKS).
- Expertise with Infrastructure-as-Code (Terraform strongly preferred).
- Strong scripting/programming skills in Python, Go, Bash, or C#/.NET.
- Hands-on experience with Datadog, Prometheus/Grafana, or OpenTelemetry.
Benefits
- Flexible vacation
- 12 company-paid holidays
- 10 paid sick days
- No work on your birthday
- Health, dental, and vision insurance (including a $0 option)
- 401(k) with matching and no waiting period
- Equity
- Life insurance
- Generous paid parental leave
- Wellness reimbursement of $300/year
- Remote worker reimbursement of $300/year
- Professional development reimbursement
- Competitive pay
- An award-winning culture
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
SRE, DevOps, Platform Engineering, Infrastructure Engineering, Infrastructure-as-Code, Terraform, Linux, containers, Kubernetes, scripting
Soft Skills
mentoring, knowledge sharing, incident management, blameless postmortems