CertifID

Senior Site Reliability Engineer

Full-time

Location Type: Remote

Location: Texas, United States

About the role

  • Own and improve the reliability, availability, and performance of production systems while defining and operationalizing SLIs/SLOs and error budgets.
  • Design and implement autonomous and semi-autonomous AI agents for monitoring distributed systems and applications. Build agents capable of consuming multi-source observability data (metrics, logs, traces, etc.).
  • Participate in and help lead an on-call rotation, serving as an escalation point for major incidents and facilitating blameless postmortems.
  • Build automated workflows to eliminate manual work, and design and maintain Infrastructure-as-Code with Terraform.
  • Improve metrics, logs, traces, and alerting using tools like Datadog or Prometheus to reduce noise and increase signal.
  • Partner with application teams to implement reliability best practices and mentor junior engineers to foster a culture of knowledge sharing.

Requirements

  • 5+ years in SRE, DevOps, Platform Engineering, or Infrastructure Engineering.
  • Proven experience supporting production SaaS systems in Azure (preferred), AWS, or GCP.
  • Strong Linux, networking, and distributed systems troubleshooting skills.
  • Strong experience with containers and orchestration (Kubernetes/EKS/AKS).
  • Expertise with Infrastructure-as-Code (Terraform strongly preferred).
  • Strong scripting/programming skills in Python, Go, Bash, or C#/.NET.
  • Hands-on experience with Datadog, Prometheus/Grafana, or OpenTelemetry.

Benefits

  • Flexible vacation
  • 12 company-paid holidays
  • 10 paid sick days
  • No work on your birthday
  • Health, dental, and vision insurance (including a $0 option)
  • 401(k) with matching and no waiting period
  • Equity
  • Life insurance
  • Generous parental paid leave
  • Wellness reimbursement of $300/year
  • Remote worker reimbursement of $300/year
  • Professional development reimbursement
  • Competitive pay
  • An award-winning culture