UJET

Senior Site Reliability Engineer

Senior Site Reliability Engineer at UJET, an AI-driven contact center company, improving system reliability and establishing best practices. The role leads incident response and mentors engineers to raise operational maturity.

Posted 4/20/2026 • Full-time • Remote • Texas • 🇺🇸 United States • Senior • 💰 $100,000–$120,000 per year

Tech Stack

Tools & technologies
AWS, Azure, Cloud, Distributed Systems, Go, Google Cloud Platform, Java, Python

About the role

Key responsibilities & impact
  • Lead efforts to improve system reliability, scalability, and performance across critical services
  • Define and implement SLIs/SLOs and error budgets, and use them to guide engineering priorities
  • Design and develop observability systems (metrics, logging, tracing, alerting) that produce actionable alerts and data with minimal noise
  • Lead complex incident response, acting as incident commander when needed
  • Conduct postmortems that focus on systemic causes rather than individual fault, and ensure the resulting corrective actions are completed
  • Identify and eliminate toil through automation, tooling, and improved workflows
  • Partner with product and platform teams on architecture decisions, production readiness, and the design of systems that recover from failure
  • Build reusable systems and “paved roads” that make it easier for teams to operate their services reliably
  • Mentor other engineers and raise the overall operational maturity of the organization

Requirements

What you’ll need
  • 6–10+ years of experience in SRE, infrastructure, or backend systems engineering
  • Demonstrated experience owning reliability outcomes for complex distributed systems
  • Strong experience with cloud infrastructure (AWS, GCP, or Azure) and production-scale systems
  • Deep understanding of observability, incident management, and system performance
  • Proficiency in at least one programming language (e.g., Go, Python, Java) with a focus on automation and tooling
  • Ability to change how other teams work without having managerial authority over them
  • Ability to make clear decisions during incidents by following a defined process and staying calm under pressure

Benefits

Comp & perks
  • Medical
  • Dental
  • Vision
  • 401(k) plan
  • Commuter benefits
  • Comprehensive benefits

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
SRE, infrastructure engineering, backend systems engineering, cloud infrastructure, AWS, GCP, Azure, observability, incident management, programming language
Soft Skills
mentoring, decision making, incident response, collaboration, leadership, communication, problem solving, process adherence, organizational maturity, change management