Tango

Senior Site Reliability Engineer

Tango Analytics is hiring a Senior Site Reliability Engineer to focus on cloud platform reliability and scalability in a fully remote role. You will collaborate with engineering teams to implement observability and incident management practices.

Posted 4/29/2026 • Full-time • Remote • California • 🇺🇸 United States • Senior • 💰 $150,000 - $180,000 per year

Tech Stack

Tools & technologies
Ansible, AWS, Azure, Cloud, DNS, Docker, Go, Google Cloud Platform, Grafana, Java, Jenkins, Kubernetes, Linux, Prometheus, Python, Splunk, TCP/IP, Terraform

About the role

Key responsibilities & impact
  • Own reliability outcomes for Tango’s cloud platform (availability, latency, performance, and scalability) across production and non-production environments
  • Design, implement, and operate SLOs/SLIs, error budgets, and reliability reporting; drive prioritization of reliability work with Engineering and Product
  • Build and maintain observability foundations: metrics, logging, tracing, dashboards, and alerting that are actionable and reduce noise
  • Lead incident response and post-incident reviews (blameless RCAs); implement remediation and prevention work to measurably reduce repeat incidents
  • Engineer and evolve CI/CD and release safety practices (progressive delivery, canary/blue-green, automated rollbacks, change controls)
  • Improve infrastructure-as-code and environment consistency; standardize and harden platform components
  • Partner with Security and Compliance to support secure operations, vulnerability remediation, audits, and customer trust requirements
  • Optimize cloud cost and capacity through right-sizing, autoscaling, and performance tuning; track and report on cost drivers
  • Enable engineering teams with reliable internal tooling, runbooks, and self-service operational capabilities
  • Mentor engineers on reliability best practices, operational excellence, and automation

Requirements

What you’ll need
  • 8+ years of experience in Site Reliability Engineering, DevOps, or Production Engineering supporting distributed SaaS applications
  • Strong background in Linux systems engineering, networking fundamentals (TCP/IP, DNS, load balancing), and troubleshooting in production
  • Proficiency with at least one programming language used for automation (e.g., Python, Go, or Java) and strong scripting skills
  • Hands-on experience with cloud infrastructure (AWS, Azure, or GCP)
  • Deep experience with infrastructure-as-code and configuration management (e.g., Terraform, CloudFormation, Ansible)
  • Expertise in containerization and orchestration (Docker, Kubernetes) and operating cloud-native services
  • Strong observability practice with tools such as Prometheus/Grafana, Datadog, New Relic, OpenTelemetry, ELK/Splunk, or equivalent
  • Demonstrated leadership in incident management and root cause analysis, with a continuous improvement mindset
  • Experience designing and operating CI/CD pipelines and release management practices (e.g., GitHub Actions, Jenkins, GitLab CI, ArgoCD)
  • Ability to work cross-functionally with Engineering, Product, Support, and Security; clear written and verbal communication
  • Bachelor’s degree in Computer Science, Engineering, or equivalent practical experience
  • Relevant certifications are a plus (e.g., AWS/Azure/GCP, Kubernetes CKA/CKAD, ITIL, or security-focused certifications)

Benefits

Comp & perks
  • Competitive compensation
  • Comprehensive benefits, including health, dental, and vision insurance
  • 401(k) plan with company match
  • Generous paid time off
  • Flexible work environment
  • Inclusive and collaborative culture

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
  • Site Reliability Engineering
  • DevOps
  • Production Engineering
  • Linux systems engineering
  • Networking fundamentals
  • Automation programming (Python, Go, Java)
  • Infrastructure-as-code
  • Containerization
  • Orchestration (Docker, Kubernetes)
  • CI/CD pipelines
Soft Skills
  • Incident management leadership
  • Root cause analysis
  • Continuous improvement mindset
  • Cross-functional collaboration
  • Clear communication
Certifications
  • AWS certification
  • Azure certification
  • GCP certification
  • Kubernetes CKA
  • Kubernetes CKAD
  • ITIL certification
  • Security-focused certifications