Senior Site Reliability Engineer
Tango
Senior Site Reliability Engineer at Tango Analytics, focusing on cloud platform reliability and scalability in a fully remote role. Collaborate with engineering teams to implement observability and incident management practices.
Posted 4/29/2026 • Full-time • Remote • California • 🇺🇸 United States • Senior • 💰 $150,000 - $180,000 per year
Tech Stack
Tools & technologies: Ansible, AWS, Azure, Cloud, DNS, Docker, Go, Google Cloud Platform, Grafana, Java, Jenkins, Kubernetes, Linux, Prometheus, Python, Splunk, TCP/IP, Terraform
About the role
Key responsibilities & impact
- Own reliability outcomes for Tango’s cloud platform (availability, latency, performance, and scalability) across production and non-production environments
- Design, implement, and operate SLOs/SLIs, error budgets, and reliability reporting; drive prioritization of reliability work with Engineering and Product
- Build and maintain observability foundations: metrics, logging, tracing, dashboards, and alerting that are actionable and reduce noise
- Lead incident response and post-incident reviews (blameless RCAs); implement remediation and prevention work to measurably reduce repeat incidents
- Engineer and evolve CI/CD and release safety practices (progressive delivery, canary/blue-green, automated rollbacks, change controls)
- Improve infrastructure-as-code and environment consistency; standardize and harden platform components
- Partner with Security and Compliance to support secure operations, vulnerability remediation, audits, and customer trust requirements
- Optimize cloud cost and capacity through right-sizing, autoscaling, and performance tuning; track and report on cost drivers
- Enable engineering teams with reliable internal tooling, runbooks, and self-service operational capabilities
- Mentor engineers on reliability best practices, operational excellence, and automation
Requirements
What you’ll need
- 8+ years of experience in Site Reliability Engineering, DevOps, or Production Engineering supporting distributed SaaS applications
- Strong background in Linux systems engineering, networking fundamentals (TCP/IP, DNS, load balancing), and troubleshooting in production
- Proficiency with at least one programming language used for automation (e.g., Python, Go, or Java) and strong scripting skills
- Hands-on experience with cloud infrastructure (AWS, Azure, or GCP)
- Deep experience with infrastructure-as-code and configuration management (e.g., Terraform, CloudFormation, Ansible)
- Expertise in containerization and orchestration (Docker, Kubernetes) and operating cloud-native services
- Strong observability practice with tools such as Prometheus/Grafana, Datadog, New Relic, OpenTelemetry, ELK/Splunk, or equivalent
- Demonstrated incident management leadership, root cause analysis, and continuous improvement mindset
- Experience designing and operating CI/CD pipelines and release management practices (e.g., GitHub Actions, Jenkins, GitLab CI, ArgoCD)
- Ability to work cross-functionally with Engineering, Product, Support, and Security; clear written and verbal communication
- Bachelor’s degree in Computer Science, Engineering, or equivalent practical experience
- Relevant certifications are a plus (e.g., AWS/Azure/GCP, Kubernetes CKA/CKAD, ITIL, or security-focused certifications)
Benefits
Comp & perks
- Competitive compensation
- Comprehensive benefits, including health, dental, and vision insurance
- 401(k) plan with company match
- Generous paid time off
- Flexible Work Environment
- Inclusive & Collaborative Culture
ATS Keywords
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
- Site Reliability Engineering
- DevOps
- Production Engineering
- Linux systems engineering
- Networking fundamentals
- Automation programming (Python, Go, Java)
- Infrastructure-as-code
- Containerization
- Orchestration (Docker, Kubernetes)
- CI/CD pipelines
Soft Skills
- Incident management leadership
- Root cause analysis
- Continuous improvement mindset
- Cross-functional collaboration
- Clear communication
Certifications
- AWS certification
- Azure certification
- GCP certification
- Kubernetes CKA
- Kubernetes CKAD
- ITIL certification
- Security-focused certifications