Senior Site Reliability Engineer
Autodesk
Senior Site Reliability Engineer developing reliable and scalable cloud services for Autodesk GovCloud. Partnering with engineering teams and improving incident response in secure environments.
Posted 6/19/2026 • full-time • Remote • Idaho, Texas • 🇺🇸 United States • Senior • 💰 $117,000 - $209,330 per year
Tech Stack
Tools & technologies: AWS, Azure, Cloud, Go, Java, Python
About the role
Key responsibilities & impact
- Serve as a primary owner for the reliability, availability, performance, operability, and capacity of one or more production services
- Deploy, operate, maintain, and continuously improve production services running in Autodesk GovCloud environments
- Partner with engineering teams to ensure services are designed with reliability, scalability, security, and operability in mind
- Define and operate reliability practices such as SLOs/SLIs, error budgets, production readiness reviews, service reviews, and operational health reviews
- Build automation to improve deployment safety, operational efficiency, incident response, and service recovery
- Design, develop, and maintain software, automation, and tooling that improve the reliability, scalability, and efficiency of production systems
- Implement and improve monitoring, alerting, logging, tracing, and observability capabilities across supported services
- Lead and participate in incident response, troubleshooting, and post-incident reviews focused on learning and continuous improvement
- Develop and maintain operational documentation, runbooks, and recovery procedures
- Scale and enhance resilience testing and Gameday practices to validate system behavior, recovery capabilities, and operational readiness
- Continuously identify and eliminate operational toil through software engineering, automation, and process improvement
- Ensure supported services remain compliant with Autodesk security, privacy, and regulatory requirements, including FedRAMP and related controls where applicable
- Participate in a 24x7 on-call rotation for production services
Requirements
What you’ll need
- B.S. or higher in Computer Science, Engineering, or a related technical discipline, or equivalent practical experience
- 7+ years of experience in Site Reliability Engineering, Software Engineering, Platform Engineering, Cloud Infrastructure, or Production Operations
- Experience operating and supporting customer-facing production services in large-scale cloud environments
- Strong understanding of reliability engineering principles, including SLOs/SLIs, observability, incident management, capacity planning, production readiness, and automation
- Experience with AWS, Azure, or other public cloud platforms
- Experience developing automation using languages such as Python, Go, Java, PowerShell, Bash, or similar
- Experience with Infrastructure as Code, CI/CD pipelines, deployment automation, and modern cloud operations practices
- Understanding of security, compliance, and operational risk management in production environments
- Strong written and verbal communication skills
Benefits
Comp & perks
- Health and financial benefits
- Time away and everyday wellness