Senior Systems Reliability Engineer
Astreya
SRE role owning enterprise reliability standards and leading the Dynatrace implementation in an innovative tech environment. Collaborates with teams on observability maturity and operational excellence.
Tech Stack
Tools & technologies: Ansible, Chef, Cloud, Grafana, Linux, Puppet, Terraform
About the role
Key responsibilities & impact
- Dynatrace Platform Ownership
  - Configure and manage Dynatrace agents, APM, RUM/Synthetics, Log-in-Context, and Zenoss alerting integrations.
  - Build dashboards aligned to the four Golden Signals for both engineering and executive audiences.
  - Own SLO-based alerting targeting MTTD under 5 minutes.
- SRE Embedded Engagement
  - Conduct 2-3 week embedded engagements with tower teams to assess observability maturity, identify toil, and stand up monitoring frameworks.
  - Define and track SLIs/SLOs with program managers and team leads.
  - Deliver runbooks and alert frameworks for Tier 1/2 incident execution.
  - Champion blameless postmortems and RCA feedback loops.
- Automation & Infrastructure
  - Implement and extend Ansible Automation Platform for infrastructure provisioning, configuration management, and event-driven workflows across Linux (600+ servers) and Windows (3,000-7,000 servers) environments.
  - Contribute to automated remediation workflows targeting zero human intervention for known issues.
- Standards & CoE Contribution
  - Establish and maintain enterprise-wide SRE baseline standards across all towers.
  - Serve as a Domain Champion bridging the CoE to individual tower SRE teams.
  - Contribute to OKR tracking across observability, MTTD/MTTI reduction, incident resolution, automation, and SRE readiness.
Requirements
What you’ll need
- 5+ years of experience in Site Reliability Engineering, IT operations, or related fields.
- Bachelor's degree in Computer Science, Engineering, or equivalent (2 additional years of experience in lieu of a degree).
- Dynatrace: required, 1+ year hands-on minimum (more preferred). Must be able to configure agents, APM instrumentation, dashboards, SLO alerting, and log integrations in a production environment with no ramp time needed. This is the single most critical qualifier; depth matters more than breadth across other tools.
- Experience with other observability tools (Grafana, AppDynamics, Sumo Logic, New Relic, or ThousandEyes) is helpful and valued, but not blocking.
- Ansible Automation Platform: strong plus. Any automation or configuration management tool (Terraform, Chef, Puppet) will be considered. A demonstrated automation mindset is what matters.
- Demonstrated ability to define SLIs/SLOs in collaboration with product and engineering teams, not just consume them.
- Demonstrated ability to present and lead in front of stakeholders — must be able to walk into a room, command credibility, explain SRE principles clearly to both technical and non-technical audiences, and guide teams through implementation. This is a hard requirement, not a soft skill.
- Experience in enterprise environments with a mix of on-prem, cloud, and homegrown applications — not just single-product SRE.
- Must be authorized to work in the U.S. (W2 only; no sponsorship). West Coast / PST hours required.
- Must be located near an Alaska Airlines hub city. Remote candidates expected on-site approximately once per month.
Benefits
Comp & perks
- $31 - $50 / hour
- Full Time
- Remote – Hybrid

About the company
Astreya (1001 - 5000 employees, founded 2001; Cybersecurity, Enterprise, SaaS)
Astreya is a leading global provider of IT Managed Services and Technology Solutions, known for its innovative approach to digital engineering and IT logistics. The company focuses on empowering businesses to excel in today's dynamic digital landscape by maximizing productivity and fostering innovation. Astreya offers a range of services including Data Center & Network Management, Digital Workplace Services, Next-Gen Digital Engineering, and Cybersecurity Services. With a commitment to excellence and a focus on operational frameworks, Astreya aims to transform technology into a valuable strategic asset for organizations worldwide.
ATS Keywords
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
Site Reliability Engineering, Dynatrace, APM instrumentation, SLO alerting, Ansible Automation Platform, SLIs, SLOs, automation mindset, configuration management, monitoring frameworks
Soft Skills
stakeholder presentation, leadership, credibility, communication, guidance, collaboration, blameless postmortems, RCA feedback loops, incident resolution, team engagement