Site Reliability Engineer
AceHack 4.0
Site Reliability Engineer at Orkes, solving distributed systems challenges and managing cloud infrastructure. The role involves incident management and improving system reliability through observability tools.
Posted 5/15/2026 • Full-time • Remote • 🇺🇸 United States • Mid-Level, Senior • 💰 $180,000–$250,000 per year
Tech Stack
Tools & technologies: AWS, Azure, Cloud, Distributed Systems, Google Cloud Platform, Grafana, Kubernetes, Microservices, Prometheus, Python, Terraform
About the role
Key responsibilities & impact
- Own reliability, availability, and performance of production systems running in cloud environments
- Define and monitor SLIs/SLOs and help manage error budgets across the platform
- Lead incident response efforts including detection, triage, mitigation, and postmortems
- Improve observability through logging, monitoring, alerting, and dashboards
- Automate operational workflows and reduce manual toil wherever possible
- Partner closely with engineering teams to improve system resiliency and scalability
- Assist with capacity planning, infrastructure optimization, and performance tuning
- Build internal tooling, runbooks, and operational best practices
- Support Kubernetes-based infrastructure and distributed systems at scale
- Act as an escalation point for complex production and platform issues
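For readers less familiar with the SLI/SLO and error-budget terms above, here is a minimal sketch of how an error budget is commonly computed from an availability SLO. The 99.9% target, the request counts, and the 30-day window are illustrative assumptions, not figures from this posting, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SloWindow:
    """Request counts for one SLO window (all values here are illustrative)."""

    slo_target: float  # e.g. 0.999 means 99.9% of requests must succeed
    total_requests: int
    failed_requests: int

    def sli(self) -> float:
        """Observed availability SLI: the fraction of requests that succeeded."""
        if self.total_requests == 0:
            return 1.0
        return 1.0 - self.failed_requests / self.total_requests

    def error_budget(self) -> float:
        """Number of failed requests the SLO tolerates in this window."""
        return (1.0 - self.slo_target) * self.total_requests

    def budget_consumed(self) -> float:
        """Fraction of the error budget already spent (above 1.0 means the SLO is breached)."""
        budget = self.error_budget()
        if budget == 0:
            return float("inf") if self.failed_requests else 0.0
        return self.failed_requests / budget


def allowed_downtime_minutes(slo_target: float, window_days: int) -> float:
    """Time-based view of the same budget: minutes of full outage the SLO permits."""
    return (1.0 - slo_target) * window_days * 24 * 60


window = SloWindow(slo_target=0.999, total_requests=1_000_000, failed_requests=400)
print(f"SLI: {window.sli():.4%}")
print(f"Error budget: {window.error_budget():.0f} failed requests")
print(f"Budget consumed: {window.budget_consumed():.0%}")
print(f"Allowed downtime over 30 days: {allowed_downtime_minutes(0.999, 30):.1f} min")
```

With these sample numbers, 400 failures against a budget of roughly 1,000 means about 40% of the budget is spent, and a 99.9% target over 30 days corresponds to about 43.2 minutes of full downtime.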
Requirements
What you’ll need
- 5+ years of experience in Site Reliability Engineering, DevOps, Platform Engineering, or related infrastructure roles
- Strong experience with cloud platforms such as AWS, GCP, or Azure
- Hands-on experience with Kubernetes and containerized environments
- Strong understanding of distributed systems and microservices architecture
- Experience with observability tools such as Prometheus, Grafana, Datadog, ELK, or OpenTelemetry
- Proficiency with infrastructure automation and scripting (Terraform, Python, Bash, etc.)
- Experience managing CI/CD pipelines and deployment automation
- Strong troubleshooting and incident management skills
- Ability to work cross-functionally and communicate effectively during high-pressure situations
Benefits
Comp & perks
- Comprehensive health coverage including medical, dental, and vision
- Flexible PTO
- Support for personal development
ATS Keywords
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
Site Reliability Engineering, DevOps, Platform Engineering, cloud platforms, Kubernetes, distributed systems, microservices architecture, observability tools, infrastructure automation, CI/CD pipelines
Soft Skills
Troubleshooting, incident management, cross-functional collaboration, effective communication