AceHack 4.0

Site Reliability Engineer

Site Reliability Engineer at Orkes, solving distributed systems challenges and managing cloud infrastructure. The role involves incident management and improving system reliability through observability tools.

Posted 5/15/2026 • Full-time • Remote • 🇺🇸 United States • Mid-Level, Senior • 💰 $180,000 - $250,000 per year

Tech Stack

Tools & technologies
AWS, Azure, Cloud, Distributed Systems, Google Cloud Platform, Grafana, Kubernetes, Microservices, Prometheus, Python, Terraform

About the role

Key responsibilities & impact
  • Own reliability, availability, and performance of production systems running in cloud environments
  • Define and monitor SLIs/SLOs and help manage error budgets across the platform
  • Lead incident response efforts including detection, triage, mitigation, and postmortems
  • Improve observability through logging, monitoring, alerting, and dashboards
  • Automate operational workflows and reduce manual toil wherever possible
  • Partner closely with engineering teams to improve system resiliency and scalability
  • Assist with capacity planning, infrastructure optimization, and performance tuning
  • Build internal tooling, runbooks, and operational best practices
  • Support Kubernetes-based infrastructure and distributed systems at scale
  • Act as an escalation point for complex production and platform issues

Requirements

What you’ll need
  • 5+ years of experience in Site Reliability Engineering, DevOps, Platform Engineering, or related infrastructure roles
  • Strong experience with cloud platforms such as AWS, GCP, or Azure
  • Hands-on experience with Kubernetes and containerized environments
  • Strong understanding of distributed systems and microservices architecture
  • Experience with observability tools such as Prometheus, Grafana, Datadog, ELK, or OpenTelemetry
  • Proficiency with infrastructure automation and scripting (Terraform, Python, Bash, etc.)
  • Experience managing CI/CD pipelines and deployment automation
  • Strong troubleshooting and incident management skills
  • Ability to work cross-functionally and communicate effectively during high-pressure situations

Benefits

Comp & perks
  • Comprehensive health coverage including medical, dental, and vision
  • Flexible PTO
  • Support for personal development

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
Site Reliability Engineering, DevOps, Platform Engineering, cloud platforms, Kubernetes, distributed systems, microservices architecture, observability tools, infrastructure automation, CI/CD pipelines
Soft Skills
troubleshooting, incident management, cross-functional collaboration, effective communication