Site Reliability Engineer

AceHack 4.0

Site Reliability Engineer at Orkes, solving distributed systems challenges and managing cloud infrastructure. The role involves incident management and improving system reliability through observability tools.

Posted 5/15/2026 • Full-time • Remote • 🇺🇸 United States • Mid-Level / Senior • 💰 $180,000 - $250,000 per year

Tech Stack

Tools & technologies

AWS, Azure, Cloud, Distributed Systems, Google Cloud Platform, Grafana, Kubernetes, Microservices, Prometheus, Python, Terraform

About the role

Key responsibilities & impact

Own reliability, availability, and performance of production systems running in cloud environments
Define and monitor SLIs/SLOs and help manage error budgets across the platform
Lead incident response efforts including detection, triage, mitigation, and postmortems
Improve observability through logging, monitoring, alerting, and dashboards
Automate operational workflows and reduce manual toil wherever possible
Partner closely with engineering teams to improve system resiliency and scalability
Assist with capacity planning, infrastructure optimization, and performance tuning
Build internal tooling, runbooks, and operational best practices
Support Kubernetes-based infrastructure and distributed systems at scale
Act as an escalation point for complex production and platform issues

Requirements

What you’ll need

5+ years of experience in Site Reliability Engineering, DevOps, Platform Engineering, or related infrastructure roles
Strong experience with cloud platforms such as AWS, GCP, or Azure
Hands-on experience with Kubernetes and containerized environments
Strong understanding of distributed systems and microservices architecture
Experience with observability tools such as Prometheus, Grafana, Datadog, ELK, or OpenTelemetry
Proficiency with infrastructure automation and scripting (Terraform, Python, Bash, etc.)
Experience managing CI/CD pipelines and deployment automation
Strong troubleshooting and incident management skills
Ability to work cross-functionally and communicate effectively during high-pressure situations

Benefits

Comp & perks

Comprehensive health coverage including medical, dental, and vision
Flexible PTO
Support for personal development

ATS Keywords


Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

Site Reliability Engineering, DevOps, Platform Engineering, cloud platforms, Kubernetes, distributed systems, microservices architecture, observability tools, infrastructure automation, CI/CD pipelines

Soft Skills

troubleshooting, incident management, cross-functional collaboration, effective communication