Sofka Technologies

SRE, Site Reliability Engineering

SRE role focused on technology resilience and observability in high-complexity environments. Join us remotely and help improve technology availability and the user experience.

Posted 7/2/2026 • Contract • Remote • 🇺🇸 United States • Mid-Level • Senior

Tech Stack

Tools & technologies
Ansible • AWS • Azure • Cloud • Docker • Google Cloud Platform • Grafana • Jenkins • Kubernetes • OpenShift • Prometheus • Python • Terraform

About the role

Key responsibilities & impact
  • Adapt observability requirements to each technical solution to ensure coverage, visibility, and operational efficiency.
  • Configure and maintain dashboards, metrics, alerts, and critical business controls.
  • Validate solution resilience through chaos testing and scalability assessments under load.
  • Implement resilient design patterns such as circuit breakers, fallbacks, and retries in distributed architectures.
  • Identify and automate manual processes using infrastructure-as-code tools to reduce MTTR (mean time to recovery).
  • Lead the implementation of self-remediation workflows and promote continuous improvement practices in operations.
  • Collaborate with development and architecture teams to ensure technical quality across critical user journeys.
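Several of the responsibilities above name concrete resilience patterns. As an illustration only, here is a minimal Python sketch of a circuit breaker combined with retries, exponential backoff, and a fallback. It is not Sofka's implementation: the class and function names (`CircuitBreaker`, `call_with_resilience`), thresholds, and timings are hypothetical, and the breaker assumes single-threaded use.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised instead of calling the dependency while the circuit is open."""


class CircuitBreaker:
    """Minimal, single-threaded circuit breaker (hypothetical sketch).

    Closed: calls pass through; consecutive failures are counted.
    Open: after `failure_threshold` failures, calls fail fast for `reset_timeout` seconds.
    Half-open: after the timeout, the next call is a trial; success closes the
    circuit, failure re-opens it.
    """

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock  # injectable so the sketch is easy to test
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self.opened_at is not None and self.clock() - self.opened_at < self.reset_timeout:
            raise CircuitOpenError("circuit open; failing fast")
        try:
            result = fn()
        except Exception:
            self.failures += 1
            # Trip on reaching the threshold, or re-trip after a failed half-open trial.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0
        self.opened_at = None  # success closes the circuit
        return result


def call_with_resilience(breaker: CircuitBreaker, fn: Callable[[], T],
                         fallback: Callable[[], T], retries: int = 2,
                         backoff: float = 0.1) -> T:
    """Retry `fn` through the breaker with exponential backoff, then fall back."""
    for attempt in range(retries + 1):
        try:
            return breaker.call(fn)
        except CircuitOpenError:
            break  # do not keep retrying against an open circuit
        except Exception:
            if attempt < retries:
                time.sleep(backoff * 2 ** attempt)  # 0.1 s, 0.2 s, ... with defaults
    return fallback()
```

Retries stop as soon as the breaker opens, so a struggling dependency is not hammered while callers still receive the fallback (for example, a cached value). A production service would more likely rely on a maintained resilience library or a service mesh than on hand-rolled code like this.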

Requirements

What you’ll need
  • At least 3 years of experience leading resilience and observability initiatives in high-complexity environments.
  • Proven experience automating operational tasks and managing incidents under SRE/DevOps methodologies.
  • Observability: Dynatrace (primary tool, hands-on experience), plus Grafana, Prometheus, OpenTelemetry, and the ELK Stack.
  • Automation and IaC: Ansible, Terraform, Terragrunt, and Monaco (Monitoring as Code).
  • Containerization: Kubernetes (AKS, EKS), OpenShift (advanced level), and Docker.
  • Programming languages: Python (advanced), Bash, YAML, and PowerShell.
  • Cloud & Infrastructure: Azure, AWS, or GCP (Networking, Security, and Compute).
  • Reliability management: defining SLIs, SLOs, and SLAs, and managing error budgets.
  • CI/CD: Git, Jenkins, Azure DevOps, and GitHub Actions.
  • Resilience engineering: Chaos Engineering, circuit breaker patterns, and Canary/Blue-Green deployments.
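The SLO and error-budget requirement comes down to simple arithmetic: the error budget is the share of events allowed to fail, 1 - SLO, over the SLO window. The Python sketch below is illustrative only; it assumes a request-based SLI (good requests / total requests, with at least one request in the window) and a hypothetical 30-day window for the time-based helper. The names `ErrorBudget` and `downtime_budget_minutes` are not from any specific tool.

```python
from dataclasses import dataclass


@dataclass
class ErrorBudget:
    """Request-based error budget for one SLO window (hypothetical sketch)."""
    slo: float         # target fraction of good events, e.g. 0.999 for 99.9%
    total_events: int  # events (e.g. requests) observed in the window; assumed > 0
    bad_events: int    # events that failed the SLI (errors, too-slow responses, ...)

    @property
    def sli(self) -> float:
        """Measured fraction of good events."""
        return 1 - self.bad_events / self.total_events

    @property
    def allowed_bad_events(self) -> float:
        """Error budget expressed in events: (1 - SLO) * total."""
        return (1 - self.slo) * self.total_events

    @property
    def remaining_fraction(self) -> float:
        """Share of the budget still unspent; negative once the SLO is breached."""
        return 1 - self.bad_events / self.allowed_bad_events


def downtime_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Time-based view: minutes of full outage the SLO tolerates per window."""
    return (1 - slo) * window_days * 24 * 60


budget = ErrorBudget(slo=0.999, total_events=1_000_000, bad_events=250)
print(f"SLI={budget.sli:.5f}, budget left={budget.remaining_fraction:.0%}")
print(f"{downtime_budget_minutes(0.999):.1f} minutes of downtime allowed per 30 days")
```

For example, a 99.9% SLO over 30 days tolerates about 43.2 minutes of full outage, and 250 failed requests out of 1,000,000 spend a quarter of the corresponding 1,000-request budget. The SLA is usually set looser than the internal SLO so that the error budget runs out before any contractual commitment is at risk.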

Benefits

Comp & perks
  • Technical and personal challenges that will keep you continuously growing.
  • A connected team focused on your physical and mental wellbeing.
  • A fresh, collaborative continuous-improvement culture with learning opportunities and people ready to support you.
  • KaizenHub, a program designed to develop your talents through feedback, mentoring, and coaching via Sofka U.
  • Programs such as Happy Kaizen and WeSofka that support your physical and emotional wellbeing.

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
Python • Bash • YAML • PowerShell • Chaos Engineering • Circuit Breaker Patterns • SLIs, SLOs, SLAs Management • Error Budget Management • Self-Remediation Workflows • Automation of Operational Tasks
Soft Skills
Collaboration • Continuous Improvement • Leadership