SRE, Site Reliability Engineering
Sofka Technologies
SRE role focusing on technology resilience and observability within high-complexity environments. Join us remotely to improve technology availability and user experience.
Tech Stack
Tools & technologies: Ansible, AWS, Azure, Cloud, Docker, Google Cloud Platform, Grafana, Jenkins, Kubernetes, OpenShift, Prometheus, Python, Terraform
About the role
Key responsibilities & impact
- Adapt observability requirements to each technical solution to ensure coverage, visibility, and operational efficiency.
- Configure and maintain dashboards, metrics, alerts, and critical business controls.
- Validate solution resilience through chaos testing and scalability assessments under load.
- Implement resilient design patterns such as circuit breakers, fallbacks, and retries in distributed architectures.
- Identify and automate manual processes using infrastructure-as-code tools to reduce MTTR.
- Lead the implementation of self-remediation workflows and promote continuous improvement practices in operations.
- Collaborate with development and architecture teams to ensure technical quality across critical user journeys.
Requirements
What you’ll need
- Minimum 3 years of experience leading technology resilience and observability in high-complexity environments.
- Proven experience automating operational tasks and managing incidents under SRE/DevOps methodologies.
- Observability: Dynatrace (primary hands-on), Grafana, Prometheus, OpenTelemetry, and the ELK Stack.
- Automation and IaC: Ansible, Terraform, Terragrunt, and Monaco (Monitoring as Code).
- Containerization: Kubernetes (AKS, EKS), OpenShift (advanced level), and Docker.
- Programming languages: Python (advanced), Bash, YAML, and PowerShell.
- Cloud & Infrastructure: Azure, AWS, or GCP (Networking, Security, and Compute).
- Reliability management: definition of SLIs, SLOs, SLAs, and Error Budget management.
- CI/CD: Git, Jenkins, Azure DevOps, and GitHub Actions.
- Resilience engineering: Chaos Engineering, circuit breaker patterns, and Canary/Blue-Green deployments.
Benefits
Comp & perks
- Technical and personal challenges that will keep you continuously growing.
- A connected team focused on your physical and mental wellbeing.
- A fresh, collaborative continuous-improvement culture with learning opportunities and people ready to support you.
- KaizenHub, a program designed to boost your talents, offering feedback, mentoring, and coaching through Sofka U.
- Programs such as Happy Kaizen and WeSofka that support your physical and emotional wellbeing.
ATS Keywords
Hard Skills & Tools
Python, Bash, YAML, PowerShell, Chaos Engineering, Circuit Breaker Patterns, SLI/SLO/SLA Management, Error Budget Management, Self-Remediation Workflows, Automation of Operational Tasks
Soft Skills
Collaboration, Continuous Improvement, Leadership