Tech Stack
Tools & technologies
AWS, Azure, Cloud, Distributed Systems, Docker, Grafana, Kubernetes, Linux, Prometheus, Python, Terraform
About the role
Key responsibilities & impact
- Proactively identify, evaluate, and implement preventative measures to reduce customer impact.
- Ensure all services are designed and operated with 24/7 availability, scalability, and resilience in mind.
- Monitor, troubleshoot, and provide visibility to improve site latency, performance, and uptime.
- Design, develop, and automate reliable cloud infrastructure and platform services.
- Apply Infrastructure-as-Code (IaC) principles to manage large-scale distributed systems.
- Write and maintain scripts, tools, and automation frameworks to support operational efficiency.
- Partner with engineering leadership to develop solutions that improve developer productivity and remove cross-functional dependencies.
- Collaborate with Platform Engineering teams on project definitions, requirements, backlog grooming, and planning processes.
- Align operational goals with product and engineering roadmaps to ensure reliability requirements are met early in the lifecycle.
- Define non-functional requirements (NFRs) and influence standards for scalability, observability, and fault tolerance.
- Lead cross-functional troubleshooting of complex issues spanning applications, infrastructure, databases, and networks.
- Serve as a technical mentor to SRE I and II engineers, guiding them in best practices for reliability, automation, and incident management.
- Lead root cause analysis and postmortem reviews, driving continuous improvement initiatives.
- Support offshore and distributed teams, promoting effective collaboration and communication.
- Participate in design and architecture reviews, offering technical recommendations and documentation for key stakeholders.
Requirements
What you’ll need
- Bachelor’s degree in Computer Science or a STEM field
- Minimum 6 years of experience in an engineering or operations role with a focus on reliability, scalability, and automation.
- Preferred: Certified Kubernetes Administrator (CKA) and/or AWS Certification
- Strong proficiency in Linux-based distributed environments (up to 70% hands-on work).
- Deep experience with cloud platforms (AWS or Azure) and Infrastructure-as-Code (Terraform).
- Excellent scripting skills (Python, Bash, PowerShell); object-oriented programming experience is a plus.
- Demonstrated ability to develop and maintain internal tools and automation solutions.
- Excellent written and verbal communication skills in English.
- Strong project management and organizational abilities with a bias for action.
- Experience collaborating with offshore or globally distributed teams.
- Expertise in containerization and orchestration technologies (Docker, Kubernetes).
- Experience with Kubernetes scaling tooling (Karpenter, KEDA).
- Strong understanding of DevOps principles and modern CI/CD pipelines.
- Experience with observability stacks (Prometheus, Grafana, OpenTelemetry).
- Familiarity with self-healing systems and site reliability best practices.
- Background in SaaS environments or large-scale distributed applications.
- Analytical thinker with a focus on root-cause problem solving.
- Self-starter with a strong ownership mentality and accountability.
- Mentor and collaborator who uplifts teams and promotes a learning culture.
- Committed to operational excellence and continuous improvement.
Benefits
Comp & perks
- Competitive pay
- Flexible work
- Inclusive, collaborative environment that supports your success
ATS Keywords
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
Infrastructure-as-Code, cloud infrastructure, scripting, Python, Bash, PowerShell, containerization, orchestration, DevOps principles, observability
Soft Skills
communication, project management, organizational abilities, collaboration, analytical thinking, mentorship, ownership mentality, accountability, continuous improvement, problem solving
Certifications
Certified Kubernetes Administrator, AWS Certification
