Tech Stack
Tools & technologies: Ansible, AWS, Azure, Docker, Grafana, Jenkins, Kubernetes, Linux, Terraform
About the role
Key responsibilities & impact
- Continuously monitor availability, latency, and performance of production systems to ensure a resilient and stable environment.
- Define, document, and track Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Service Level Agreements (SLAs) to ensure services meet established reliability standards.
- Collaborate with development teams to balance delivery velocity with production system stability.
- Develop and implement automation solutions to eliminate repetitive manual tasks and optimize operational efficiency.
- Create and maintain runbooks, scripts, and custom tools that improve monitoring, fault detection, and automated incident response.
- Adopt Infrastructure as Code (IaC) practices for automated provisioning, ensuring consistent, scalable, and efficient environments.
- Respond quickly to production incidents, perform effective triage, and minimize user impact.
- Conduct incident post-mortems, identify root causes, and implement corrective actions to prevent recurrence.
- Create and refine incident response playbooks to ensure the team acts quickly and accurately during critical situations.
- Work with development teams to ensure systems are scalable and can handle traffic spikes without degrading performance.
- Perform regular load and performance testing to identify bottlenecks and implement improvements to optimize operational efficiency.
- Tune infrastructure and application configurations to maximize resource utilization and reduce operational costs.
- Adopt DevSecOps practices to integrate security across the development and operations lifecycle.
- Work with the security team to implement compliance policies, auditing, and protection for sensitive data.
- Proactively monitor and respond to security vulnerabilities and threats, ensuring system integrity and privacy.
- Keep system, process, and operational procedure documentation up to date, ensuring traceability and version control for auditing and continuous improvement.
- Produce technical guides and documentation for development and operations teams, promoting best practices and alignment across teams.
Requirements
What you’ll need
- Clear communication skills and attention to detail;
- Punctuality and reliability with deadlines and schedules;
- Ability to produce technical documentation for internal teams and clients;
- Proven experience supporting cloud infrastructure (preferably AWS);
- Operating systems: Linux and Windows;
- Proficiency with Infrastructure as Code (Terraform, Ansible, and CloudFormation);
- Experience creating/managing containers (Kubernetes, Docker, autoscaling, Spot.io);
- Knowledge of infrastructure management;
- Monitoring tools knowledge (Zabbix, Grafana, New Relic, Elastic Stack);
- Knowledge of cloud connectivity (VPN, VPC, Security Groups);
- Experience writing CI/CD pipelines (Jenkins, AWS CodePipeline, Azure DevOps, GitHub Actions).
Benefits
Comp & perks
- Flexible working hours
- Educational incentives (partnerships with educational institutions)
- Paid vacation
- TotalPass
- Birthday off
- Health insurance
- Dental insurance
- Maternity leave
- Paternity leave
- Reimbursement for AWS certifications
ATS Keywords
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
Infrastructure as Code, Terraform, Ansible, CloudFormation, Kubernetes, Docker, CI/CD pipelines, Jenkins, AWS CodePipeline, Azure DevOps
Soft Skills
clear communication, attention to detail, punctuality, reliability, technical documentation
