Tech Stack
Tools & technologies
AWS, Azure, Cloud, Docker, Java, Jenkins, Linux, Python, Terraform, Unix
About the role
Key responsibilities & impact
- Design, implement, and maintain reliable infrastructure systems with a focus on security, scalability, reliability, and automation using tools like Terraform or CloudFormation
- Build and maintain scalable and resilient production systems with a focus on automation
- Develop and implement monitoring solutions to ensure system health, performance, and availability
- Lead incident response, perform root cause analysis, and implement preventative measures
- Track SLOs, SLAs, and error budgets to measure service reliability and drive reliability improvements
- Design and implement CI/CD pipelines to enable rapid and reliable software delivery
- Partner with development teams to improve application performance, resilience, and scalability
- Contribute to capacity planning and performance optimization initiatives
- Participate in an on-call rotation to support production systems
- Develop and evolve security monitoring, alerting, and incident response
Requirements
What you’ll need
- 2-4 years of experience in SRE, DevOps, or similar roles with Java or Python knowledge
- Expertise in incident management, disaster recovery, and building resilience engineering frameworks
- Strong programming skills in at least one language such as Java or Python
- Experience with Linux/Unix systems administration
- Hands-on experience with serverless (Lambda) and containerization technologies (Docker)
- Experience implementing and managing cloud infrastructure (AWS, Azure DevOps)
- Advanced understanding of networking concepts, load balancing, security best practices, and CDN technologies
- Experience with observability systems (like Dynatrace)
- Knowledge of database technologies and their performance characteristics
- Demonstrated experience handling incident response and post-mortem analysis
- Bachelor's degree in computer science or equivalent practical experience
- Deep knowledge of infrastructure-as-code tools (Terraform, CloudFormation)
- Knowledge of CI/CD pipeline design and implementation (Jenkins, GitLab CI, Azure DevOps)
- Experience building and maintaining comprehensive monitoring and alerting systems
- Experience managing high-traffic, mission-critical production environments
- Background in capacity planning and performance optimization
- Strong incident management skills, including crisis communication
Benefits
Comp & perks
- Short Term Incentive
- Work Arrangement: Hybrid
ATS Keywords
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
Java, Python, Linux/Unix administration, serverless technologies, containerization, cloud infrastructure, infrastructure-as-code, CI/CD pipeline design, monitoring systems, database technologies
Soft Skills
incident management, disaster recovery, resilience engineering, crisis communication, capacity planning, performance optimization, root cause analysis, preventative measures, collaboration, leadership
Certifications
Bachelor's degree in computer science
