Tech Stack
Tools & technologies
Ansible, AWS, Azure, Cloud, Distributed Systems, Docker, Google Cloud Platform, Kubernetes, Linux, Python, Splunk, Terraform, Unix
About the role
Key responsibilities & impact
- Design, implement, and support fault-tolerant, highly available architectures across Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP), including redundancy, load balancing, and automated failover strategies.
- Deploy, manage, and optimize cloud infrastructure using infrastructure-as-code (IaC) tools such as Terraform and Ansible.
- Implement and maintain monitoring, alerting, and logging solutions using tools such as Splunk, Azure Monitor, Dynatrace, or AWS CloudWatch to detect and resolve issues proactively.
- Lead incident response activities, including real-time troubleshooting, root-cause analysis, post-incident reviews, and continuous improvement actions to increase uptime and resilience.
- Perform capacity planning and performance engineering by forecasting demand, tuning systems, and implementing autoscaling and performance best practices.
- Develop and maintain automation scripts and internal tools using Python, PowerShell, Bash, or similar languages to reduce manual intervention and operational toil.
- Collaborate with security teams to implement secure infrastructure practices including encryption, role-based access control, auditing, and vulnerability management.
- Work closely with engineering and DevOps teams to promote reliability best practices and contribute to a collaborative, blameless culture that improves consistency and quality of operations.
Requirements
What you’ll need
- 5+ years of experience in cloud site reliability engineering, DevOps engineering, or systems engineering supporting large-scale, distributed systems in public cloud environments.
- Experience with at least one major public cloud platform such as AWS, Azure, or GCP, including virtual private clouds (VPCs), identity and access management (IAM), serverless components, and managed Kubernetes services.
- Experience with containers and orchestration technologies such as Docker and Kubernetes in production environments.
- Experience with infrastructure-as-code tools such as Terraform and Ansible to provision, configure, and manage cloud infrastructure.
- Experience implementing monitoring, logging, and observability solutions using tools such as Splunk, Azure Monitor, Dynatrace, or AWS CloudWatch.
- Experience with Linux or Unix and Windows operating systems, including system administration and networking fundamentals.
- Experience with programming or scripting languages such as Python, PowerShell, or Bash to automate system management and operational tasks.
- Bachelor's degree or higher in Computer Science, Engineering, Information Technology, or a related field, or an equivalent combination of education, related experience, and/or military experience.
Benefits
Comp & perks
- Health insurance
- 401(k) matching
- Flexible work hours
- Paid time off
- Professional development opportunities
ATS (Applicant Tracking System) Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
cloud site reliability engineering, DevOps engineering, systems engineering, infrastructure-as-code, monitoring solutions, logging solutions, automation scripting, capacity planning, performance engineering, containers and orchestration
Soft Skills
leadership, collaboration, troubleshooting, root-cause analysis, continuous improvement, communication, blameless culture, proactive issue resolution, teamwork, organizational skills
Certifications
Bachelor's degree in Computer Science, Bachelor's degree in Engineering, Bachelor's degree in Information Technology
