Principal Cloud and Production Operations Engineer
qode.world
Senior engineer responsible for architecting and optimizing hybrid, cloud-native environments for critical services. Collaborates across teams to enhance production reliability and operational engineering.
Tech Stack
Tools & technologies
Ansible, AWS, Azure, Chef, Cloud, Docker, Grafana, Jenkins, Kubernetes, Prometheus, Puppet, Python, Terraform
About the role
Key responsibilities & impact
- Design, implement, and maintain cloud and hybrid infrastructure supporting production workloads, enterprise systems, and CI/CD pipelines
- Lead the adoption of infrastructure-as-code (IaC) using Terraform, CloudFormation, or similar tools to enable repeatable, auditable, and secure deployments
- Architect scalable and fault-tolerant solutions across OCI, AWS, Azure, and on-prem data centers, ensuring high availability and cost efficiency
- Evaluate emerging cloud services and technologies for applicability to business needs and long-term scalability goals
- Serve as the technical lead for production operations, ensuring uptime, performance, and reliability of customer-facing and internal systems
- Develop and maintain observability frameworks leveraging metrics, logs, and traces to ensure proactive detection and rapid response
- Partner with engineering teams to implement SRE-inspired practices, including service level objectives (SLOs), error budgets, and post-incident reviews
- Drive root cause analysis, performance tuning, and continuous improvement of production services
- Collaborate with DevOps and application engineering teams to build and optimize automated deployment pipelines supporting frequent, low-risk releases
- Integrate security and compliance checks into CI/CD workflows to ensure production readiness and alignment with internal standards
- Design self-healing infrastructure and automated rollback mechanisms to reduce operational risk
- Ensure secure and reliable configuration management and environment orchestration using tools such as Ansible, Chef, or Puppet
- Establish and enforce operational best practices for monitoring, patching, and change management across production systems
- Lead production readiness reviews for new releases and large-scale changes
- Collaborate with the Security and Compliance teams to ensure systems adhere to policy, hardening standards, and regulatory requirements
- Participate in and occasionally lead on-call rotations for critical production systems, ensuring rapid triage and resolution
- Act as a technical mentor to cloud and infrastructure engineers, fostering a culture of knowledge sharing and engineering excellence
- Lead architectural reviews, design sessions, and capacity planning discussions
- Serve as a trusted advisor to management on cloud modernization, resilience engineering, and cost optimization strategies
Requirements
What you’ll need
- Bachelor’s degree in Computer Science, Information Systems, or related field; Master’s preferred
- 10+ years of experience in cloud and infrastructure engineering, including 3+ years in a senior or principal role
- Expertise with OCI (preferred), AWS and/or Azure cloud services, including networking, compute, storage, and identity management
- Proven experience managing production-scale environments supporting mission-critical applications and services
- Strong proficiency in:
  - Infrastructure-as-code (Terraform, CloudFormation)
  - CI/CD and DevOps toolchains (Jenkins, GitLab, ArgoCD)
  - Container orchestration (Kubernetes, Docker)
  - Monitoring and observability platforms (Prometheus, Grafana, Datadog, ELK)
  - Scripting and automation (Python, Bash, PowerShell)
- Solid understanding of security, compliance, and networking principles in hybrid environments
- Exceptional analytical, problem-solving, and incident management skills
- Demonstrated ability to lead complex, cross-functional initiatives from concept to execution
Company
qode.world
- Size: 11 - 50 employees
- Industries: Artificial Intelligence, HR Tech, Recruitment

qode.world is a company that uses artificial intelligence to streamline the recruiting process. Its platform lets users find candidates by sourcing from billions of data points worldwide and provides data-driven insights. Users can connect with candidates directly through the platform, conduct customized AI-led interviews, and get comprehensive assessments. The service also integrates with LinkedIn, which widens the talent pool and allows direct communication with candidates listed there. qode.world offers additional recruiting services to assist in hiring for niche or senior roles. The company is praised for streamlining the hiring process and delivering quick results.

Job details
- Location: California (Hybrid)
- Employment type: Full Time
- Seniority: Lead
- Category: Production Engineer
ATS Keywords
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
cloud infrastructure engineering, infrastructure-as-code, Terraform, CloudFormation, CI/CD, DevOps, Kubernetes, Docker, scripting, Python
Soft Skills
analytical skills, problem-solving, incident management, leadership, collaboration, mentoring, communication, cross-functional initiative management, knowledge sharing, capacity planning
Certifications
Bachelor’s degree in Computer Science, Bachelor’s degree in Information Systems, Master’s degree (preferred)