DevOps Engineer

• Ensure the reliability, scalability, and performance of HPC and cloud systems
• Build and maintain automation, observability, and monitoring frameworks for compute clusters
• Collaborate with ML, data, and infrastructure teams to deliver high-availability systems
• Develop and enhance CI/CD pipelines, deployment workflows, and on-call processes
• Contribute to architecture design and long-term infrastructure strategy discussions
• Participate in a 24/7 on-call rotation, with at least one full on-call week per month

Senior – Principal Site Reliability Engineer

Job Level

Tech Stack

About the role

Requirements
