Salary
💰 $184,000 - $356,500 per year
Tech Stack
Distributed Systems, Docker, Kubernetes, Linux, Python, Rust
About the role
- Propose and implement engineering solutions to ensure delivery of functional, reliable, secure, and performance-optimal GPU clusters.
- Conduct research in traditional AIOps and emerging Agentic AI.
- Participate in on-call support for the systems and platforms built and owned by the team.
- Work with coworkers across the AI Platform organization to understand the pain points of validating, monitoring and operating GPU clusters at scale.
Requirements
- BS/MS in Computer Science, Engineering, or equivalent experience.
- 8+ years in software/platform engineering, including 3+ years in ML infrastructure or distributed systems.
- Experience with the software development lifecycle on Linux-based platforms.
- Strong coding skills in languages such as Python, C++ or Rust.
- Experience with Docker, Kubernetes, GitLab CI, and automated deployments.
- Experience with AIOps or Agentic AI, including applying it successfully in a production environment.
Benefits
- Equity
- Health insurance
- Paid time off
- Professional development opportunities
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
software development lifecycle, Linux, Python, C++, Rust, Docker, Kubernetes, GitLab CI, AIOps, Agentic AI
Certifications
BS in Computer Science, MS in Computer Science, Engineering degree