Architect, design, and lead the implementation of scalable, secure, and resilient infrastructure for AI/ML workloads.
Build and maintain the continuous integration and continuous deployment (CI/CD) pipelines specifically for AI models, managing everything from data ingestion to model serving and monitoring.
Act as a subject matter expert and mentor for junior engineers, driving the adoption of best practices for AI/ML platform development, code management, and deployment.
Work closely with AI researchers and software engineers to optimize performance and streamline the end-to-end development and experimentation process.
Ensure the reliability, security, and performance of AI systems.
Stay ahead of industry trends and bring innovative AI solutions to the table.
Requirements
8+ years of experience in DevOps, ML Infrastructure, or Platform Engineering, with several years in a senior or leadership capacity.
6+ years of deep, expert-level knowledge of at least one major cloud provider (e.g., AWS, Azure, or GCP), including its AI/ML and containerization services.
6+ years of hands-on experience with ML orchestration and platform tools like Kubeflow, MLflow, or Amazon SageMaker.
6+ years of deep expertise with Docker and Kubernetes for managing and scaling AI/ML workloads.
6-8+ years of strong coding experience in languages such as Python, Go, or Bash for automation.
Proficiency with tools like Terraform or Ansible to manage and provision infrastructure.
Experience building robust and automated CI/CD pipelines using tools like GitHub Actions, GitLab CI/CD, or Jenkins.
Excellent collaboration and communication skills to work effectively with diverse technical and business teams.
Benefits
Medical, Dental, and Vision Insurance Options
Life and Disability Insurance
Paid Time-Off
Parental Benefits
Compassionate Care Leave
401k with Company Match
Employee Stock Purchase Plan