Calix

Staff AI Ops Engineer

Design, implement, and maintain scalable infrastructure for ML and GenAI applications.

Posted 4/21/2026 • Full-time • Remote • 🇺🇸 United States • Lead • 💰 $136,000 - $265,700 per year

Tech Stack

Tools & technologies
Airflow, Cloud, Docker, Google Cloud Platform, Grafana, Kubernetes, Prometheus, Python, PyTorch, Terraform

About the role

Key responsibilities & impact
  • Design, implement, and maintain scalable infrastructure for ML and GenAI applications
  • Deploy, operate, and troubleshoot production ML/GenAI pipelines/services
  • Build and optimize CI/CD pipelines for ML model deployment and serving
  • Scale compute resources across CPU/GPU architectures to meet performance requirements
  • Implement container orchestration with Kubernetes
  • Architect and optimize cloud resources on GCP for ML training and inference
  • Set up and maintain runtime frameworks and job management systems (Airflow, Kubeflow, MLflow, etc.)
  • Establish monitoring, logging, and alerting for system observability
  • Optimize system performance and resource utilization for cost efficiency
  • Develop and enforce AIOps best practices across the organization

Requirements

What you’ll need
  • Bachelor's degree in Computer Science, Information Technology, or a related field (or equivalent experience)
  • 8+ years of overall software engineering experience
  • 3+ years of focused experience in DevOps/AIOps or similar ML infrastructure roles
  • Proficiency in infrastructure as code (IaC) using Terraform
  • Strong experience with containerization and orchestration using Docker and Kubernetes
  • Demonstrated expertise in cloud infrastructure management on GCP
  • Proficiency with workflow management tools such as Airflow and Kubeflow
  • Strong CI/CD expertise with experience implementing automated testing and deployment pipelines
  • Experience scaling distributed compute architectures across CPU and GPU hardware
  • Solid understanding of system performance optimization techniques
  • Experience implementing comprehensive observability solutions for complex systems
  • Knowledge of monitoring and logging tools (Prometheus, Grafana, ELK stack)
  • Strong proficiency in Python
  • Familiarity with ML frameworks such as PyTorch and ML platforms like Vertex AI
  • Excellent problem-solving skills and ability to work independently
  • Strong communication skills and ability to work effectively in cross-functional teams

Benefits

Comp & perks
  • Health insurance
  • 401(k) matching
  • Flexible work arrangements
  • Professional development
  • Possible bonuses
