Tech Stack
Airflow, Apache, AWS, Azure, Cloud, Docker, Kubernetes, Python, PyTorch, TensorFlow
About the role
- Own the complete lifecycle of machine learning models for autonomous vehicle orchestration
- Build infrastructure, pipelines, and monitoring systems to deploy, validate, and continuously improve AI agents and reasoning systems
- Design and implement automated training pipelines for LLMs, agent frameworks, and reasoning models
- Build model versioning, experiment tracking, and artifact management systems
- Develop automated model validation and testing frameworks, including simulation-based evaluation of agent behaviors
- Implement A/B testing infrastructure for comparing AI reasoning strategies
- Create model monitoring and observability systems to track performance, drift, and reliability
- Optimize model deployment for edge computing (quantization, pruning, inference acceleration)
- Build automated retraining pipelines using operational feedback from field deployments
- Implement model governance and compliance frameworks, including audit trails and safety validation
- Design feature stores and data pipelines that prepare operational sensor and mission data
- Support rollback and canary deployment strategies for safe model updates
- Ensure secure model deployment practices, including model encryption, access controls, and adversarial robustness validation
Requirements
- 4+ years of experience in MLOps, ML platform engineering, or production machine learning systems
- Strong proficiency in Python
- Experience with ML frameworks (PyTorch, TensorFlow, Hugging Face, MLflow, or similar)
- Experience with container orchestration for ML workloads (Docker, Kubernetes)
- Knowledge of model serving frameworks (TorchServe, TensorFlow Serving, Triton, or cloud ML endpoints)
- Understanding of ML experiment tracking, model versioning, and artifact management systems
- Experience with cloud ML platforms (AWS SageMaker, Azure ML, or Google AI Platform)
- Proficiency with data pipeline tools (Apache Airflow, Prefect, or similar)
- Knowledge of model monitoring, performance tracking, and automated alerting systems
- Understanding of CI/CD practices specifically applied to ML model deployment
- U.S. citizenship with the ability to obtain a security clearance
Benefits
- Hybrid work environment
- Competitive pay
- Flexible time off
- Generous PTO policy
- Federal holidays
- Generous health, dental, and vision benefits
- Free OneMedical membership
ATS Keywords
Hard skills
MLOps, machine learning, Python, PyTorch, TensorFlow, Hugging Face, MLflow, Docker, Kubernetes, CI/CD