
ML/AI Ops Engineer
Veeam Software
full-time
Location Type: Remote
Location: Costa Rica
About the role
- Own the end-to-end operationalization of ML and AI solutions, from development to scalable, reliable production systems that integrate seamlessly with other digital tools.
- Design, automate, and maintain CI/CD pipelines for model training, testing, deployment, and retraining (Azure DevOps, Databricks).
- Build, optimize, and version model lifecycle workflows, ensuring reproducibility, lineage, and governance across the ML/AI platform.
- Monitor production models for performance, drift, reliability, and resource usage; implement automated retraining workflows.
- Optimize compute, storage, and orchestration across the Databricks platform to ensure efficient, cost-effective operations.
- Collaborate closely with ML/AI Scientists, Data Engineers, and the Data Warehouse (DWH) team to transform research-grade models into production-ready services.
- Contribute to advancing our ML/AI platform, tooling, automation standards, and best practices.
Requirements
- Solid experience in operationalizing ML/AI models, including deployment, automation, monitoring, and lifecycle management.
- Strong programming skills in Python, PySpark, and SQL with clean, efficient, production-ready code.
- Experience in feature engineering, with a practical understanding of data engineering fundamentals: designing, validating, and optimizing feature pipelines, and ensuring feature consistency.
- Experience building vector embeddings and RAG systems.
- Familiarity with ML and LLM model development and the associated libraries.
- Experience with MLflow (or similar tools) for model tracking, registry management, and lifecycle operations.
- Familiarity with CI/CD pipelines (Azure DevOps preferred).
- Strong grasp of data versioning, model versioning, reproducibility, and data lineage within governed ML/AI environments.
- Experience designing, consuming, or integrating REST APIs to expose ML/AI models as services and support real-time or near-real-time inference.
- Experience monitoring production models, identifying drift or performance issues, and implementing corrective workflows.
- A collaborative, systems-thinking mindset, working closely with ML/AI Scientists, Data Engineers, and the Data Warehouse team.
Benefits
- Two weeks of paid vacation, 12 statutory holidays, plus 4 extra global VeeaMe Days for self-care and 24 paid volunteer hours annually through Veeam Cares
- Paid parental leave: 8 days for fathers, 122 days for birthing parents, 92 days for adoptive parents
- Medical, dental, and vision coverage fully funded through INS Premium for employees and dependents
- Mental health support, therapy sessions, and virtual care via our Employee Assistance Program
- Retirement and social security contributions through Costa Rica’s statutory programs
- Life insurance equal to 24x monthly salary, plus disability and funeral coverage
- Daily cafeteria subsidy
- Fertility, adoption, and surrogacy support
- Opportunities to learn and grow through on-demand libraries (LinkedIn Learning, O’Reilly), mentoring, workshops, and learning events like our annual Global Day of Learning
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
operationalizing ML models, operationalizing AI models, deployment, automation, monitoring, lifecycle management, Python, PySpark, SQL, feature engineering
Soft Skills
collaborative mindset, systems thinking