
AI Operations Engineer
Newsela
contract
Location Type: Remote
Location: Anywhere in Latin America
About the role
- Design and maintain CI/CD pipelines for ML model training, packaging, and deployment across our microservices.
- Manage containerized services on AWS ECS, optimizing for cost, latency, and availability.
- Automate infrastructure provisioning and service configuration with Terraform.
- Maintain and scale services that rely on third-party LLM providers.
- Build and improve data pipelines that feed data from BigQuery, S3, and DynamoDB into model training and inference workflows.
- Instrument services with observability tooling (Datadog, OpenTelemetry, Langfuse) and establish SLOs for model-serving endpoints.
- Collaborate with ML engineers to productionize new models using BentoML, FastAPI, and container-based serving.
Requirements
- 2-3 years in MLOps supporting ML/AI features, systems, and workflows, plus 3-4 years of prior experience in DevOps, CloudOps, or SRE.
- Strong proficiency in Python.
- Hands-on experience with Docker containerization and container orchestration.
- Solid understanding of CI/CD for ML workflows in an enterprise production environment.
- Experience with Infrastructure as Code, preferably Terraform.
- Familiarity with cloud platforms, specifically AWS (ECS, ECR, S3, DynamoDB, CloudWatch) and GCP (BigQuery, Vertex AI).
- Experience with LLM integration and observability (OpenAI API, Google GenAI, Langfuse tracing).
- Experience building and maintaining data pipelines for ML training and feature engineering.
- Familiarity with ML modeling workflows: training, evaluation, experiment tracking (e.g. MLflow, Weights & Biases), and model versioning.
- Experience monitoring and flagging model drift over time.
- Exposure to NLP/NLU models and frameworks such as Hugging Face Transformers, spaCy, or sentence-transformers.
- Knowledge of vector databases (LanceDB, FAISS) and embedding-based retrieval systems.
- Experience scaling and maintaining deep learning frameworks (TensorFlow, PyTorch) in production settings.
- Familiarity with classical ML libraries (scikit-learn, XGBoost, LightGBM) and model explainability tools (SHAP).
- Working knowledge of ML serving frameworks such as BentoML or similar.
- Comfort working with FastAPI or similar async Python web frameworks.
Benefits
- Competitive salary
- Professional development opportunities
Applicant Tracking System Keywords
Hard Skills & Tools
Python, Docker, CI/CD, Terraform, AWS, GCP, MLflow, TensorFlow, PyTorch, FastAPI