
MLOps Engineer
Aerones
full-time
Location Type: Hybrid
Location: Riga • 🇱🇻 Latvia
Salary
💰 €2,500 - €5,500 per month
Job Level
Mid-Level • Senior
Tech Stack
Airflow, AWS, Azure, Cloud, Docker, Google Cloud Platform, Grafana, Kubernetes, Prometheus, Python, PyTorch, Ray, Terraform
About the role
- Own the end-to-end ML pipeline for computer vision: data prep, training, evaluation, model packaging, artifact/version management, deployment, and monitoring (local GPU cluster + GCP).
- Design and maintain containerized workflows for multi-GPU training and distributed workloads (e.g., PyTorch DDP, Ray, or similar); see the sketch after this list.
- Build and operate orchestration (e.g., Airflow/Argo/Kubeflow/Ray Jobs) for scheduled and on-demand pipelines across on-prem and cloud.
- Implement and tune resource allocation strategies based on current and upcoming task queues (GPU/CPU/memory-aware scheduling; preemption/priority; autoscaling).
- Introduce and integrate monitoring/telemetry for:
  - job health and failure analysis (retry, backoff, alerts),
  - data/feature drift and model performance (precision/recall, latency, throughput),
  - infra metrics (GPU utilization, memory, I/O, cost).
- Harden GCP environments (permissions, networks, registries, storage) and optimize for reliability, performance, and cost (spot/managed instance groups, autoscaling).
- Establish model governance: experiment tracking, model registry, promotion gates, rollbacks, and audit trails.
- Standardize CI/CD for ML (data/feature pipelines, model builds, tests, and canary/blue-green rollouts).
- Collaborate with CV researchers/engineers to productionize new models and improve training throughput & inference SLAs.
- Continuously improve documentation: update existing pipeline docs and produce concise runbooks, diagrams, and “how-to” guides.
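As a rough illustration of the multi-GPU training workflows mentioned above, here is a minimal PyTorch DDP entrypoint sketch. It assumes a single node launched with torchrun; the linear model and synthetic dataset are hypothetical placeholders standing in for a real computer-vision training job.

```python
# Minimal single-node PyTorch DDP training sketch (placeholder model/data).
# Assumes launch via torchrun, which sets RANK, LOCAL_RANK, and WORLD_SIZE.
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset


def main():
    # One process per GPU; NCCL backend for GPU collectives
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = nn.Linear(128, 10).cuda(local_rank)  # placeholder model
    model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    # Synthetic tensors stand in for the real CV dataset
    dataset = TensorDataset(torch.randn(1024, 128), torch.randint(0, 10, (1024,)))
    sampler = DistributedSampler(dataset)
    loader = DataLoader(dataset, batch_size=32, sampler=sampler)

    for epoch in range(2):
        sampler.set_epoch(epoch)  # reshuffle shards each epoch
        for x, y in loader:
            x, y = x.cuda(local_rank), y.cuda(local_rank)
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()  # gradients are all-reduced across GPUs here
            optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

Such a script would typically run inside the training container with something like `torchrun --nproc_per_node=4 train.py` (file name hypothetical).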
Requirements
- Hands-on MLOps experience building and running ML pipelines at scale (preferably computer vision) across on-prem GPUs and a public cloud (GCP preferred).
- Strong with Docker and Docker Compose in local and cloud environments; solid understanding of image build optimization and artifact caching.
- GitLab CI/CD expertise (modular templates, YAML optimization, build/test stages for ML, environment promotion).
- Proficiency with Python and Bash for pipeline tooling, glue code, and automation; Terraform for infra-as-code (GCP resources, IAM, networking, storage).
- Experience with orchestration: one or more of Airflow, Argo Workflows, Kubeflow, Ray, or Prefect.
- Experience operating GPU workloads: NVIDIA driver/CUDA stack, container runtimes, device plugins (k8s), multi-GPU training, utilization tuning.
- Observability & monitoring for ML and infra: Prometheus/Grafana, OpenTelemetry/Loki (or similar) for metrics, logs, traces; alerting and SLOs.
- Experiment tracking / model registry with tools like MLflow or Weights & Biases (runs, params, artifacts, metrics, registry/promotion); see the sketch after this list.
- Data versioning & validation: DVC/lakeFS (or similar), Great Expectations/whylogs, schema checks, drift detection.
- Cloud services: GCP (Compute Engine, GKE or Autopilot, Cloud Run, Artifact Registry, Cloud Storage, Pub/Sub). Equivalent AWS/Azure experience is acceptable.
- Security & compliance for ML stacks: secrets management, SBOM/image scanning, least-privilege IAM, network policies, artifact signing.
- Solid understanding of containerized deployment patterns (blue-green/canary), rollout strategies, and rollback safety.
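To illustrate the experiment-tracking requirement, a minimal MLflow sketch follows. It assumes MLflow is installed and a tracking URI is configured (otherwise runs land in a local ./mlruns directory); the experiment name, parameters, and metric values are hypothetical.

```python
# Minimal MLflow experiment-tracking sketch (hypothetical experiment name,
# hyperparameters, and metric values; real runs would log actual results).
import mlflow

mlflow.set_experiment("cv-defect-detection")  # assumed experiment name

with mlflow.start_run(run_name="baseline"):
    # Log hyperparameters once per run
    mlflow.log_params({"lr": 1e-3, "batch_size": 32, "epochs": 10})
    for epoch in range(10):
        # Placeholder metric values stand in for real evaluation output
        mlflow.log_metric("val_precision", 0.80 + 0.01 * epoch, step=epoch)
```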
Benefits
- Salary from **2,500 EUR to 5,500 EUR per month** (before taxes)
- A Birthday Gift
- **After Probationary Period:**
- **Health Insurance**
- **Health Recovery Days** (which can be taken as needed)
- Paid **Study Leave**
- Funding for the purchase of **Vision Glasses** after one (1) year of service
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
MLOps, ML pipelines, computer vision, Docker, GitLab CI/CD, Python, Bash, Terraform, orchestration, data versioning
Soft skills
collaboration, documentation, problem-solving, communication, organization