Sonia

ML Platform Engineer

full-time

Location Type: Remote

Location: Remote • 🇩🇪 Germany

Job Level

Mid-Level, Senior

Tech Stack

AWS, Azure, Cloud, Flux, Google Cloud Platform, Grafana, Kubernetes, Postgres, Python, RabbitMQ, Redis

About the role

  • Support and enhance our Kubernetes-based infrastructure in cloud environments, running both ML/LLM workloads and general applications
  • Deploy and optimize LLM inference systems
  • Design, build, and improve MLOps/DevOps pipelines to support the entire development lifecycle
  • Manage GPU scheduling and autoscaling with Kubernetes-native tooling
  • Ensure observability and alerting across the platform
  • Operate and troubleshoot supporting infrastructure
  • Contribute to platform reliability, security, and performance through automation and best practices

Requirements

  • 5+ years of experience in MLOps or SRE
  • Strong hands-on Kubernetes experience, including GitOps (Flux or ArgoCD), Kustomize, Helm and production troubleshooting
  • Familiarity with LLM inference deployment and optimization in Kubernetes (e.g., vLLM, LMCache, llm-d)
  • Experience with MLOps supporting tools such as MLflow or Argo Workflows
  • Understanding of GPU resource orchestration in Kubernetes environments
  • In-depth knowledge of observability tools such as VictoriaMetrics, VictoriaLogs and Grafana
  • Knowledge of database and broker administration (PostgreSQL, Redis and RabbitMQ)
  • Solid scripting skills in Python
  • Comfortable working with cloud platforms (OVHcloud, AWS, GCP or Azure)

Benefits

  • Full ownership of a mission-critical platform
  • A team that values curiosity, learning, and experimentation
  • Remote-first setup with the option to work in our Berlin office
  • Competitive salary depending on experience
  • Work on AI infrastructure that directly impacts healthcare innovation
