
MLOps Engineer
Akvelon, Inc.
contract
Location Type: Remote
Location: Serbia
About the role
- Build and operate the AI platform infrastructure enabling developers to ship LLM-based services faster.
- Implement and maintain Kubernetes-based runtime environments (incl. AKS) for AI workloads.
- Manage infrastructure as code with Terraform (modules, environments, CI/CD automation).
- Support LLM workflows: RAG, agents, prompt experimentation, evaluations, and deployment patterns.
- Integrate and operate tooling such as Azure AI Foundry, LiteLLM, Langfuse, MLflow.
- Orchestrate pipelines using Kubeflow Pipelines and/or Argo Workflows (build, deploy, evaluate).
- Improve platform reliability and observability (monitoring, logging, tracing, cost/perf signals).
- Collaborate closely with developers to streamline DX (APIs, templates, docs, golden paths, automation).
Requirements
- Strong hands-on experience with Kubernetes in production (preferably AKS).
- Solid Terraform expertise (IaC best practices, multi-env setups).
- Practical experience supporting ML/LLM workloads in a platform or DevOps/MLOps context.
- Proficiency in Python for automation, scripting, and supporting APIs/evaluation tooling.
- Understanding of CI/CD, release processes, and production-grade operations.
- Ability to work under tight timelines and deliver pragmatically.
Nice to Have
- Experience building internal developer platforms or “paved roads” for engineering teams.
- Familiarity with LLM evaluation frameworks, prompt testing workflows, and LLM observability.
- Exposure to RAG architectures, vector databases, and agentic patterns.
- Experience with Kubeflow, Argo, and ML lifecycle tooling.
Benefits
- None specified
Applicant Tracking System Keywords
Hard Skills & Tools
Kubernetes, Terraform, Python, CI/CD, ML/LLM workloads, Kubeflow, Argo Workflows, Infrastructure as Code, Monitoring, Logging
Soft Skills
collaboration, time management, pragmatism