Mitratech

Senior Infrastructure Engineer – AI/ML

Full-time

Location Type: Remote

Location: Remote • 🇩🇪 Germany


Job Level

Senior

Tech Stack

AWS, Cloud, Docker, EC2, Grafana, Kubernetes, Linux, Prometheus, Terraform

About the role

  • Design, deploy, and maintain scalable and secure infrastructure supporting AI and ML workloads.
  • Build and maintain AWS cloud environments for compute (EC2, ECS/EKS, Lambda), storage (S3, EFS, FSx), and networking (VPC, Transit Gateway, PrivateLink, Route 53, load balancers).
  • Implement security best practices using IAM, KMS, Secrets Manager, GuardDuty, and Security Hub.
  • Support and optimize AI/ML workloads across AWS services (SageMaker, Bedrock, Batch, Step Functions).
  • Develop and maintain Infrastructure as Code (IaC) using Terraform, AWS CDK, and CloudFormation.
  • Manage containerized workloads and orchestration platforms (Docker, EKS, Fargate), including GPU scheduling and scaling.
  • Set up and maintain monitoring and observability frameworks using CloudWatch and OpenTelemetry.
  • Build and manage CI/CD pipelines (CircleCI, GitHub Actions, GitLab CI) for infrastructure automation and ML/Gen AI deployments.
  • Collaborate with ML and Generative AI teams to scale models, optimize performance, and design efficient prompt or inference pipelines.
  • Develop runbooks and SOPs for AI service deployment, troubleshooting, and performance optimization.
  • Ensure security, compliance, and data protection across AI datasets and environments.

Requirements

  • Strong proficiency in Linux administration and systems-level troubleshooting.
  • Deep expertise in AWS cloud services, with experience in compute, storage, networking, and security domains.
  • Proficiency in container orchestration (Kubernetes/EKS) and infrastructure automation tools.
  • Hands-on experience with IaC tools such as Terraform, AWS CDK, or CloudFormation.
  • Familiarity with monitoring, logging, and observability stacks (Prometheus, Grafana, OpenTelemetry).
  • Experience implementing CI/CD pipelines for automated deployment and testing.
  • Understanding of AI/ML concepts, including model deployment, inference scaling, and LLM performance tuning.
  • Working knowledge of security best practices, IAM roles, encryption, and compliance controls.
  • Excellent collaboration and communication skills to partner with ML engineers, data scientists, and product teams.

Benefits

  • Equal-opportunity employer that values diversity at all levels
  • Professional development opportunities
