
Senior Infrastructure Engineer – AI/ML
Mitratech
full-time
Location Type: Remote
Location: Remote • 🇩🇪 Germany
Job Level
Senior
Tech Stack
AWS, Cloud, Docker, EC2, Grafana, Kubernetes, Linux, Prometheus, Terraform
About the role
- Design, deploy, and maintain scalable and secure infrastructure supporting AI and ML workloads.
- Build and maintain AWS cloud environments for compute (EC2, ECS/EKS, Lambda), storage (S3, EFS, FSx), and networking (VPC, Transit Gateway, PrivateLink, Route 53, load balancers).
- Implement security best practices using IAM, KMS, Secrets Manager, GuardDuty, and Security Hub.
- Support and optimize AI/ML workloads across AWS services (SageMaker, Bedrock, Batch, Step Functions).
- Develop and maintain Infrastructure as Code (IaC) using Terraform, AWS CDK, and CloudFormation.
- Manage containerized workloads and orchestration platforms (Docker, EKS, Fargate), including GPU scheduling and scaling.
- Set up and maintain monitoring and observability frameworks using CloudWatch and OpenTelemetry.
- Build and manage CI/CD pipelines (CircleCI, GitHub Actions, GitLab CI) for infrastructure automation and ML/Gen AI deployments.
- Collaborate with ML and Generative AI teams to scale models, optimize performance, and design efficient prompt or inference pipelines.
- Develop runbooks and SOPs for AI service deployment, troubleshooting, and performance optimization.
- Ensure security, compliance, and data protection across AI datasets and environments.
Requirements
- Strong proficiency in Linux administration and systems-level troubleshooting.
- Deep expertise in AWS cloud services, with experience in compute, storage, networking, and security domains.
- Proficiency in container orchestration (Kubernetes/EKS) and infrastructure automation tools.
- Hands-on experience with IaC tools such as Terraform, AWS CDK, or CloudFormation.
- Familiarity with monitoring, logging, and observability stacks (Prometheus, Grafana, OpenTelemetry).
- Experience implementing CI/CD pipelines for automated deployment and testing.
- Understanding of AI/ML concepts, including model deployment, inference scaling, and LLM performance tuning.
- Working knowledge of security best practices, IAM roles, encryption, and compliance controls.
- Excellent collaboration and communication skills to partner with ML engineers, data scientists, and product teams.
Benefits
- Equal-opportunity employer that values diversity at all levels
- Professional development opportunities
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
AWS, Linux administration, Infrastructure as Code, Terraform, AWS CDK, CloudFormation, container orchestration, Kubernetes, CI/CD pipelines, AI/ML concepts
Soft skills
collaboration, communication, troubleshooting, performance optimization