Tech Stack
Tools & technologies: Ansible, Cloud, DNS, Go, Grafana, Kubernetes, Linux, OpenStack, Python, Terraform
About the role
Key responsibilities & impact
- Design, develop, and maintain infrastructure for AI inference workloads, including GPU scheduling, model deployment pipelines, and data access patterns in on-prem environments
- Build and manage monitoring and observability tools for AI inference platforms, including dashboards, alerts, and runbooks for model health and system performance
- Collaborate with ML engineers and platform teams to design system architecture for AI workloads, integrate inference runtimes, and test performance at scale
Requirements
What you’ll need
- Strong understanding of Kubernetes architecture, including CNI, CSI, operators, ingress/gateway, and control plane components.
- Hands-on experience operating and troubleshooting production Kubernetes clusters.
- Strong Linux and networking troubleshooting skills, including DNS, routing, firewalling, TLS, MTU, connectivity and performance issues.
- Ability to develop automation and operational tooling using Python, Go, or Bash.
- Experience with Terraform, Ansible, or similar IaC/configuration management tools.
- Experience with VictoriaMetrics/Grafana or similar monitoring, alerting, and troubleshooting tools.
- Strong experience with Git-based workflows and CI/CD pipelines.
- Familiarity with Cluster API or similar Kubernetes cluster lifecycle management technologies.
- Hands-on operation or administration of Slurm clusters.
- Knowledge of Argo CD, GitOps workflows, Helm, or Helmfile.
- Background working with managed platforms, PaaS, or cloud services.
- Exposure to bare metal, GPU, HPC, or other high-performance computing environments.
- Familiarity with the NVIDIA GPU stack, RDMA/InfiniBand, or high-performance networking.
- Knowledge of OpenStack or similar cloud infrastructure platforms.
- Hands-on experience developing Kubernetes operators or controllers.
Benefits
Comp & perks
- Competitive compensation
- Flexible working hours and hybrid or remote options, depending on your role
- Work from anywhere in the world for up to 45 days per year
- Private medical insurance for you and your family*
- Extra paid vacation and sick leave days*
- Support for life’s important moments and celebrations
- Language courses to help you connect and grow
- Modern, welcoming offices with snacks, drinks, and entertainment*
- Team sports and social activities*
