Gcore

DevOps Engineer – AI Inference

Gcore is hiring a DevOps Engineer to design and maintain infrastructure for AI inference workloads. You will join a global team delivering AI-driven solutions in a secure environment.

Posted 7/4/2026 • Full-time • Remote • 🇸🇬 Singapore • Mid-Level, Senior

Tech Stack

Tools & technologies
Ansible, Cloud, DNS, Go, Grafana, Kubernetes, Linux, OpenStack, Python, Terraform

About the role

Key responsibilities & impact
  • Design, develop, and maintain infrastructure for AI inference workloads, including GPU scheduling, model deployment pipelines, and data access patterns in on-prem environments
  • Build and manage monitoring and observability tools for AI inference platforms, including dashboards, alerts, and runbooks for model health and system performance
  • Collaborate with ML engineers and platform teams to design system architecture for AI workloads, integrate inference runtimes, and test performance at scale

Requirements

What you’ll need
  • Strong understanding of Kubernetes architecture, including CNI, CSI, operators, ingress/gateway, and control plane components.
  • Hands-on experience operating and troubleshooting production Kubernetes clusters.
  • Strong Linux and networking troubleshooting skills, including DNS, routing, firewalling, TLS, MTU, connectivity and performance issues.
  • Ability to develop automation and operational tooling using Python, Go, or Bash.
  • Experience with Terraform, Ansible, or similar IaC/configuration management tools.
  • Experience with VictoriaMetrics/Grafana or similar monitoring, alerting, and troubleshooting tools.
  • Strong experience with Git-based workflows and CI/CD pipelines.
  • Familiarity with Cluster API or similar Kubernetes cluster lifecycle management technologies.
  • Hands-on operation or administration of Slurm clusters.
  • Knowledge of Argo CD, GitOps workflows, Helm, or Helmfile.
  • Background working with managed platforms, PaaS, or cloud services.
  • Exposure to bare metal, GPU, HPC, or other high-performance computing environments.
  • Familiarity with the NVIDIA GPU stack, RDMA/InfiniBand, or high-performance networking.
  • Knowledge of OpenStack or similar cloud infrastructure platforms.
  • Hands-on experience developing Kubernetes operators or controllers.

Benefits

Comp & perks
  • Competitive compensation
  • Flexible working hours and hybrid or remote options, depending on your role
  • Work from anywhere in the world for up to 45 days per year
  • Private medical insurance for you and your family*
  • Extra paid vacation and sick leave days*
  • Support for life’s important moments and celebrations
  • Language courses to help you connect and grow
  • Modern, welcoming offices with snacks, drinks, and entertainment*
  • Team sports and social activities*

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
Kubernetes, Linux Troubleshooting, Networking Skills, Automation Tooling, Git Workflows, Slurm Administration, High-Performance Computing, NVIDIA GPU Stack, OpenStack, Cluster API