DevOps Engineer – AI Inference

Gcore

DevOps Engineer designing and maintaining infrastructure for AI inference workloads at Gcore. Join a global team delivering AI-driven solutions in a secure environment.

Posted 7/4/2026 • Full-time • Remote • 🇸🇬 Singapore • Mid-Level, Senior

Tech Stack

Tools & technologies

Ansible, Cloud, DNS, Go, Grafana, Kubernetes, Linux, OpenStack, Python, Terraform

About the role

Key responsibilities & impact

Design, develop, and maintain infrastructure for AI inference workloads, including GPU scheduling, model deployment pipelines, and data access patterns in on-prem environments
Build and manage monitoring and observability tools for AI inference platforms, including dashboards, alerts, and runbooks for model health and system performance
Collaborate with ML engineers and platform teams to design system architecture for AI workloads, integrate inference runtimes, and test performance at scale

Requirements

What you’ll need

Strong understanding of Kubernetes architecture, including CNI, CSI, operators, ingress/gateway, and control plane components.
Hands-on experience operating and troubleshooting production Kubernetes clusters.
Strong Linux and networking troubleshooting skills, including DNS, routing, firewalling, TLS, MTU, connectivity and performance issues.
Ability to develop automation and operational tooling using Python, Go, or Bash.
Experience with Terraform, Ansible, or similar IaC/configuration management tools.
Experience with VictoriaMetrics/Grafana or similar monitoring, alerting, and troubleshooting tools.
Strong experience with Git-based workflows and CI/CD pipelines.
Familiarity with Cluster API or similar Kubernetes cluster lifecycle management technologies.
Hands-on experience operating or administering Slurm clusters.
Knowledge of Argo CD, GitOps workflows, Helm, or Helmfile.
Background working with managed platforms, PaaS, or cloud services.
Exposure to bare metal, GPU, HPC, or other high-performance computing environments.
Familiarity with the NVIDIA GPU stack, RDMA/InfiniBand, or high-performance networking.
Knowledge of OpenStack or similar cloud infrastructure platforms.
Hands-on experience developing Kubernetes operators or controllers.

Benefits

Comp & perks

Competitive compensation
Flexible working hours and hybrid or remote options, depending on your role
Work from anywhere in the world for up to 45 days per year
Private medical insurance for you and your family*
Extra paid vacation and sick leave days*
Support for life’s important moments and celebrations
Language courses to help you connect and grow
Modern, welcoming offices with snacks, drinks, and entertainment*
Team sports and social activities*

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

Kubernetes, Linux Troubleshooting, Networking Skills, Automation Tooling, Git Workflows, Slurm Administration, High-Performance Computing, NVIDIA GPU Stack, OpenStack, Cluster API