AI Infrastructure Engineer – GPU

Pragmatike

AI Infrastructure Engineer responsible for building and operating GPU-powered infrastructure for AI workloads at a startup, collaborating with other teams to design and run scalable ML inference platforms.

Posted 4/22/2026 • Full-time • Remote • 🇺🇦 Ukraine • Mid-Level, Senior

Tech Stack

Tools & technologies

Distributed Systems, Python, Terraform

About the role

Key responsibilities & impact

Build and operate production-grade model serving infrastructure using frameworks such as vLLM, TGI, Triton, or equivalent
Design and implement robust deployment pipelines with blue/green and canary rollout strategies for ML models
Develop and maintain auto-scaling systems, multi-model serving architectures, and intelligent request routing layers
Optimize GPU utilization, memory efficiency, network throughput, and model artifact storage performance
Design observability systems for tracking inference latency, throughput, GPU usage, cost metrics, and system health
Manage model registries and CI/CD pipelines enabling automated and reproducible model deployments
Own the full lifecycle of ML systems from development through production, including operational support and on-call responsibilities
Define engineering best practices and contribute to platform scalability in a fast-moving startup environment

Requirements

What you’ll need

4+ years of experience in MLOps, Platform Engineering, SRE, or similar infrastructure roles focused on ML systems
Hands-on experience with model serving frameworks such as vLLM, TGI, Triton, or equivalent
Strong background in container orchestration and operating GPU-based workloads in production
Experience with MLOps tooling including model registries, experiment tracking, and automated deployment pipelines
Proficiency in Python and infrastructure-as-code tools (e.g., Terraform, Helm, or similar)
Strong understanding of distributed systems, performance tuning, and production reliability engineering
Ability to effectively use AI coding assistants to accelerate development and debugging workflows
Ownership mindset with the ability to operate independently in a remote-first environment
Experience with ML platforms such as Kubeflow, MLflow, or KubeAI (preferred)
Knowledge of GPU scheduling, CUDA/ROCm optimization, or multi-tenant inference systems (preferred)
Experience with cost optimization across different GPU types and inference workloads (preferred)
Background in early-stage startups or greenfield infrastructure projects (preferred)
Proven experience building production systems from scratch rather than maintaining legacy platforms (preferred)

Benefits

Comp & perks

Take ownership of critical infrastructure powering a rapidly scaling AI-native cloud platform
Build foundational ML inference systems from the ground up in a high-growth, well-funded startup
Work at the intersection of distributed systems, GPU computing, and sustainable cloud architecture
Gain deep expertise in next-generation AI infrastructure and large-scale model serving systems
Influence core engineering decisions and define best practices that will scale with the company

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

MLOps, Platform Engineering, SRE, model serving frameworks, container orchestration, Python, infrastructure-as-code, distributed systems, performance tuning, production reliability engineering

Soft Skills

ownership mindset, independent operation, effective communication