Pragmatike

AI Infrastructure Engineer – GPU

The AI Infrastructure Engineer is responsible for GPU-powered infrastructure for AI workloads at a startup, collaborating with teams to design and operate scalable ML inference platforms.

Posted 4/22/2026 • Full-time • Remote • 🇺🇦 Ukraine • Mid-Level, Senior

Tech Stack

Tools & technologies
Distributed Systems, Python, Terraform

About the role

Key responsibilities & impact
  • Build and operate production-grade model serving infrastructure using frameworks such as vLLM, TGI, Triton, or equivalent
  • Design and implement robust deployment pipelines with blue/green and canary rollout strategies for ML models
  • Develop and maintain auto-scaling systems, multi-model serving architectures, and intelligent request routing layers
  • Optimize GPU utilization, memory efficiency, network throughput, and model artifact storage performance
  • Design observability systems for tracking inference latency, throughput, GPU usage, cost metrics, and system health
  • Manage model registries and CI/CD pipelines enabling automated and reproducible model deployments
  • Own the full lifecycle of ML systems from development through production, including operational support and on-call responsibilities
  • Define engineering best practices and contribute to platform scalability in a fast-moving startup environment

Requirements

What you’ll need
  • 4+ years of experience in ML Ops, Platform Engineering, SRE, or similar infrastructure roles focused on ML systems
  • Hands-on experience with model serving frameworks such as vLLM, TGI, Triton, or equivalent
  • Strong background in container orchestration and operating GPU-based workloads in production
  • Experience with MLOps tooling including model registries, experiment tracking, and automated deployment pipelines
  • Proficiency in Python and infrastructure-as-code tools (e.g., Terraform, Helm, or similar)
  • Strong understanding of distributed systems, performance tuning, and production reliability engineering
  • Ability to effectively use AI coding assistants to accelerate development and debugging workflows
  • Ownership mindset with the ability to operate independently in a remote-first environment
  • Experience with ML platforms such as Kubeflow, MLflow, or KubeAI (preferred)
  • Knowledge of GPU scheduling, CUDA/ROCm optimization, or multi-tenant inference systems (preferred)
  • Experience with cost optimization across different GPU types and inference workloads (preferred)
  • Background in early-stage startups or greenfield infrastructure projects (preferred)
  • Proven experience building production systems from scratch rather than maintaining legacy platforms (preferred)

Benefits

Comp & perks
  • Take ownership of critical infrastructure powering a rapidly scaling AI-native cloud platform
  • Build foundational ML inference systems from the ground up in a high-growth, well-funded startup
  • Work at the intersection of distributed systems, GPU computing, and sustainable cloud architecture
  • Gain deep expertise in next-generation AI infrastructure and large-scale model serving systems
  • Influence core engineering decisions and define best practices that will scale with the company

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
ML Ops, Platform Engineering, SRE, model serving frameworks, container orchestration, Python, infrastructure-as-code, distributed systems, performance tuning, production reliability engineering
Soft Skills
ownership mindset, independent operation, effective communication