AI Infrastructure Engineer – GPU
Pragmatike
AI Infrastructure Engineer responsible for the GPU-powered infrastructure that runs AI workloads at a startup, collaborating with other teams to design and operate scalable ML inference platforms.
Tech Stack
Tools & technologies: Distributed Systems, Python, Terraform
About the role
Key responsibilities & impact
- Build and operate production-grade model serving infrastructure using frameworks such as vLLM, TGI, Triton, or equivalent
- Design and implement robust deployment pipelines with blue/green and canary rollout strategies for ML models
- Develop and maintain auto-scaling systems, multi-model serving architectures, and intelligent request routing layers
- Optimize GPU utilization, memory efficiency, network throughput, and model artifact storage performance
- Design observability systems for tracking inference latency, throughput, GPU usage, cost metrics, and system health
- Manage model registries and CI/CD pipelines enabling automated and reproducible model deployments
- Own the full lifecycle of ML systems from development through production, including operational support and on-call responsibilities
- Define engineering best practices and contribute to platform scalability in a fast-moving startup environment
Requirements
What you’ll need
- 4+ years of experience in MLOps, Platform Engineering, SRE, or similar infrastructure roles focused on ML systems
- Hands-on experience with model serving frameworks such as vLLM, TGI, Triton, or equivalent
- Strong background in container orchestration and operating GPU-based workloads in production
- Experience with MLOps tooling including model registries, experiment tracking, and automated deployment pipelines
- Proficiency in Python and infrastructure-as-code tools (e.g., Terraform, Helm, or similar)
- Strong understanding of distributed systems, performance tuning, and production reliability engineering
- Ability to effectively use AI coding assistants to accelerate development and debugging workflows
- Ownership mindset with the ability to operate independently in a remote-first environment
- Experience with ML platforms such as Kubeflow, MLflow, or KubeAI (preferred)
- Knowledge of GPU scheduling, CUDA/ROCm optimization, or multi-tenant inference systems (preferred)
- Experience with cost optimization across different GPU types and inference workloads (preferred)
- Background in early-stage startups or greenfield infrastructure projects (preferred)
- Proven experience building production systems from scratch rather than maintaining legacy platforms (preferred)
Benefits
Comp & perks
- Take ownership of critical infrastructure powering a rapidly scaling AI-native cloud platform
- Build foundational ML inference systems from the ground up in a high-growth, well-funded startup
- Work at the intersection of distributed systems, GPU computing, and sustainable cloud architecture
- Gain deep expertise in next-generation AI infrastructure and large-scale model serving systems
- Influence core engineering decisions and define best practices that will scale with the company