AI Platform Engineer – Training and Inference
Saviynt
ML Platform Engineer managing distributed training on Ray and an LLM inference mesh at Saviynt. Building a secure, scalable AI foundation to improve outcomes for identity products.
Tech Stack
Tools & technologies: Assembly, Cloud, Node.js, Python, PyTorch, Ray
About the role
Key responsibilities & impact
- Own the Ray ecosystem end-to-end: manage KubeRay on GKE, tune Ray Core Task/Actor scheduling, operate the Plasma distributed object store, and configure Ray Data for GPU-direct streaming from GCS/S3
- Operate distributed training with Ray Train: configure TorchTrainer + DDP/NCCL for multi-node H100 clusters, manage checkpoint lifecycle, implement spot-preemption recovery, and integrate warm-start fine-tuning for retrain pipelines
- Build and operate the LLM inference mesh with Ray Serve: compose vLLM (PagedAttention), SGLang (RadixAttention), and NVIDIA Triton (TensorRT/ONNX) as a unified deployment graph with Plasma zero-copy memory sharing
- Optimise inference performance: configure fractional GPU allocation, enable continuous batching, implement per-engine autoscaling based on request queue depth, and tune KV-cache block sizes
- Design and operate the model routing layer: capability-based, version-based, and tenant-based routing with cost-aware fallback between self-hosted SLMs and cloud LLMs
- Build RL training infrastructure: define Flyte workflows for RL pipelines (rollout, reward shaping, policy update, evaluation), integrate Ray RLlib or custom PPO/GRPO loops with Ray Train, and manage replay buffer persistence on GCS
- Operate the full model promotion lifecycle: quality gate → integration tests → load tests (k6) → shadow mode → A/B gate → canary (10%→100%) with golden-signal auto-rollback
- Operate the retrain pipeline: drift detection triggers, warm-start retraining, relative quality gates (V2 >= V1 − 2%), and automated Flyte DAG through to canary
- Integrate RAG retrieval into the inference mesh: vector similarity search, context assembly, and prompt construction before LLM inference
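As an illustration of the relative quality gate mentioned above (V2 >= V1 − 2%), here is a minimal sketch. It assumes the metric is a score in the range 0 to 1 and that "2%" means two absolute percentage points; the posting does not say which interpretation applies, and the function name and `tolerance` parameter are hypothetical.

```python
def passes_relative_gate(v1_score: float, v2_score: float,
                         tolerance: float = 0.02) -> bool:
    """Return True if the retrained model (V2) scores no more than
    `tolerance` below the current production model (V1).

    Assumption: scores are on a 0-1 scale and the tolerance is absolute
    (percentage points), not a fraction of V1's score.
    """
    return v2_score >= v1_score - tolerance


# A candidate slightly below the baseline still passes the gate;
# one that regresses by more than the tolerance does not.
print(passes_relative_gate(0.90, 0.885))
print(passes_relative_gate(0.90, 0.87))
```

A gate of this kind lets a retrained model proceed to canary even when it does not strictly beat the current version, as long as the regression stays within the stated margin.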
Requirements
What you’ll need
- Experience in ML engineering with time in an ML platform or MLOps role
- Production Ray depth: Ray Train, Serve, Core, and Data — debugged real production failures including NCCL timeouts, Plasma OOM, and Serve autoscaling lag
- LLM serving engines: hands-on with vLLM, SGLang, or NVIDIA Triton — PagedAttention, prefix caching, and continuous batching tuned for latency/throughput targets
- Distributed training: DDP, FSDP, NCCL collectives, gradient checkpointing, and mixed precision (BF16/FP8)
- RL working knowledge: PPO, policy gradient, or RLHF — able to translate an algorithm into distributed compute primitives
- Model lifecycle operations: MLflow registry, shadow/A/B/canary patterns, and auto-rollback on golden signal degradation
- Vector databases: Pgvector or Qdrant — ANN index strategies, embedding upsert, and query latency tuning under inference load
- Strong Python and PyTorch; Flyte or equivalent ML orchestrator
- Quantization (nice to have): INT8/INT4/FP8 post-training quantization (GPTQ, AWQ, or bitsandbytes)
- Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical or military experience
Benefits
Comp & perks
- competitive total rewards package
- learning and tremendous opportunities to grow and advance in your career
- Saviynt discretionary bonus plan
ATS Keywords
Hard Skills & Tools
Ray, KubeRay, GKE, Ray Train, TorchTrainer, DDP, NCCL, Ray Serve, PyTorch, Python