AI Platform Engineer – Training and Inference

Saviynt

An ML platform engineering role at Saviynt: manage distributed training on Ray and the LLM inference mesh, and build a secure, scalable AI foundation that improves outcomes for Saviynt's identity products.

Posted 5/18/2026 • Full-time • San Francisco, California, 🇺🇸 United States • Mid-Level, Senior

Tech Stack

Tools & technologies
Assembly, Cloud, Node.js, Python, PyTorch, Ray

About the role

Key responsibilities & impact
  • Own the Ray ecosystem end-to-end: manage KubeRay on GKE, tune Ray Core Task/Actor scheduling, operate the Plasma distributed object store, and configure Ray Data for GPU-direct streaming from GCS/S3
  • Operate distributed training with Ray Train: configure TorchTrainer + DDP/NCCL for multi-node H100 clusters, manage checkpoint lifecycle, implement spot-preemption recovery, and integrate warm-start fine-tuning for retrain pipelines
  • Build and operate the LLM inference mesh with Ray Serve: compose vLLM (PagedAttention), SGLang (RadixAttention), and NVIDIA Triton (TensorRT/ONNX) as a unified deployment graph with Plasma zero-copy memory sharing
  • Optimise inference performance: configure fractional GPU allocation, enable continuous batching, implement per-engine autoscaling based on request queue depth, and tune KV-cache block sizes
  • Design and operate the model routing layer: capability-based, version-based, and tenant-based routing with cost-aware fallback between self-hosted SLMs and cloud LLMs
  • Build RL training infrastructure: define Flyte workflows for RL pipelines (rollout, reward shaping, policy update, evaluation), integrate Ray RLlib or custom PPO/GRPO loops with Ray Train, and manage replay buffer persistence on GCS
  • Operate the full model promotion lifecycle: quality gate → integration tests → load tests (k6) → shadow mode → A/B gate → canary (10%→100%) with golden-signal auto-rollback
  • Operate the retrain pipeline: drift-detection triggers, warm-start retraining, relative quality gates (V2 >= V1 − 2%), and an automated Flyte DAG that carries each candidate through to canary
  • Integrate RAG retrieval into the inference mesh: vector similarity search, context assembly, and prompt construction before LLM inference
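The promotion and retrain bullets above mention a relative quality gate (V2 >= V1 − 2%) and a canary ramp with golden-signal auto-rollback. The following is a minimal Python sketch of those two checks under stated assumptions, not Saviynt's implementation: it assumes higher scores are better, reads "V1 − 2%" as a 2% relative drop from V1, and uses hypothetical intermediate ramp steps (25%, 50%) and hypothetical latency, error-rate, and saturation limits. `GoldenSignals`, `SignalLimits`, `passes_relative_gate`, and `run_canary` are illustrative names.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class GoldenSignals:
    """Snapshot of canary health; traffic is omitted because it is not a pass/fail signal here."""
    p99_latency_ms: float
    error_rate: float   # fraction of failed requests, 0.0-1.0
    saturation: float   # e.g. GPU or queue utilization, 0.0-1.0


@dataclass(frozen=True)
class SignalLimits:
    # Hypothetical thresholds; real values would come from each service's SLOs.
    max_p99_latency_ms: float = 500.0
    max_error_rate: float = 0.01
    max_saturation: float = 0.9


def passes_relative_gate(v1_score: float, v2_score: float, tolerance: float = 0.02) -> bool:
    """Assumes higher is better and reads "V1 - 2%" as a 2% relative drop from V1."""
    return v2_score >= v1_score * (1.0 - tolerance)


def within_limits(s: GoldenSignals, limits: SignalLimits) -> bool:
    """True only if every monitored signal is inside its limit."""
    return (
        s.p99_latency_ms <= limits.max_p99_latency_ms
        and s.error_rate <= limits.max_error_rate
        and s.saturation <= limits.max_saturation
    )


def run_canary(
    observe: Callable[[int], GoldenSignals],
    steps: Sequence[int] = (10, 25, 50, 100),  # hypothetical ramp from 10% to 100%
    limits: SignalLimits = SignalLimits(),
) -> tuple[str, int]:
    """Ramp candidate traffic step by step; roll back at the first step whose signals breach a limit."""
    for pct in steps:
        if not within_limits(observe(pct), limits):
            return ("rolled_back", pct)
    return ("promoted", steps[-1])
```

For example, `passes_relative_gate(0.90, 0.885)` is `True` because the floor is 0.90 × 0.98 = 0.882, while an `observe` callback that reports a 2% error rate once traffic reaches 50% makes `run_canary` return `("rolled_back", 50)`. In practice `observe` would wrap a metrics query after each traffic shift; here it is just a callable so the gating logic stays testable.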
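The routing bullet above combines capability-based and tenant-based routing with cost-aware fallback between self-hosted SLMs and cloud LLMs. A minimal sketch of that selection step might look like the following; it is an illustration under assumptions, not the actual routing layer. The `Backend` record, the `healthy` flag (standing in for real health checks), the tenant allowlist, and the backend names in the usage note are all hypothetical, and version-based routing is left out to keep the example short.

```python
from dataclasses import dataclass
from typing import Mapping, Optional, Sequence, Set


@dataclass(frozen=True)
class Backend:
    name: str
    capabilities: frozenset  # e.g. frozenset({"chat", "long_context"})
    cost_per_1k_tokens: float
    self_hosted: bool
    healthy: bool = True     # placeholder for a real health check


def route(
    capability: str,
    tenant: str,
    backends: Sequence[Backend],
    tenant_allowlist: Mapping[str, Set[str]],
) -> Backend:
    """Pick the cheapest healthy backend that has the capability and is allowed for the tenant."""
    # A tenant with no allowlist entry may use any backend.
    allowed: Optional[Set[str]] = tenant_allowlist.get(tenant)
    candidates = [
        b for b in backends
        if b.healthy
        and capability in b.capabilities
        and (allowed is None or b.name in allowed)
    ]
    if not candidates:
        raise LookupError(f"no healthy backend for capability={capability!r}, tenant={tenant!r}")
    # Cheapest first; on a cost tie, prefer self-hosted (False sorts before True).
    return min(candidates, key=lambda b: (b.cost_per_1k_tokens, not b.self_hosted))
```

With a cheap self-hosted `slm-8b` that supports only `"chat"` and a pricier `cloud-llm` that also supports `"long_context"`, a chat request goes to `slm-8b`, a long-context request falls back to `cloud-llm`, and marking `slm-8b` unhealthy sends chat traffic to `cloud-llm` as well. Ordering by cost rather than hard-coding "SLM first" keeps the fallback rule in one place if prices change.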

Requirements

What you’ll need
  • ML engineering experience, including time in an ML platform or MLOps role
  • Production Ray depth across Ray Train, Serve, Core, and Data, including having debugged real production failures such as NCCL timeouts, Plasma OOM errors, and Serve autoscaling lag
  • LLM serving engines: hands-on with vLLM, SGLang, or NVIDIA Triton — PagedAttention, prefix caching, and continuous batching tuned for latency/throughput targets
  • Distributed training: DDP, FSDP, NCCL collectives, gradient checkpointing, and mixed precision (BF16/FP8)
  • RL working knowledge: PPO, policy gradient, or RLHF — able to translate an algorithm into distributed compute primitives
  • Model lifecycle operations: MLflow registry, shadow/A/B/canary patterns, and auto-rollback on golden signal degradation
  • Vector databases: pgvector or Qdrant, including ANN index strategies, embedding upsert, and query latency tuning under inference load
  • Strong Python and PyTorch; Flyte or equivalent ML orchestrator
  • Quantization (nice to have): INT8/INT4/FP8 post-training quantization (GPTQ, AWQ, or bitsandbytes)
  • Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical or military experience

Benefits

Comp & perks
  • Competitive total rewards package
  • Learning, with tremendous opportunities to grow and advance in your career
  • Saviynt discretionary bonus plan
