TMSfirst

Senior/Principal AI Performance Engineer

Senior AI Engineer driving AI/ML innovation at CIQ. Responsible for AI engineering standards and deployment across the product portfolio.

Posted 5/5/2026 • Full-time • Remote • 🇺🇸 United States • Senior

Tech Stack

Tools & technologies
Cloud, Linux, Node.js, PyTorch

About the role

Key responsibilities & impact
  • Design, implement, and tune inference pipelines for large language models and other AI workloads, targeting maximum throughput and minimum latency.
  • Apply state-of-the-art optimization techniques: quantization (INT4/INT8/FP8), model pruning, speculative decoding, continuous batching, and kernel fusion.
  • Optimize inference-serving stacks, including vLLM, TensorRT-LLM, ONNX Runtime, and similar frameworks, for production deployment on CIQ’s OS platform.
  • Profile and tune GPU/accelerator utilization across the full inference stack, from model weights and memory bandwidth to CUDA kernels and driver overhead.
  • Establish inference performance baselines and regression detection across CIQ’s AI-focused solutions.
  • Design and optimize distributed training pipelines for large-scale models, including data, model, tensor, and pipeline parallelism strategies.
  • Tune training efficiency through mixed-precision training, gradient checkpointing, activation recomputation, and optimizer-level improvements.
  • Benchmark training throughput and scaling efficiency across multi-GPU and multi-node configurations on CIQ’s infrastructure.
  • Collaborate with infrastructure and performance teams to resolve training bottlenecks at the network (RDMA/InfiniBand), storage, and OS layers.
  • Stay current on frontier model architectures and training techniques, including MoE models, RLHF pipelines, and emerging post-training methods.
  • Build and maintain a library of turn-key AI workload examples that run on CIQ’s platform, covering inference serving, fine-tuning, batch processing, RAG pipelines, and agentic workflows.
  • Develop both internal reference pipelines for CI/testing and customer-facing examples designed for immediate productivity on CIQ’s OS and Fuzzball.
  • Package workloads using containers to deliver portable, reproducible AI environments across HPC and cloud-native settings.
  • Create compelling, well-documented demos and reference architectures that communicate CIQ’s AI capabilities to technical and business audiences alike.
  • Partner with product and customer success teams to translate real-world AI use cases into reusable, production-quality examples.
  • Build and maintain AI-powered engineering tooling that uses LLM-based agents, automated analysis pipelines, and AI-assisted code generation to accelerate the broader engineering organization.
  • Champion an AI-first development culture: identify opportunities where AI tooling can reduce toil, surface insights faster, and improve software quality across CIQ’s products.
  • Evaluate and integrate emerging AI frameworks, libraries, and hardware as they become relevant to CIQ’s customers and product roadmap.
  • Contribute to open-source AI tooling and frameworks where relevant, reinforcing CIQ’s technical reputation in the community.
  • Develop deep expertise in CIQ’s Fuzzball platform, including its architecture, scheduling model, and workload execution environment.
  • Integrate AI training, inference, and pipeline workloads into Fuzzball-based CI/CD and production pipelines.
  • Contribute to Fuzzball’s AI workload story: ensure the platform is a first-class environment for running AI workloads efficiently and at scale.
  • Help characterize and improve Fuzzball’s performance for AI-specific access patterns and resource demands.
  • Develop broad familiarity with the full CIQ product portfolio, including Rocky Linux and RLC (and its variants), Fuzzball, Apptainer, and Warewulf, and understand how AI workloads interact with each layer.
  • Collaborate closely with the Performance Engineering team to ensure AI workloads benefit from and contribute to CIQ’s systems-level optimization work.

Requirements

What you’ll need
  • Deep, hands-on expertise in LLM inference optimization, including serving frameworks (vLLM, TensorRT-LLM, ONNX Runtime), quantization techniques, and GPU memory management.
  • Strong background in distributed AI training, including frameworks such as PyTorch FSDP, DeepSpeed, Megatron-LM, or JAX/XLA.
  • Proven experience building production AI pipelines and packaging AI environments for reproducible, portable deployment (containers, Apptainer/Singularity, or equivalent).
  • Fluency with GPU/accelerator profiling tools: NVIDIA Nsight, PyTorch Profiler, CUDA performance analysis, and related tooling.
  • Familiarity with HPC environments (job schedulers such as Slurm and PBS, parallel filesystems, RDMA/InfiniBand, and MPI) and with the intersection of HPC and modern AI workloads.
  • Experience integrating AI workloads into CI/CD pipelines and building automated testing and benchmarking frameworks.
  • Comfort using and building with LLM-based tools and agentic frameworks to accelerate engineering work.
  • Excellent analytical skills, with the ability to form hypotheses, design experiments, and draw actionable conclusions from complex profiling data.
  • Strong written and verbal communication skills; able to present findings to both deeply technical audiences and business stakeholders.
  • A collaborative, humble, and always-learning mindset, combined with the confidence to champion AI engineering as a first-class concern.

Benefits

Comp & perks
  • Medical, dental, and vision insurance.
  • Flexible paid time off.
  • Employee stock options.
  • Remote work; no travel required for most positions.

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
LLM inference optimization, quantization, model pruning, speculative decoding, continuous batching, kernel fusion, distributed AI training, mixed-precision training, gradient checkpointing, activation recomputation
Soft Skills
analytical skills, strong communication skills, collaborative mindset, humble mindset, continuous learning, confidence in AI engineering