Tech Stack
Tools & technologies: Cloud, Linux, Node.js, PyTorch
About the role
Key responsibilities & impact
- Design, implement, and tune inference pipelines for large language models and other AI workloads, targeting maximum throughput and minimum latency.
- Apply state-of-the-art optimization techniques: quantization (INT4/INT8/FP8), model pruning, speculative decoding, continuous batching, and kernel fusion.
- Optimize inference-serving stacks, including vLLM, TensorRT-LLM, ONNX Runtime, and similar frameworks, for production deployment on CIQ’s OS platform.
- Profile and tune GPU/accelerator utilization across the full inference stack, from model weights and memory bandwidth to CUDA kernels and driver overhead.
- Establish inference performance baselines and regression detection across CIQ’s AI-focused solutions.
- Design and optimize distributed training pipelines for large-scale models, including data, model, tensor, and pipeline parallelism strategies.
- Tune training efficiency through mixed-precision training, gradient checkpointing, activation recomputation, and optimizer-level improvements.
- Benchmark training throughput and scaling efficiency across multi-GPU and multi-node configurations on CIQ’s infrastructure.
- Collaborate with infrastructure and performance teams to resolve training bottlenecks at the network (RDMA/InfiniBand), storage, and OS layers.
- Stay current on frontier model architectures and training techniques, including MoE models, RLHF pipelines, and emerging post-training methods.
- Build and maintain a library of turn-key AI workload examples that run on CIQ’s platform, covering inference serving, fine-tuning, batch processing, RAG pipelines, and agentic workflows.
- Develop both internal reference pipelines for CI/testing and customer-facing examples designed for immediate productivity on CIQ’s OS and Fuzzball.
- Package workloads using containers to deliver portable, reproducible AI environments across HPC and cloud-native settings.
- Create compelling, well-documented demos and reference architectures that communicate CIQ’s AI capabilities to technical and business audiences alike.
- Partner with product and customer success teams to translate real-world AI use cases into reusable, production-quality examples.
- Build and maintain AI-powered engineering tooling, leveraging LLM-based agents, automated analysis pipelines, and AI-assisted code generation to accelerate the broader engineering organization.
- Champion an AI-first development culture: identify opportunities where AI tooling can reduce toil, surface insights faster, and improve software quality across CIQ’s products.
- Evaluate and integrate emerging AI frameworks, libraries, and hardware as they become relevant to CIQ’s customers and product roadmap.
- Contribute to open-source AI tooling and frameworks where relevant, reinforcing CIQ’s technical reputation in the community.
- Develop deep expertise in CIQ’s Fuzzball platform, its architecture, scheduling model, and workload execution environment.
- Integrate AI training, inference, and pipeline workloads into Fuzzball-based CI/CD and production pipelines.
- Contribute to Fuzzball’s AI workload story: ensure the platform is a first-class environment for running AI workloads efficiently and at scale.
- Help characterize and improve Fuzzball’s performance for AI-specific access patterns and resource demands.
- Develop broad familiarity with the full CIQ product portfolio, including Rocky Linux and RLC (and its variants), Fuzzball, Apptainer, and Warewulf, and understand how AI workloads interact with each layer.
- Collaborate closely with the Performance Engineering team to ensure AI workloads benefit from and contribute to CIQ’s systems-level optimization work.
Requirements
What you’ll need
- Deep, hands-on expertise in LLM inference optimization, including serving frameworks (vLLM, TensorRT-LLM, ONNX Runtime), quantization techniques, and GPU memory management.
- Strong background in distributed AI training, including frameworks such as PyTorch FSDP, DeepSpeed, Megatron-LM, or JAX/XLA.
- Proven experience building production AI pipelines and packaging AI environments for reproducible, portable deployment (containers, Apptainer/Singularity, or equivalent).
- Fluency with GPU/accelerator profiling tools: NVIDIA Nsight, PyTorch Profiler, CUDA performance analysis, and related tooling.
- Familiarity with HPC environments: job schedulers (Slurm, PBS), parallel filesystems, RDMA/InfiniBand, and MPI, and the intersection of HPC with modern AI workloads.
- Experience integrating AI workloads into CI/CD pipelines and building automated testing and benchmarking frameworks.
- Comfort using and building with LLM-based tools and agentic frameworks to accelerate engineering work.
- Excellent analytical skills; able to form hypotheses, design experiments, and draw actionable conclusions from complex profiling data.
- Strong written and verbal communication skills; able to present findings to both deeply technical audiences and business stakeholders.
- A collaborative, humble, and always-learning mindset, combined with the confidence to champion AI engineering as a first-class concern.
Benefits
Comp & perks
- Medical, dental, and vision insurance.
- Flexible paid time off.
- Employee stock options.
- Remote work; no travel required for most positions.
