TMSfirst

Senior/Principal Performance Engineer

Senior Performance Engineer at CIQ, focused on AI-first performance engineering across the company's product portfolio. Collaborates on benchmarking, regression analysis, and proactive performance improvements.

Posted 5/5/2026 • Full-time • Remote • 🇺🇸 United States • Senior

Tech Stack

Tools & technologies
Linux

About the role

Key responsibilities & impact
  • Design, develop, and maintain comprehensive benchmarking frameworks spanning OS, kernel, and application layers.
  • Profile workloads across CPU, memory, I/O, network, and accelerator (GPU/NPU) subsystems to identify bottlenecks and optimization opportunities.
  • Establish and own performance baselines across CIQ's product and solutions portfolio.
  • Leverage AI-assisted tooling and agentic workflows to accelerate profiling, analysis, and root cause identification.
  • Build and maintain automated performance regression-detection pipelines integrated into CI/CD workflows using Fuzzball.
  • Identify, triage, and resolve regressions across user space, kernel space, and application layers with urgency and rigor.
  • Collaborate across engineering teams to root-cause regressions introduced by upstream kernel changes, compiler updates, or library modifications.
  • Drive proactive performance improvements - not just reactive fixes - to keep CIQ solutions ahead of the competition across every layer of the stack.
  • Own core operating system performance: kernel subsystem tuning (scheduler, memory management, I/O, networking), system call overhead reduction, and user space library and runtime optimizations.
  • Identify and implement kernel-level enhancements, including patches, configuration changes, and upstream contributions that yield measurable performance gains for CIQ's customer workloads.
  • Optimize for AI inference and training workloads, including LLM serving, model parallelism, and accelerator utilization.
  • Tune performance for HPC workloads, including modeling, simulation, and tightly coupled parallel applications (MPI, OpenMP, etc.).
  • Optimize general computing and service workloads - web services, databases, messaging systems, and other production software that runs on CIQ's OS platform.
  • Work at all levels of the stack: compiler flags, kernel parameters, scheduler tuning, NUMA topology, memory allocation, and application-level algorithmic improvements.
  • Champion an AI-first engineering philosophy - use AI tools, agents, and automation to accelerate your own productivity and the quality of performance insights.
  • Identify and prioritize optimization opportunities that directly impact AI training throughput and inference latency/cost.
  • Stay current on state-of-the-art techniques in ML system performance, including quantization, batching strategies, kernel fusion, and hardware-software co-design.
  • Develop deep expertise in CIQ's Fuzzball platform - its architecture, scheduling, and workload execution model.
  • Integrate performance benchmarks, regression tests, and user-facing workloads into Fuzzball-based pipelines.
  • Contribute to the performance characterization of Fuzzball itself, ensuring the platform adds minimal overhead and scales efficiently.
  • Develop broad familiarity with the full CIQ product portfolio, including Rocky Linux and RLC (and its variants), Fuzzball, Apptainer (formerly Singularity), and Warewulf, and understand how performance considerations span and interconnect across each.
  • Collaborate deeply with the engineering teams behind each product line to surface, prioritize, and deliver performance improvements that benefit customers across the entire CIQ ecosystem.
  • Partner with product and customer success teams to translate real-world performance pain points into engineering priorities and measurable outcomes.
  • Document and communicate findings clearly - from low-level profiling data to executive-level summaries.
  • Contribute to technical publications, conference presentations, and thought leadership that reinforces CIQ's reputation for performance excellence.

Requirements

What you’ll need
  • A deep, principled understanding of operating system internals - Linux kernel scheduler, memory subsystem, I/O stack, and networking.
  • Proven experience identifying and resolving performance regressions across kernel and user space in production environments.
  • Hands-on expertise with profiling and tracing tools: perf, eBPF/bpftrace, flame graphs, VTune, Nsight, strace, ftrace, and similar.
  • Strong background in AI/ML workload performance - including inference optimization (TensorRT, ONNX, vLLM, or similar), training efficiency, and GPU/accelerator utilization.
  • Experience with HPC workloads: MPI, OpenMP, parallel filesystems, RDMA/InfiniBand, and job schedulers (Slurm, PBS, etc.).
  • Familiarity with modern AI-first development workflows and comfort using LLM-based tools to accelerate engineering work.
  • Experience building automated performance testing and regression detection pipelines in CI/CD environments.
  • Excellent analytical skills - able to form hypotheses, design experiments, and draw actionable conclusions from complex data.
  • Strong written and verbal communication skills; able to present findings to both deeply technical audiences and business stakeholders.
  • A collaborative, humble, and always-learning mindset - combined with the confidence to champion performance as a first-class engineering concern.

Benefits

Comp & perks
  • Medical, dental, and vision insurance.
  • Flexible paid time off.
  • Employee stock options.
  • Remote work; no travel required for most positions.

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
  • operating system internals
  • Linux kernel scheduler
  • memory subsystem
  • I/O stack
  • networking
  • performance regression identification
  • profiling tools
  • AI/ML workload performance
  • HPC workloads
  • automated performance testing
Soft Skills
  • analytical skills
  • strong communication skills
  • collaborative mindset
  • humble mindset
  • always-learning mindset