Pragmatike

CUDA Kernel Engineer

full-time

Location Type: Hybrid

Location: Cambridge, Florida, Illinois, United States

About the role

  • Design, implement, and optimize custom CUDA kernels for NVIDIA GPUs, with a focus on maximizing occupancy, memory throughput, and warp efficiency.
  • Profile GPU workloads using tools such as Nsight Compute, Nsight Systems, nvprof, and CUDA-MEMCHECK.
  • Analyze and eliminate performance bottlenecks including warp divergence, uncoalesced memory access, register pressure, and PCIe transfer overhead.
  • Improve GPU memory pipelines (global, shared, L2, texture memory) and ensure proper memory coalescing.
  • Collaborate closely with AI systems, model acceleration, and backend distributed systems teams.
  • Contribute to GPU architecture decisions, kernel libraries, and internal performance-engineering best practices.

Requirements

  • Proven track record building NVIDIA CUDA kernels from scratch, not just calling existing libraries.
  • Strong ability to optimize kernels (tiling strategies, occupancy tuning, shared memory design, warp scheduling).
  • Deep understanding of CUDA threads, warps, blocks, and grids; the GPU memory hierarchy and memory coalescing; and warp divergence (how to detect, analyze, and mitigate it).
  • Experience diagnosing PCIe bottlenecks and optimizing host-device transfers (pinned memory, streams, batching, overlap).
  • Familiarity with C++, CUDA runtime APIs, and GPU debugging/profiling tooling.

Benefits

  • Competitive salary & equity options
  • Sign-on bonus
  • Health, Dental, and Vision
  • 401(k)

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
CUDA, GPU optimization, kernel development, memory coalescing, performance analysis, occupancy tuning, shared memory design, PCIe optimization, C++, GPU memory hierarchy
Soft Skills
collaboration, problem-solving