
CUDA Kernel Engineer
Pragmatike
full-time
Location Type: Hybrid
Location: Cambridge • Florida • Illinois • United States
About the role
- Design, implement, and optimize custom CUDA kernels for NVIDIA GPUs, with a focus on maximizing occupancy, memory throughput, and warp efficiency.
- Profile GPU workloads using tools such as Nsight Compute, Nsight Systems, nvprof, and CUDA-MEMCHECK.
- Analyze and eliminate performance bottlenecks including warp divergence, uncoalesced memory access, register pressure, and PCIe transfer overhead.
- Improve GPU memory pipelines (global, shared, L2, texture memory) and ensure proper memory coalescing.
- Collaborate closely with AI systems, model acceleration, and backend distributed systems teams.
- Contribute to GPU architecture decisions, kernel libraries, and internal performance-engineering best practices.
Requirements
- Proven track record building NVIDIA CUDA kernels from scratch, not just calling existing libraries.
- Strong ability to optimize kernels (tiling strategies, occupancy tuning, shared memory design, warp scheduling).
- Deep understanding of CUDA threads, warps, blocks, and grids; the GPU memory hierarchy and memory coalescing; and warp divergence (how to detect, analyze, and mitigate it).
- Experience diagnosing PCIe bottlenecks and optimizing host-device transfers (pinned memory, streams, batching, overlap).
- Familiarity with C++, CUDA runtime APIs, and GPU debugging/profiling tooling.
Benefits
- Competitive salary & equity options
- Sign-on bonus
- Health, Dental, and Vision
- 401k
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
CUDA, GPU optimization, kernel development, memory coalescing, performance analysis, occupancy tuning, shared memory design, PCIe optimization, C++, GPU memory hierarchy
Soft Skills
collaboration, problem-solving