Kernel Developer
Expert Executive Recruiters (EER Global)
Kernel Developer designing and optimizing high-performance user-space compute kernels for AI accelerators in C/C++. Focus on latency, throughput, and efficiency at the hardware–software edge.
About the role
Key responsibilities & impact
- Design and implement high-performance compute (operator) kernels in C/C++
- Develop core tensor operations
- Optimize performance for AI accelerators (latency, throughput, and efficiency)
- Apply low-level optimization techniques
- Profile, benchmark, and tune kernels to eliminate performance bottlenecks
- Contribute to internal libraries and runtime systems for AI workloads
Requirements
What you’ll need
- Strong proficiency in C/C++
- Experience with performance-critical software development
- Strong understanding of low-level optimization techniques
- Understanding of CPU/GPU or accelerator architecture fundamentals
- Ability to analyze and debug complex systems
- Experience working with large, complex codebases
- Strong communication and teamwork skills
- Nice to Have: Experience with GPU kernel programming (CUDA / ROCm / OpenCL)
- Nice to Have: Experience with Triton or similar kernel programming frameworks
- Nice to Have: Knowledge of instruction set architectures (ISA)
- Nice to Have: Familiarity with compiler technologies (e.g., LLVM-based stacks)
- Nice to Have: Experience with distributed communication frameworks (NCCL, MPI, libfabric)
- Nice to Have: Understanding of deep learning models
Benefits
Comp & perks
- Highly competitive salary
- Employment contract (Umowa o Pracę)
- Comprehensive benefits package
- Medicover healthcare coverage
ATS Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
C, C++, performance optimization, low-level optimization techniques, GPU kernel programming, CUDA, ROCm, OpenCL, Triton, instruction set architectures
Soft Skills
communication, teamwork, analysis, debugging