Cerebras Systems

Senior Kernel Optimization Engineer

Senior kernel engineer developing high-performance software for cutting-edge AI workloads at Cerebras Systems, focused on optimizing and scaling deep learning operations for a massively parallel processor architecture.

Posted 5/21/2026 • Full-time • Remote • California • 🇺🇸 United States • Senior

Tech Stack

Tools & technologies
Assembly • Python • PyTorch • TensorFlow

About the role

Key responsibilities & impact
  • Develop design specifications for new machine learning and linear algebra kernels and map them to the Cerebras WSE system using various parallel programming algorithms.
  • Develop and debug a kernel library of highly optimized routines, written in low-level assembly and CSL (a custom C-like domain-specific language), that implement algorithms targeting the Cerebras hardware system.
  • Use mathematical models and analysis to measure software performance and inform design decisions.
  • Develop and integrate unit and system testing methodologies to verify correct functionality and performance of kernel libraries.
  • Study emerging trends in machine learning applications and help evolve the kernel library architecture to address the computational challenges of state-of-the-art neural networks.
  • Work with chip and system architects to optimize the instruction sets, microarchitecture, and IO of next-generation systems.

Requirements

What you’ll need
  • Bachelor’s, Master’s, or PhD degree (or foreign equivalent) in Computer Science, Computer Engineering, Mathematics, or a related field.
  • Understanding of hardware architecture concepts, and comfort learning the details of a new hardware architecture.
  • Skilled in C++ and Python programming languages.
  • Good knowledge of library and/or API development best practices.
  • Strong debugging skills and experience debugging complex software stacks.
  • Experience in kernel development and/or testing.
  • Familiarity with parallel algorithms and distributed memory systems.
  • Experience in programming accelerators such as GPUs and FPGAs.
  • Familiarity with machine learning, neural networks, and frameworks such as TensorFlow and PyTorch.
  • Familiarity with HPC kernels and their optimization.

Benefits

Comp & perks
  • Build a breakthrough AI platform beyond the constraints of the GPU.
  • Publish and open-source your cutting-edge AI research.
  • Work on one of the fastest AI supercomputers in the world.
  • Enjoy job stability with startup vitality.
  • Enjoy a simple, non-corporate work culture that respects individual beliefs.

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
C++ • Python • low-level assembly • domain-specific language • kernel development • debugging • parallel algorithms • distributed memory systems • HPC kernel optimization • machine learning
Soft Skills
problem-solving • analytical thinking • communication • collaboration
Certifications
Bachelor’s degree • Master’s degree • PhD