Bright Vision Technologies

GPU Systems Engineer, CUDA

Bright Vision Technologies is hiring a GPU Systems Engineer to develop high-performance CUDA applications. The role collaborates with cross-functional teams to optimize GPU workloads across AI and HPC.

Posted 5/17/2026 • Full-time • Remote • California • 🇺🇸 United States • Mid-Level, Senior

Tech Stack

Tools & technologies
Node.js, PyTorch

About the role

Key responsibilities & impact
  • Design and implement high-performance CUDA kernels for compute-intensive workloads across AI and HPC use cases.
  • Profile and optimize GPU code using tools such as Nsight Systems, Nsight Compute, and CUDA profilers.
  • Tune memory access patterns, occupancy, register usage, and shared memory utilization for peak performance.
  • Develop highly optimized libraries for linear algebra, attention, and other ML primitives.
  • Optimize multi-GPU and multi-node training using NCCL, RDMA, and high-performance networking.
  • Implement custom operators and fused kernels in PyTorch, JAX, or Triton.
  • Collaborate with ML engineers to identify performance bottlenecks in training and inference pipelines.
  • Develop benchmarks and regression tests to safeguard performance over time.
  • Evaluate new GPU architectures and feature sets, and advise on adoption strategy.
  • Contribute to compiler-level optimizations for tensor programs where appropriate, working at the boundary between ML frameworks and underlying accelerator codegen to unlock performance not reachable through framework-level tuning alone.
  • Optimize memory hierarchy usage across HBM, L2, shared memory, and registers.
  • Implement mixed-precision and quantized compute paths that maximize accelerator throughput while preserving numerical fidelity within bounds acceptable for the target workloads.
  • Document performance characteristics, design decisions, and tuning playbooks for internal teams.
  • Stay current with GPU architecture, CUDA evolution, and emerging accelerator technologies.

Requirements

What you’ll need
  • Bachelor’s or Master’s degree in Computer Science, Computer Engineering, or a related field.
  • Six or more years of experience in GPU programming and performance engineering.
  • Deep expertise in CUDA C/C++ and GPU programming models.
  • Strong understanding of modern GPU architectures, memory hierarchies, and execution models.
  • Hands-on experience profiling and optimizing GPU workloads in production.
  • Familiarity with NCCL, MPI, and high-performance interconnect technologies.
  • Experience integrating custom kernels into ML frameworks.
  • Strong C++ skills and familiarity with modern systems programming practices.
  • Solid grounding in linear algebra and numerical methods.
  • Strong communication and collaboration skills with research and engineering teams.

Benefits

Comp & perks
  • Competitive base salary commensurate with experience, plus benefits

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
CUDA, C/C++, GPU programming, performance engineering, linear algebra, mixed-precision compute, quantized compute, memory optimization, custom kernels, ML primitives
Soft Skills
communication, collaboration