
Member of Technical Staff, Inference
Genesis AI
Full-time
Location Type: Hybrid
Location: Paris, France
About the role
- Build low-latency inference pipelines for on-device deployment, enabling real-time next-token and diffusion-based control loops in robotics
- Design and optimize distributed inference systems on GPU clusters, pushing throughput with large-batch serving and efficient resource utilization
- Implement efficient low-level code (CUDA, Triton, custom kernels) and integrate it seamlessly into high-level frameworks
- Optimize workloads for both throughput (batching, scheduling, quantization) and latency (caching, memory management, graph compilation)
- Develop monitoring and debugging tools to guarantee reliability, determinism, and rapid diagnosis of regressions across both the on-device and cluster serving stacks
Requirements
- Deep experience in distributed systems, ML infrastructure, or high-performance serving (8+ years)
- Production-grade expertise in Python, with strong background in systems languages (C++/Rust/Go)
- Low-level performance mastery: CUDA, Triton, kernel optimization, quantization, memory and compute scheduling
- Proven track record scaling inference workloads in both throughput-oriented cluster environments and latency-critical on-device deployments
- System-level mindset with a history of tuning hardware–software interactions for maximum efficiency, throughput, and responsiveness
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
Python, C++, Rust, Go, CUDA, Triton, kernel optimization, quantization, memory scheduling, compute scheduling
Soft Skills
system-level mindset, tuning hardware-software interactions, reliability, determinism, rapid diagnosis