
Member of Technical Staff – Inference
Full-time
Location Type: Hybrid
Location: Paris • France
About the role
- Develop scalable, low-latency and cost-effective inference pipelines
- Optimize model performance: memory usage, throughput, and latency, using advanced techniques like distributed computing, model compression, quantization and caching mechanisms
- Develop specialized GPU kernels for performance-critical tasks like attention mechanisms, matrix multiplications, etc.
- Collaborate with H research teams on model architectures to enhance efficiency during inference
- Review state-of-the-art papers to improve memory usage, throughput and latency (FlashAttention, PagedAttention, continuous batching, etc.)
- Prioritize and implement state-of-the-art inference techniques
Requirements
- MS or PhD in Computer Science, Machine Learning or related fields
- Proficient in at least one of the following programming languages: Python, Rust or C/C++
- Experience in GPU programming with CUDA, OpenAI Triton, Metal, etc.
- Experience in model compression and quantization techniques
- Collaborative mindset, thriving in dynamic, multidisciplinary teams
- Strong communication and presentation skills
- Eager to explore new challenges
- Experience with LLM serving frameworks such as vLLM, TensorRT-LLM, SGLang, llama.cpp, etc.
- Experience with CUDA kernel programming and NCCL
- Experience with deep learning inference frameworks (PyTorch/ExecuTorch, ONNX Runtime, GGML, etc.)
Benefits
- Join the exciting journey of shaping the future of AI, and be part of the early days of one of the hottest AI startups
- Collaborate with a fun, dynamic and multicultural team, working alongside world-class AI talent in a highly collaborative environment
- Enjoy a competitive salary
- Unlock opportunities for professional growth, continuous learning, and career development