Tech Stack
Tools & technologies: Python, PyTorch
About the role
Key responsibilities & impact
- Design, implement, and optimize high-performance inference pipelines for large language models (LLMs) running on GPUs
- Profile and tune model execution across the entire stack, from scheduler design to kernel fusion and everything in between
- Design and experiment with memory management strategies that improve memory bandwidth utilization and cache efficiency
- Innovate and implement cutting-edge techniques such as speculative decoding, context caching, and FP8/INT4 quantization to push tokens-per-second-per-watt efficiency further
- Develop and maintain benchmarking and testing systems that quantify latency, utilization, and efficiency
Requirements
What you’ll need
- Bachelor's, Master's, or higher degree in Computer Engineering, Computer Science, Applied Mathematics, or a related computing-focused field (or equivalent experience)
- 5+ years of relevant software development experience
- Excellent Python programming, software design, and software engineering skills
- Experience working with deep learning frameworks and libraries such as PyTorch and Hugging Face
- Experience profiling and debugging performance at all levels: the Python runtime, PyTorch internals, and GPU utilization metrics
- Awareness of the latest developments in LLM architectures and inference techniques
- Proactive and able to work independently without supervision
- Excellent written and oral communication skills in English
Benefits
Comp & perks
- Competitive salaries
- Comprehensive benefits package
ATS Keywords
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
Python, deep learning, inference pipelines, memory management, benchmarking, performance profiling, FP8 quantization, INT4 quantization, scheduler design, kernel fusion
Soft Skills
proactive, communication, independent work
