About the role
Key responsibilities & impact
- Drive innovation in model serving and inference architectures
- Optimize model deployment and inference strategies
- Work on resource-efficient models for limited hardware
- Engineer robust inference pipelines
- Establish comprehensive performance metrics
- Identify and resolve bottlenecks in production environments
Requirements
What you’ll need
- A degree in Computer Science or related field
- Ideally a PhD in NLP, Machine Learning, or a related field
- Knowledge of Metal Shading Language (MSL)
- Proven experience in low-level kernel optimizations
- Strong expertise in writing GPU kernels for mobile devices
- Practical experience in developing and deploying end-to-end inference pipelines
- Deep understanding of modern model serving architectures
- Experience with distributed inference systems and techniques such as tensor parallelism
- Understanding of advanced optimization methods
Benefits
Comp & perks
- Work remotely from anywhere in the world
- Opportunity to collaborate with global teams
- Cutting-edge projects in fintech
