Tether.to

AI Research Engineer – Kernel & Inference Optimization

Design and deploy state-of-the-art model serving architectures that deliver high throughput and low latency while optimizing memory usage.

Posted 5/17/2026 • Full-time • Remote • 🇮🇳 India • Mid-Level, Senior

About the role

Key responsibilities & impact
  • Design and deploy state-of-the-art model serving architectures that deliver high throughput and low latency while optimizing memory usage
  • Ensure these pipelines run efficiently across diverse environments
  • Establish clear performance targets such as reduced latency, faster token response, and a minimized memory footprint
  • Build, run, and monitor controlled inference tests in both simulated and live production environments
  • Track key performance indicators such as response latency, throughput, memory consumption, and error rates
  • Document iterative results and compare outcomes against established benchmarks
  • Identify and prepare high-quality test datasets and simulation scenarios
  • Analyze computational efficiency and diagnose bottlenecks in the serving pipeline
  • Work closely with cross-functional teams to integrate optimized serving and inference frameworks into production pipelines

Requirements

What you’ll need
  • A degree in Computer Science or related field
  • Ideally a PhD in NLP, Machine Learning, or a related field, complemented by a solid track record in AI R&D (with strong publications at A* conferences)
  • Must have knowledge of Metal Shading Language (MSL)
  • Proven experience in low-level kernel optimizations and inference optimization on mobile devices is essential
  • A deep understanding of modern model serving architectures and inference optimization techniques
  • Must have strong expertise in writing GPU kernels for mobile devices (i.e., smartphones)
  • Practical experience in developing and deploying end-to-end inference pipelines
  • Demonstrated ability to apply empirical research to overcome challenges in model serving
  • Experience with distributed inference systems, including designing and optimizing high-performance inference engines

Benefits

Comp & perks
  • Professional development opportunities
  • Remote work from anywhere in the world
