AI Research Engineer – Kernel & Inference Optimization

Tether.to

AI Research Engineer responsible for optimizing model serving and inference architectures. Join Tether to innovate in the fintech space remotely from India.

Posted 5/17/2026 • Full-time • Remote • 🇮🇳 India • Mid-Level, Senior

About the role

Key responsibilities & impact

Design and deploy state-of-the-art model serving architectures that deliver high throughput and low latency while optimizing memory usage
Ensure these pipelines run efficiently across diverse environments
Establish clear performance targets such as reduced latency, faster token response, and a minimized memory footprint
Build, run, and monitor controlled inference tests in both simulated and live production environments
Track key performance indicators such as response latency, throughput, memory consumption, and error rates
Document iterative results and compare outcomes against established benchmarks
Identify and prepare high-quality test datasets and simulation scenarios
Analyze computational efficiency and diagnose bottlenecks in the serving pipeline
Work closely with cross-functional teams to integrate optimized serving and inference frameworks into production pipelines

Requirements

What you’ll need

A degree in Computer Science or related field
Ideally a PhD in NLP, Machine Learning, or a related field, complemented by a solid track record in AI R&D (with strong publications at A* conferences)
Must have knowledge of Metal Shading Language (MSL)
Proven experience in low-level kernel optimizations and inference optimization on mobile devices is essential
A deep understanding of modern model serving architectures and inference optimization techniques
Must have strong expertise in writing GPU kernels for mobile devices (i.e., smartphones)
Practical experience in developing and deploying end-to-end inference pipelines
Demonstrated ability to apply empirical research to overcome challenges in model serving
Experience with distributed inference systems, including designing and optimizing high-performance inference engines

Benefits

Comp & perks

Professional development opportunities
Working remotely from every corner of the world

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

model serving architectures, inference optimization, Metal Shading Language, GPU kernels, low-level kernel optimizations, end-to-end inference pipelines, computational efficiency analysis, performance targets, controlled inference tests, high-performance inference engines

Soft Skills

cross-functional collaboration, analytical skills, problem-solving, documentation, communication

Certifications

PhD in NLP, PhD in Machine Learning