AI Research Engineer – Kernel & Inference Optimization

Tether.to

An AI Research Engineer role at Tether focused on model serving and inference, contributing to advances in AI systems and architecture. You will collaborate with a global team in a dynamic fintech environment.

Posted 5/17/2026 • Full-time • Remote • 🇦🇪 United Arab Emirates • Mid-Level, Senior

About the role

Key responsibilities & impact

Drive innovation in model serving and inference architectures for advanced AI systems
Design and deploy state-of-the-art model serving architectures that deliver high throughput and low latency
Ensure pipelines run efficiently across diverse environments
Establish clear performance targets
Build, run, and monitor controlled inference tests
Identify and prepare high-quality test datasets and simulation scenarios
Analyze computational efficiency and diagnose bottlenecks in the serving pipeline
Work closely with cross-functional teams to integrate optimized serving and inference frameworks into production pipelines

Requirements

What you’ll need

A degree in Computer Science or a related field
Ideally, a PhD in NLP, Machine Learning, or a related field
Must have knowledge of Metal Shading Language (MSL)
Proven experience in low-level kernel optimizations and inference optimization on mobile devices
A deep understanding of modern model serving architectures and inference optimization techniques
Strong expertise in writing GPU kernels for mobile devices
Practical experience in developing and deploying end-to-end inference pipelines
Demonstrated ability to apply empirical research to overcome challenges in model serving
Experience with distributed inference systems: designing and optimizing high-performance inference engines

Benefits

Comp & perks

Work remotely from anywhere in the world
Opportunity to collaborate with a global team
Professional development opportunities to hone your skills

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

Metal Shading Language, low-level kernel optimizations, inference optimization, GPU kernels, end-to-end inference pipelines, model serving architectures, inference optimization techniques, computational efficiency analysis, performance targets establishment, controlled inference tests

Soft Skills

innovation, cross-functional collaboration, problem-solving, analytical skills, communication

Certifications

PhD in NLP, PhD in Machine Learning