AI Research Engineer – Kernel & Inference Optimization

Tether.to

AI Research Engineer at Tether focusing on optimizing model serving and inference architectures. Collaborate globally to develop cutting-edge fintech solutions.

Posted 5/17/2026 • Full-time • Remote • 🇮🇹 Italy • Mid-Level, Senior

About the role

Key responsibilities & impact

Drive innovation in model serving and inference architectures
Optimize model deployment and inference strategies
Work on resource-efficient models for limited hardware
Engineer robust inference pipelines
Establish comprehensive performance metrics
Identify and resolve bottlenecks in production environments

Requirements

What you’ll need

A degree in Computer Science or related field
Ideally a PhD in NLP, Machine Learning, or a related field
Knowledge of Metal Shading Language (MSL)
Proven experience in low-level kernel optimizations
Strong expertise in writing GPU kernels for mobile devices
Practical experience in developing and deploying end-to-end inference pipelines
Deep understanding of modern model serving architectures
Experience with distributed inference systems and techniques such as tensor parallelism
Understanding of advanced optimization methods

Benefits

Comp & perks

Work remotely from anywhere in the world
Opportunity to collaborate with global teams
Cutting-edge projects in fintech

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

model serving architectures
inference strategies
resource-efficient models
inference pipelines
performance metrics
Metal Shading Language (MSL)
GPU kernels
end-to-end inference pipelines
Distributed Inference Systems
Tensor Parallelism

Certifications

PhD in NLP
PhD in Machine Learning