AI Research Engineer, Kernel & Inference Optimization

Tether.to

AI Research Engineer driving innovation in model serving and inference architectures for advanced AI systems at Tether, with a focus on optimizing models for responsive, efficient, and scalable AI performance.

Posted 5/17/2026 • Full-time • Remote • 🇪🇸 Spain • Mid-Level, Senior

About the role

Key responsibilities & impact

Drive innovation in model serving and inference architectures for advanced AI systems
Focus on optimizing model deployment and inference strategies to deliver highly responsive, efficient, and scalable performance
Work on a wide spectrum of systems, ranging from resource-efficient models designed for limited hardware environments to complex, multi-modal architectures
Engineer robust inference pipelines, establish comprehensive performance metrics, and identify and resolve bottlenecks
Enable high-throughput, low-latency, memory-efficient, and scalable AI performance that delivers tangible value in dynamic, real-world scenarios
Design and deploy state-of-the-art model serving architectures that deliver high throughput and low latency while optimizing memory usage
Build, run, and monitor controlled inference tests in both simulated and live production environments
Track key performance indicators such as response latency, throughput, memory consumption, and error rates
Document iterative results and compare outcomes against established benchmarks
Identify and prepare high-quality test datasets and simulation scenarios tailored to real-world deployment challenges

Requirements

What you’ll need

A degree in Computer Science or related field
Ideally a PhD in NLP, Machine Learning, or a related field, complemented by a solid track record in AI R&D (with good publications at A* conferences)
Must have knowledge of Metal Shading Language (MSL)
Proven experience in low-level kernel optimizations and inference optimization on mobile devices is essential
Deep understanding of modern model serving architectures and inference optimization techniques is required
Strong expertise in writing GPU kernels for mobile devices (i.e., smartphones) as well as a deep understanding of model serving frameworks and engines
Practical experience in developing and deploying end-to-end inference pipelines
Demonstrated ability to apply empirical research to overcome challenges in model serving
Experience with distributed inference systems: designing and optimizing high-performance inference engines using techniques like Tensor Parallelism, Pipeline Parallelism, and Expert Parallelism to handle massive models on GPU clusters
Deep understanding of the math and structure behind Diffusion Models and Vision Transformers

Benefits

Comp & perks

Health insurance
Work from anywhere
Professional development opportunities

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

model serving architectures
inference optimization
Metal Shading Language (MSL)
GPU kernels
end-to-end inference pipelines
Tensor Parallelism
Pipeline Parallelism
Expert Parallelism
Diffusion Models
Vision Transformers

Soft Skills

innovation
problem-solving
empirical research application
documentation
performance metrics analysis

Certifications

PhD in NLP
PhD in Machine Learning