Tether.to

AI Research Engineer – Kernel & Inference Optimization


Drive innovation in model serving and inference architectures for advanced AI systems.

Posted 5/19/2026 • Full-time • Remote • Anywhere in North America • Mid-Level, Senior

About the role

Key responsibilities & impact
  • Drive innovation in model serving and inference architectures for advanced AI systems.
  • Focus on optimizing model deployment and inference strategies.
  • Work on a wide spectrum of systems, from resource-efficient models to complex, multi-modal architectures.
  • Develop, test, and implement novel serving strategies and inference algorithms.
  • Engineer robust inference pipelines, establish performance metrics, and resolve bottlenecks in production environments.
  • Enable high-throughput, low-latency, memory-efficient, and scalable AI performance that delivers tangible value.

Requirements

What you’ll need
  • A degree in Computer Science or related field.
  • Ideally a PhD in NLP, Machine Learning, or a related field, complemented by a solid track record in AI R&D (with strong publications at A* conferences).
  • Must have knowledge of Metal Shading Language (MSL).
  • Proven experience in low-level kernel optimizations and inference optimization on mobile devices is essential.
  • Your contributions should have led to measurable improvements in inference latency, throughput, and memory footprint for domain-specific applications, particularly on resource-constrained devices and edge platforms.
  • A deep understanding of modern model serving architectures and inference optimization techniques is required.
  • Strong expertise in writing GPU kernels for mobile devices (i.e., smartphones).
  • Practical experience in developing and deploying end-to-end inference pipelines, from optimizing models for efficient serving to integrating these solutions on resource-constrained devices, is required.
  • Demonstrated ability to apply empirical research to overcome challenges in model serving, such as latency optimization, computational bottlenecks, and memory constraints.
  • Proficient in designing robust evaluation frameworks and iterating on optimization strategies to continuously push the boundaries of inference performance and system efficiency.
  • Experience with distributed inference systems: designing and optimizing high-performance inference engines using techniques like Tensor Parallelism, Pipeline Parallelism, and Expert Parallelism to handle massive models on GPU clusters.
  • Deep understanding of the math and structure behind Diffusion Models and Vision Transformers.

Benefits

Comp & perks
  • Health insurance
  • Flexible working hours
  • Paid time off
  • Professional development opportunities

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
  • model serving architectures
  • inference optimization
  • Metal Shading Language (MSL)
  • low-level kernel optimizations
  • GPU kernels
  • end-to-end inference pipelines
  • latency optimization
  • evaluation frameworks
  • Tensor Parallelism
  • Pipeline Parallelism
Soft Skills
  • innovation
  • problem-solving
  • empirical research application
  • performance metrics establishment
  • collaboration
Certifications
  • PhD in NLP
  • PhD in Machine Learning