Thomson Reuters

Lead Inference Platform Support Engineer – AI

Posted 5/4/2026 • Full-time • Toronto, 🇨🇦 Canada • Senior • 💰 CA$140,000–CA$175,000 per year

Tech Stack

Tools & technologies
AWS, Azure, Cloud, Distributed Systems, Google Cloud Platform, Kubernetes, Microservices, Python, PyTorch, TensorFlow

About the role

Key responsibilities & impact
  • Optimize LLMs and ML models for high-performance inference using techniques such as quantization, pruning, distillation, and hardware-specific tuning
  • Deploy and scale inference workloads on GPUs across AWS, Azure, GCP, and internal Kubernetes clusters, ensuring predictable performance during peak traffic, especially during business hours
  • Implement routing and failover strategies for OpenAI/Anthropic/Vertex AI traffic
  • Integrate models into production-grade APIs supporting TR products and enterprise workflows
  • Develop highly optimized environments and eliminate performance bottlenecks to reduce latency
  • Collaborate with Platform Engineering teams (Landing Zones, Network, Storage, Compute, AI) to ensure inference workloads align with TR’s cloud-native patterns (AWS, Azure, GCP, OCI)
  • Build and optimize containerized inference pipelines using Kubernetes for large‑scale distributed workloads
  • Ensure compliance with TR’s AI standards for deployment, monitoring, governance, and drift detection
  • Profile inference performance, identify GPU/CPU bottlenecks, and optimize compute utilization across heterogeneous hardware
  • Implement observability and health monitoring for inference pipelines, ensuring reliability of enterprise AI services
  • Collaborate with platform teams to enhance capacity forecasting for AI workloads
  • Work with Product, Data Science, Architecture, and Enterprise AI teams to onboard new research models into production
  • Collaborate closely with AI engineers to invent new quantization techniques, improve numerical precision, and explore non‑standard architectures
  • Partner with Cloud Engineers (Azure, AWS, GCP) to develop guardrails and automation that support inference workloads
  • Support the scale-out of AI infrastructure during critical releases and global product rollouts

Requirements

What you’ll need
  • Strong understanding of ML/LLM fundamentals and inference optimization techniques
  • Hands-on experience with GPU programming (CUDA preferred), inference runtimes (TensorRT, ONNX Runtime), and deep learning frameworks (PyTorch/TensorFlow)
  • Proficiency in Python and at least one systems language (C++ strongly preferred for performance-critical inference paths)
  • Experience deploying AI workloads to AWS/GCP/Azure and Kubernetes
  • Familiarity with vector search systems (e.g., OpenSearch vector search) and retrieval-augmented generation (RAG) pipelines
  • Knowledge of distributed systems, microservices, CI/CD, and cloud-native architecture
  • Experience with neural network architectures such as CNNs, transformers, and diffusion models, and their performance characteristics
  • Understanding of GPUs, multithreading, and/or other accelerators with vectorized instructions
  • Specialized experience in one or more of the following machine learning/deep learning domains: model compression, hardware-aware model optimization, hardware accelerator architecture, GPU/ASIC architecture, machine learning compilers, high-performance computing, performance optimization, numerics, and SW/HW co-design

Benefits

Comp & perks
  • Flexible vacation
  • Two company-wide Mental Health Days off
  • Access to the Headspace app
  • Retirement savings
  • Tuition reimbursement
  • Employee incentive programs
  • Resources for mental, physical, and financial wellbeing

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
LLM optimization, ML model optimization, quantization, pruning, distillation, GPU programming, CUDA, inference runtimes, TensorRT, ONNX Runtime
Soft Skills
collaboration, problem-solving, communication, capacity forecasting, performance optimization