
Senior Machine Learning Engineer
Tri-global Solutions Group Inc.
Full-time
Location Type: Hybrid
Location: Los Altos • California • United States
Salary
💰 $200,000 - $287,500 per year
About the role
- Build and maintain machine learning infrastructure, including training pipelines, distributed compute systems, model serving platforms, and monitoring tools that research teams rely on daily.
- Integrate and evaluate large language models (LLMs) and foundation models by developing retrieval-augmented generation (RAG) systems, performing full and adapter-based fine-tuning, applying prompt engineering techniques, and benchmarking performance across providers such as AWS Bedrock, Gemini, Claude, GPT, and open-source models.
- Design scalable data pipelines to support multimodal data, including text, images, sensor data, speech, video, and structured scientific datasets.
- Consult with research teams to understand machine learning requirements, evaluate potential approaches, and propose solutions aligned with TRI’s technology stack and engineering standards.
- Support edge and embedded machine learning by optimizing, quantizing, and deploying models to onboard hardware platforms such as robotics systems and vehicles.
- Bridge the gap between research and production by translating experimental notebooks and prototypes into maintainable, scalable, and deployable systems while preserving research innovation.
- Stay current with advancements in machine learning and artificial intelligence by evaluating emerging techniques and assessing their potential adoption within TRI.
- Drive technical quality by participating in code reviews, producing clear documentation, and fostering knowledge sharing across the Research Software Engineering (RSE) team.
Requirements
- Candidates should have one of the following: a BS with 6–10 years of experience, an MS with 5–9 years, a PhD with 3–7 years, or no degree with 9–13 years of equivalent experience. The degree field is flexible, and demonstrated experience is prioritized over pedigree.
- Strong proficiency in PyTorch and/or TensorFlow, with hands-on experience building, fine-tuning (including adapter-based methods such as LoRA and QLoRA), evaluating, and deploying large language models.
- Experience working with multimodal data (text, images, sensor/telemetry data, and speech) and an understanding of the associated data characteristics and pipeline requirements.
- Proven track record of deploying and maintaining machine learning systems in production environments.
- Experience with AWS services for machine learning workloads (e.g., Bedrock, SageMaker, ECS/Batch, S3), strong Python fundamentals, and comfort working within polyglot codebases.
- Ability to consult effectively with researchers, translate ambiguous technical requirements into actionable solutions, operate autonomously on cross-team problems, and communicate clearly in both written and verbal contexts.
- Experience optimizing models for resource-constrained hardware through quantization, pruning, and compilation frameworks (e.g., TFLite, LiteRT, ONNX), along with proficiency in C/C++ and/or CUDA for performance-critical inference.
- Familiarity with MLOps practices such as experiment tracking (MLflow, Weights & Biases), CI/CD for ML, and model versioning (e.g., DVC), as well as containerization (Docker required; ECS/Batch preferred; Kubernetes a plus), distributed training across multi-GPU and multi-node setups, and experience with Vertex AI in addition to AWS.
- Background in robotics, autonomous systems, materials science, or energy domains, with experience translating published research into production systems (“paper-to-production”), working in academic or industry R&D environments, and developing agentic AI systems with tool use and multi-step reasoning.
- AWS certifications (e.g., Solutions Architect, ML Specialty) and contributions to open-source machine learning projects.
Benefits
- Medical, dental, and vision insurance
- 401(k) eligibility
- Paid time off (including vacation, sick time, and parental leave)
- Annual cash bonus structure
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
machine learning infrastructure, large language models, retrieval-augmented generation, fine-tuning, prompt engineering, data pipelines, model optimization, quantization, C/C++, CUDA
Soft Skills
consulting, communication, autonomy, problem-solving, knowledge sharing, technical documentation, code reviews, collaboration, translating requirements, evaluating techniques
Certifications
AWS Solutions Architect, AWS ML Specialty