Salary
💰 $260,000 - $280,000 per year
About the role
- Establish foundational design patterns and infrastructure practices that ensure systems are scalable, resilient, secure, and maintainable
- Work hands-on with Python and ML frameworks like PyTorch to prototype, optimize, and guide implementation of complex model pipelines and platform components
- Align high-impact technical decisions with long-term product strategy by partnering with platform, product, and clinical teams
- Architect and evolve robust ML infrastructure to support continuous training, real-time inference, evaluation, and observability
- Lead cross-functional initiatives that simplify system complexity, improve developer velocity, and increase organizational leverage
- Shape quality strategies across teams by defining standards for testing, observability, performance, and operational risk
- Mentor senior and staff engineers across squads, supporting their growth as systems thinkers and technical leaders
- Champion a culture of sustainable speed, system ownership, and architectural clarity across product and platform teams
Requirements
- 10+ years of experience in Machine Learning and/or Engineering, with a strong track record of technical leadership and system architecture across teams or business areas
- Expert-level proficiency in Python and advanced ML frameworks such as PyTorch or TensorFlow
- Demonstrated success in designing, scaling, and evolving ML platforms that support production-grade training, deployment, real-time inference, and monitoring
- Deep experience in LLMs and NLP, including transformer models, summarization, conversational agents, and embedding-based retrieval
- Advanced understanding of ML infrastructure, including continuous training, system observability, performance optimization, and platform reliability
- Experience with ML infrastructure tools such as Databricks, MLflow, SageMaker, or equivalent
- A Master’s degree or PhD in Computer Science, Machine Learning, or a related field (preferred, but not strictly required)
- Exceptional communication and cross-org leadership skills, with the ability to align diverse technical and non-technical stakeholders around ML strategy and long-term architecture