Salary
💰 $151,500 - $215,500 per year
Tech Stack
Apache, AWS, Azure, Cloud, Google Cloud Platform, Pandas, Python, Spark
About the role
- Design and implement systems to collect and curate high-quality training datasets for supervised, unsupervised, and reinforcement learning use cases
- Build scalable featurization and preprocessing pipelines to transform raw data into structured inputs for AI/ML model development
- Partner with ML engineers and researchers to define data requirements and production workflows that support LLM-based agents and autonomous AI systems
- Lead the development of infrastructure that enables experimentation, evaluation, and deployment of machine learning models in production environments
- Support orchestration and real-time inference pipelines using Python and modern cloud-native tools, ensuring low latency and high availability
- Mentor engineers and foster a high-performance, collaborative engineering culture grounded in technical excellence and curiosity
- Drive cross-functional alignment with product, infrastructure, and research stakeholders, ensuring clarity on progress, goals, and architecture
- Build performant systems that support model training and inference at scale
Requirements
- Strong software engineering background with deep experience in building data collection, transformation, and featurization pipelines at scale
- Proficiency in Python, including async programming and concurrency tools
- Experience with data-centric frameworks such as Pandas, Spark, or Apache Beam
- Familiarity with ML model development workflows and infrastructure (dataset versioning, experiment tracking, model evaluation)
- Experience deploying and scaling AI systems in cloud environments such as AWS, GCP, or Azure
- Proven success operating in highly ambiguous environments such as research labs, startups, or fast-paced product teams
- Track record of working alongside high-caliber peers in top engineering teams, research groups, or startup ecosystems
- Growth mindset, strong communication skills, and commitment to inclusive collaboration and continuous learning
- Experience with orchestration and real-time inference pipelines and modern cloud-native tools
- Experience mentoring engineers and fostering a collaborative engineering culture