Salary
💰 $151,500 - $215,500 per year
Tech Stack
Apache, AWS, Azure, Cloud, Google Cloud Platform, Pandas, Python, Spark
About the role
- Collect high-quality training data and curate datasets for supervised, unsupervised, and reinforcement learning use cases
- Build scalable featurization and preprocessing pipelines to transform raw data into structured inputs for AI/ML model development
- Partner with ML engineers and researchers to define data requirements and production workflows for LLM-based agents and autonomous AI systems
- Lead development of infrastructure enabling experimentation, evaluation, and deployment of machine learning models in production
- Support orchestration and real-time inference pipelines using Python and cloud-native tools, ensuring low latency and high availability
- Mentor engineers and foster a high-performance, collaborative engineering culture
- Drive cross-functional alignment with product, infrastructure, and research stakeholders on progress, goals, and architecture
Requirements
- Strong software engineering background with deep experience in building data collection, transformation, and featurization pipelines at scale
- Proficiency in Python, including async programming and concurrency tools
- Experience with data processing frameworks such as Pandas, Spark, or Apache Beam
- Familiarity with ML model development workflows and infrastructure, including dataset versioning, experiment tracking, and model evaluation
- Experience deploying and scaling AI systems in cloud environments such as AWS, GCP, or Azure
- Proven success operating in highly ambiguous, dynamic environments such as startups, AI/ML research labs, or fast-paced product teams; this experience is essential
- Track record of working with or alongside high-caliber peers in top engineering teams, research groups, or startup ecosystems
- Growth mindset, strong communication skills, and a commitment to inclusive collaboration and continuous learning