Senior Data Scientist, Machine Learning Data Operations

TurbineOne

full-time

Posted on: 9/12/2025

Origin: 🇺🇸 United States

Job Level

Senior

Tech Stack

NumPy, Pandas, Python, PyTorch, Scikit-Learn, Spark, SQL

About the role

Ingesting, organizing, and maintaining large-scale training datasets from open-source resources and contract-specific artifacts
Creating and managing data cataloging systems to ensure datasets are findable, accessible, and ready for ML training pipelines
Designing and implementing data labeling workflows, including managing external labeling vendors and quality assurance processes
Building and maintaining YOLO-style manifests and annotation formats for custom computer vision datasets
Performing data cleaning, validation, and augmentation to ensure high-quality training data
Conducting exploratory data analysis and generating insights about dataset characteristics, biases, and coverage gaps
Supporting the ML research team with statistical analysis, experiment design, and model evaluation
Developing data pipelines and automation tools for continuous data ingestion and processing
Collaborating with ML engineers to optimize data loading and preprocessing for training efficiency
Processing incoming datasets from various sources, performing quality checks, and organizing them into our data management system
Creating or reviewing annotation schemas and coordinating with labeling teams to ensure consistent, high-quality labels
Writing Python scripts to clean, transform, and validate datasets for specific ML training requirements
Analyzing dataset statistics and creating visualizations to identify potential issues or opportunities for improvement
Collaborating with the ML research lead to design experiments and evaluate model performance across different data splits
Documenting dataset characteristics, versioning, and lineage to maintain reproducibility and compliance

Requirements

High standard of ethics, grit, integrity and moral character
5+ years of experience in data science, analytics, or a related field, with a focus on ML data preparation
Strong foundation in probability, statistics, and experimental design
Bachelor's degree in Statistics, Mathematics, Computer Science, or a related quantitative field (Master's preferred)
Proficiency with Python data stack: Pandas, NumPy, Jupyter Notebooks, and data visualization libraries
Experience with ML frameworks (PyTorch, Scikit-learn) and familiarity with training workflows
Hands-on experience with computer vision datasets and annotation formats (COCO, YOLO, Pascal VOC)
Experience managing data labeling projects and working with annotation tools (Label Studio, CVAT, or similar)
Familiarity with open-source ML models and experience applying them to real-world problems
Strong SQL skills and experience with data warehousing concepts
Experience with version control (Git) and collaborative development practices
Excellent communication skills for coordinating with technical and non-technical stakeholders
Meticulous attention to detail and strong organizational skills for managing complex datasets
Willingness to embrace the Startup Culture of moving fast, being insatiably curious, celebrating often, embracing uncertainty, and having a personal desire to improve other people’s lives
Must be eligible to obtain a clearance with the U.S. government
