Staff Data Engineer, AI – Robotics

General Motors

Staff Data Engineer in AI & Robotics, developing scalable robot learning infrastructure and collaborating across teams to establish data infrastructure standards for robotics AI.

Posted 5/27/2026 • Full-time • Warren • Missouri • 🇺🇸 United States • Lead

ATS Keywords


Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills

Python, C++, Scala, Java, MLOps, data capture, data ingestion, data serving, dataset versioning, experiment tracking

Soft Skills

leadership, mentoring, cross-functional collaboration, influence, communication, engineering discipline, problem-solving, organizational skills, strategic thinking, adaptability

Tools & Technologies

ROS 2, DVC, MLflow, CI/CD, CT pipelines, data governance tools, observability tools, robotics systems, data logging frameworks, metadata management

Industry Keywords

multimodal robotics, data infrastructure, robot learning workflows, dataset lifecycle management, model training, production reliability, data engineering, ML infrastructure, system design, real-world robotic logging

Tech Stack

Tools & technologies

Java, Python, Scala

About the role

Key responsibilities & impact

Define and drive the technical vision for multimodal robotics data infrastructure spanning vision, depth, force/torque, joint states, events, and metadata across lab and plant-adjacent environments.
Architect and scale reliable data capture, ingestion, and serving pipelines that support robot learning workflows from experimentation through production deployment.
Establish reproducible data logging and replay frameworks, including ROS 2 bagging where applicable, to enable debugging, regression testing, root-cause analysis, and dataset creation at scale.
Own the strategy for dataset lifecycle management, including versioning, lineage, provenance, governance, retention, and quality gates, to support trustworthy model training and evaluation.
Lead the integration of experiment tracking, model/data traceability, and auditability patterns so teams can compare runs, reproduce results, and understand system changes over time.
Design and implement MLOps automation patterns, including CI/CD/CT-style pipelines for ML systems, that reduce manual effort and improve deployment confidence for robotics AI updates.
Partner with AI/ML, planning, validation, and plant teams to define data contracts (such as schemas, labeling standards, and failure taxonomies) and to convert field failures into curated training datasets and measurable learning loops.
Influence architecture across adjacent systems and mentor engineers on best practices in data engineering, ML infrastructure, observability, and production reliability.
Drive cross-functional technical decisions, balancing research velocity with platform robustness, governance, and long-term maintainability.
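The dataset lifecycle and data-contract responsibilities above (versioning, lineage, quality gates, schemas) can be illustrated with a minimal Python sketch. It assumes a dataset is a set of named byte blobs, derives a content-addressed version ID from the file hashes, parent versions, and metadata so that lineage is captured in the ID itself, and refuses to promote a new version unless every record passes a simple schema check. The names `DatasetVersion`, `make_version`, `quality_gate`, `promote`, and the `EPISODE_CONTRACT` fields are hypothetical; this is not the API of DVC, MLflow, or any GM system.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class DatasetVersion:
    """Hypothetical record of one immutable dataset version and its lineage."""
    version_id: str
    files: dict[str, str]        # relative path -> SHA-256 of the file contents
    parents: tuple[str, ...]     # version IDs this dataset was derived from
    metadata: dict = field(default_factory=dict)


# Hypothetical data contract: required field name -> expected Python type.
EPISODE_CONTRACT = {"episode_id": str, "timestamp_ns": int, "joint_positions": list}


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def make_version(files: dict[str, bytes], parents=(), metadata=None) -> DatasetVersion:
    """Content-address a dataset: the ID changes if any file, parent, or metadata entry changes."""
    metadata = metadata or {}
    file_hashes = {path: sha256_hex(blob) for path, blob in sorted(files.items())}
    manifest = json.dumps(
        {"files": file_hashes, "parents": sorted(parents), "metadata": metadata},
        sort_keys=True,
    )
    return DatasetVersion(
        version_id=sha256_hex(manifest.encode("utf-8"))[:16],
        files=file_hashes,
        parents=tuple(sorted(parents)),
        metadata=metadata,
    )


def quality_gate(records, contract=EPISODE_CONTRACT) -> list[str]:
    """Return human-readable contract violations; an empty list means the gate passes."""
    violations = []
    for i, rec in enumerate(records):
        for key, expected in contract.items():
            if key not in rec:
                violations.append(f"record {i}: missing '{key}'")
            elif not isinstance(rec[key], expected):
                violations.append(
                    f"record {i}: '{key}' is {type(rec[key]).__name__}, expected {expected.__name__}"
                )
    return violations


def promote(files, records, parents=(), metadata=None) -> DatasetVersion:
    """Create a new version only if every record satisfies the data contract."""
    problems = quality_gate(records)
    if problems:
        raise ValueError("quality gate failed: " + "; ".join(problems))
    return make_version(files, parents, metadata)


if __name__ == "__main__":
    raw = make_version({"ep_000.json": b'{"episode_id": "e0"}'}, metadata={"source": "lab"})
    curated = promote(
        {"ep_000.json": b'{"episode_id": "e0", "curated": true}'},
        [{"episode_id": "e0", "timestamp_ns": 0, "joint_positions": [0.0, 0.1]}],
        parents=[raw.version_id],
        metadata={"stage": "curated"},
    )
    print(raw.version_id, "->", curated.version_id, "parents:", curated.parents)
```

Because the ID is a hash over contents and parents, regenerating a dataset from identical inputs reproduces the same ID, while any upstream change yields a new one. A production system would layer storage, access control, and retention policy on top, which this sketch omits.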

Requirements

What you’ll need

B.S. or M.S. in Computer Science, Computer Engineering, Data Engineering, or a related field.
8+ years of experience building production data systems and/or ML infrastructure, including practical experience supporting training pipelines end-to-end.
Strong proficiency in Python and at least one of: C++, Scala, or Java.
Demonstrated engineering discipline in testing, documentation, system design, and operational reliability.
Experience with dataset versioning, lineage, and reproducibility tooling such as DVC or equivalent approaches.
Experience with experiment tracking and model registry patterns such as MLflow or equivalent tools.
Experience designing technical systems that support multiple stakeholders and use cases, with the ability to influence architecture beyond an individual project.
Ability to work onsite with hardware and robotics teams, and to design pipelines that handle real-world robotic logging constraints such as bandwidth limits, dropped frames, and timing drift.
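The logging constraints named above can be made concrete with a small Python sketch covering two of them, dropped frames and timing drift, plus nearest-timestamp alignment between sensor streams. It assumes integer nanosecond timestamps sorted in ascending order, a known nominal publish rate per sensor, a gap threshold of 1.5× the nominal period, and a linear clock-drift model fit by least squares. All function names and thresholds are hypothetical illustrations, not ROS 2 APIs.

```python
import bisect


def find_dropped_frames(timestamps_ns, nominal_hz, tolerance=1.5):
    """Return (index, estimated_missing_frames) for every gap wider than tolerance x the nominal period."""
    period_ns = 1e9 / nominal_hz
    gaps = []
    for i in range(1, len(timestamps_ns)):
        dt = timestamps_ns[i] - timestamps_ns[i - 1]
        if dt > tolerance * period_ns:
            # A gap of roughly k periods implies about k - 1 missing frames.
            gaps.append((i, round(dt / period_ns) - 1))
    return gaps


def estimate_drift_ppm(device_ns, host_ns):
    """Least-squares slope of the (host - device) clock offset over host time, in parts per million.

    Requires at least two distinct host timestamps.
    """
    n = len(host_ns)
    offsets = [h - d for h, d in zip(host_ns, device_ns)]
    mean_t = sum(host_ns) / n
    mean_o = sum(offsets) / n
    num = sum((t - mean_t) * (o - mean_o) for t, o in zip(host_ns, offsets))
    den = sum((t - mean_t) ** 2 for t in host_ns)
    return num / den * 1e6


def align_nearest(ref_ns, other_ns, max_skew_ns):
    """For each reference timestamp, the index of the nearest sample in other_ns within max_skew_ns, else None."""
    matches = []
    for t in ref_ns:
        j = bisect.bisect_left(other_ns, t)
        best = None
        for k in (j - 1, j):  # only the neighbors on either side of t can be nearest
            if 0 <= k < len(other_ns) and abs(other_ns[k] - t) <= max_skew_ns:
                if best is None or abs(other_ns[k] - t) < abs(other_ns[best] - t):
                    best = k
        matches.append(best)
    return matches


if __name__ == "__main__":
    cam = [0, 100_000_000, 200_000_000, 500_000_000, 600_000_000]  # nominal 10 Hz stream
    print(find_dropped_frames(cam, nominal_hz=10))  # -> [(3, 2)]
```

A positive drift value means the device clock runs slower than the host clock. Bandwidth limits, the remaining constraint listed, are not modeled here.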

Benefits

Comp & perks

From day one, we're looking out for your well-being, at work and at home, so you can focus on realizing your ambitions.