Salary
💰 $142,500 - $209,000 per year
Tech Stack
Airflow, AWS, Azure, BigQuery, Cloud, Google Cloud Platform, Kafka, Matillion, Pulsar, Python, Spark, SQL
About the role
- Lead the architecture and development of end-to-end data platforms for agentic AI training, evaluation, and reinforcement in alignment with business goals and technology strategy.
- Build and manage scalable, real-time pipelines to ingest, transform, and store multimodal data (text, logs, user interactions, actions, environments).
- Design dynamic memory and knowledge storage systems for agents (vector stores, graph DBs, memory stores).
- Collaborate with AI/ML researchers to integrate live feedback, human-in-the-loop data, and auto-labeling systems into pipelines.
- Build industry connections and partner ecosystems to deliver Harman use cases and adopt best practices.
- Own data governance, quality, and security policies, coordinating with the global data security team.
- Build metadata tracking and observability tools for agent behavior and decision paths (data lineage, reproducibility, versioning).
- Mentor junior data engineers and contribute to a high-performance engineering culture.
- Stay ahead of cutting-edge trends in agentic AI infrastructure and incorporate them into the roadmap.
- Serve as subject matter expert and advisor on AI data engineering for internal and external stakeholders.
Requirements
- Graduate degree in computer science, engineering, or a related field (PhD a plus).
- 6+ years of experience in data engineering, including at least 2 years in a leadership or architect role.
- Advanced conceptual and hands-on knowledge of data engineering and its implementation patterns using Matillion, Snowflake, and MS Fabric.
- Strong programming expertise in Python and SQL.
- Deep experience with real-time streaming frameworks (Kafka, Pulsar, Spark Streaming, Flink).
- Proficient in building and maintaining data lakes and warehouses (e.g., Snowflake, Delta Lake, BigQuery, MS Fabric).
- Experience with LLMs, vector databases (Pinecone, FAISS), and agent memory systems.
- Strong knowledge of MLOps or agent lifecycle tooling (e.g., LangChain, MCP, AutoGen, MLflow).
- Experience in cloud-native data architecture (AWS/GCP/Azure), including orchestration (e.g., Airflow, dbt).
- Excellent communication and leadership skills with the ability to influence stakeholders.