HARMAN International

Principal Data Engineer, AI

full-time

Location: 🇺🇸 United States

Salary

💰 $142,500 - $209,000 per year

Job Level

Lead

Tech Stack

Airflow, AWS, Azure, BigQuery, Cloud, Google Cloud Platform, Kafka, Matillion, Pulsar, Python, Spark, SQL

About the role

  • Lead the architecture and development of end-to-end data platforms for agentic AI training, evaluation, and reinforcement in alignment with business goals and technology strategy.
  • Build and manage scalable, real-time pipelines to ingest, transform, and store multimodal data (text, logs, user interactions, actions, environments).
  • Design dynamic memory and knowledge storage systems for agents (vector stores, graph DBs, memory stores).
  • Collaborate with AI/ML researchers to integrate live feedback, human-in-the-loop data, and auto-labeling systems into pipelines.
  • Build industry connections and partner ecosystems to deliver HARMAN use cases and onboard best practices.
  • Own data governance, quality, and security policies, coordinating with global data security.
  • Build metadata tracking and observability tools for agent behavior and decision paths (data lineage, reproducibility, versioning).
  • Mentor junior data engineers and contribute to a high-performance engineering culture.
  • Stay ahead of cutting-edge trends in agentic AI infrastructure and incorporate them into the roadmap.
  • Serve as subject matter expert and advisor on AI data engineering for internal and external stakeholders.

Requirements

  • Graduate degree in computer science, engineering, or a related field (PhD a plus).
  • 6+ years of experience in data engineering, including at least 2 years in a leadership or architect role.
  • Advanced engineering and conceptual knowledge of data engineering and its implementation patterns using Matillion, Snowflake, and MS Fabric.
  • Strong programming expertise in Python and SQL.
  • Deep experience with real-time streaming frameworks (Kafka, Pulsar, Spark Streaming, Flink).
  • Proficient in building and maintaining data lakes and warehouses (e.g., Snowflake, Delta Lake, BigQuery, MS Fabric).
  • Experience with LLMs, vector databases (Pinecone, FAISS), and agent memory systems.
  • Strong knowledge of MLOps or agent lifecycle tooling (e.g., LangChain, MCP, AutoGen, MLflow).
  • Experience with cloud-native data architecture (AWS/GCP/Azure), including orchestration tooling (e.g., Airflow, dbt).
  • Excellent communication and leadership skills with the ability to influence stakeholders.