Salary
💰 $142,500 - $209,000 per year
Tech Stack
Airflow, AWS, Azure, BigQuery, Cloud, Google Cloud Platform, Kafka, Matillion, Pulsar, Python, Spark, SQL
About the role
- Lead the architecture and development of end-to-end data platforms for agentic AI training, evaluation, and reinforcement in alignment with business goals and technology strategy.
- Build and manage scalable, real-time pipelines to ingest, transform, and store multimodal data (text, logs, user interactions, actions, environments).
- Design dynamic memory and knowledge storage systems for agents (vector stores, graph DBs, memory stores).
- Collaborate with AI/ML researchers to integrate live feedback, human-in-the-loop data, and auto-labeling systems into pipelines.
- Build industry connections and partner ecosystems to deliver Harman use cases and adopt best practices.
- Own data governance, quality, and security policies, coordinating with the global data security team.
- Build metadata tracking and observability tools for agent behavior and decision paths (data lineage, reproducibility, versioning).
- Mentor junior data engineers and contribute to a high-performance engineering culture.
- Stay ahead of cutting-edge trends in agentic AI infrastructure and incorporate them into the roadmap.
- Serve as subject matter expert and advisor on AI data engineering for internal and external stakeholders.
Requirements
- Graduate degree in computer science, engineering, or a related field (PhD a plus).
- 6+ years of experience in data engineering, including at least 2 years in a leadership or architect role.
- Advanced conceptual and hands-on knowledge of data engineering and its implementation patterns using Matillion, Snowflake, and MS Fabric.
- Strong programming expertise in Python and SQL.
- Deep experience with real-time streaming frameworks (Kafka, Pulsar, Spark Streaming, Flink).
- Proficient in building and maintaining data lakes and warehouses (e.g., Snowflake, Delta Lake, BigQuery, MS Fabric).
- Experience with LLMs, vector databases (Pinecone, FAISS), and agent memory systems.
- Strong knowledge of MLOps or agent lifecycle tooling (e.g., LangChain, MCP, AutoGen, MLflow).
- Experience in cloud-native data architecture (AWS/GCP/Azure), including orchestration (e.g., Airflow, dbt).
- Excellent communication and leadership skills with the ability to influence stakeholders.