
Principal AI Data Engineer
Honeywell
Full-time
Location Type: Hybrid
Location: Phoenix • Arizona • United States
About the role
- Support end‑to‑end data needs for all AI modalities, including classic ML, GenAI/LLMs, and agentic AI systems
- Build robust, scalable data pipelines for structured, semi‑structured, and unstructured data, including text, documents, images, audio, video, and logs
- Develop feature engineering pipelines for classic ML, including feature extraction, transformation, and feature store management
- Build and optimize GenAI and LLM data pipelines, including embedding generation, vectorization, chunking, metadata extraction, and document enrichment for RAG and context retrieval (a brief illustrative sketch follows this list)
- Develop data ingestion and orchestration workflows that support agentic AI, including memory stores, event-driven pipelines, tool-use data flows, and real-time retrieval services
- Design and implement advanced data solutions using AWS (S3, Glue, Lambda, EMR, Kinesis), Databricks (Spark, Delta Lake, Vector Search), and Dataiku to enable intelligent systems at scale
- Implement data governance, quality, lineage, monitoring, and observability to support high-performance, trustworthy AI
- Partner with data scientists, ML engineers, and AI product teams to deliver datasets for model development, fine‑tuning, evaluation, and production inference
- Optimize pipelines for latency, cost, reliability, and throughput, ensuring AI systems—from batch ML to real-time agents—have the data they need
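For context on the chunking and embedding work described above, here is a minimal, illustrative sketch of splitting documents into overlapping chunks and generating embeddings for RAG retrieval. It is not Honeywell's actual pipeline; the `embed_model` object and its `.encode()` method are assumptions standing in for whatever embedding model a team would use (for example, a sentence-transformers model).

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Chunk:
    doc_id: str
    text: str
    metadata: dict = field(default_factory=dict)

def chunk_document(doc_id: str, text: str, size: int = 500, overlap: int = 50) -> List[Chunk]:
    """Split a document into overlapping character windows for retrieval."""
    chunks, step = [], size - overlap
    for start in range(0, len(text), step):
        piece = text[start:start + size]
        if piece.strip():
            chunks.append(Chunk(doc_id, piece, {"doc_id": doc_id, "offset": start}))
    return chunks

def embed_chunks(chunks: List[Chunk], embed_model) -> List[List[float]]:
    """Generate one embedding vector per chunk (assumes an .encode() API)."""
    return [list(map(float, v)) for v in embed_model.encode([c.text for c in chunks])]
```

In a production pipeline, the resulting vectors and chunk metadata would typically be written to a vector store such as Databricks Vector Search for downstream context retrieval.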
Requirements
- Bachelor’s degree in a technical field (CS, Engineering, Math, or related)
- Experience supporting AI at scale across classic ML, GenAI/LLM, and agentic AI systems
- Experience with vector databases and semantic search (Databricks Vector Search, Pinecone, FAISS, Milvus, OpenSearch); an illustrative FAISS sketch follows this list
- Familiarity with LLM and GenAI data preparation, including text processing, tokenization, chunking strategies, and prompt/context formatting
- Experience with unstructured data technologies (OCR, NLP pipelines, computer vision data processing)
- Hands-on experience with Dataiku for automation, workflow orchestration, and AI project management
- Knowledge of MLOps tooling: MLflow, Delta Lake, experiment tracking, CI/CD for ML
- Understanding of agentic AI system patterns, such as memory architectures, tool APIs, event-driven workflows, and reasoning chain data requirements
- Strong analytical mindset, attention to detail, and commitment to high data quality
- Ability to thrive in a fast-paced, evolving AI environment and collaborate across cross-functional teams
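The vector database requirement above can be illustrated with a minimal FAISS sketch. The vectors below are random stand-ins for real embeddings, and the index choice (exact inner product over L2-normalized vectors, i.e. cosine similarity) is an assumption for illustration only.

```python
import numpy as np
import faiss  # pip install faiss-cpu

dim = 384  # embedding dimensionality (depends on the embedding model)
vectors = np.random.rand(1000, dim).astype("float32")
faiss.normalize_L2(vectors)  # normalize so inner product equals cosine similarity

index = faiss.IndexFlatIP(dim)  # exact inner-product index
index.add(vectors)

query = np.random.rand(1, dim).astype("float32")
faiss.normalize_L2(query)
scores, ids = index.search(query, 5)  # top-5 most similar chunks
print(ids[0], scores[0])
```

The same query pattern carries over to managed stores such as Databricks Vector Search or Pinecone, with only the index API swapped out.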
Benefits
- Employer-subsidized Medical, Dental, Vision, and Life Insurance
- Short-Term and Long-Term Disability
- 401(k) match
- Flexible Spending Accounts
- Health Savings Accounts
- Employee Assistance Program (EAP)
- Educational Assistance
- Parental Leave
- Paid Time Off (for vacation, personal business, sick time, and parental leave)
- 12 Paid Holidays
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
data pipelines, feature engineering, embedding generation, vectorization, metadata extraction, data ingestion, orchestration workflows, data governance, unstructured data technologies, MLOps
Soft Skills
analytical mindset, attention to detail, collaboration, adaptability