Lead Data Engineer

Honeywell

Lead Data Engineer architecting data foundations for AI and IoT telemetry at Honeywell. Collaborate on innovative AI solutions while mentoring a team of engineers in a hybrid work environment.

Posted 5/26/2026 • full-time • Atlanta • 🇺🇸 United States • Senior

ATS Keywords

Tailor your resume

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills

data engineering, medallion lakehouse architecture, Apache Spark, PySpark, Azure Databricks, Apache Kafka, Azure Event Hub, data modeling, MLOps, real-time data processing

Soft Skills

leadership, mentoring, collaboration, technical coaching, stakeholder engagement, communication, problem-solving, decision-making, team leadership, technical design reviews

Tools & Technologies

Azure, GitHub Actions, LangChain, LangGraph, Docker, Kubernetes, Octopus, Bamboo, data lakes, data warehouses

Industry Keywords

IoT telemetry, data governance, security policies, schema evolution, backpressure, latency SLAs, AI-ready data products, RAG workflows, document ingestion, Agile methodologies

Tech Stack

Tools & technologies

Apache, AWS, Azure, Cloud, Docker, Google Cloud Platform, IoT, Kafka, Kubernetes, PySpark, SDLC, Spark, Vault

About the role

Key responsibilities & impact

Architect end-to-end data pipelines processing terabytes of IoT telemetry on Azure Databricks (PySpark DLT, Lakeflow) using medallion Lakehouse architecture.
Design and optimize real-time ingestion pipelines from Azure Event Hub and Apache Kafka for high-volume industrial IoT telemetry.
Build fault-tolerant, idempotent streaming architectures handling schema evolution, backpressure, and latency SLAs.
Lead architecture reviews, set engineering standards, and drive decisions on data modeling, pipeline design, and platform evolution.
Define technical direction for AI-ready data products including vector stores, embedding pipelines, and RAG-ready structured/unstructured data.
Adopt emerging LLM orchestration frameworks (LangChain, LangGraph) to accelerate GenAI platform capabilities.
Build production GenAI pipelines: RAG workflows, document ingestion, PII anonymization, and vector database infrastructure.
Collaborate with data scientists and AI engineers to deliver high-quality, AI-ready datasets that improve downstream model performance.
Enforce data governance, access control, and security policies; lead PII detection and anonymization strategies across the data platform.
Champion CI/CD practices using GitHub Actions, Databricks Asset Bundles (DAB), Octopus, and Bamboo for automated, reliable pipeline delivery.
Ensure compliance with enterprise security standards within the SDLC.
Mentor engineers across seniority levels through code reviews, pairing, and technical coaching.
Translate business and AI product requirements into clear technical roadmaps and execution plans.
Partner with data scientists, product owners, and architects to align data investments with Honeywell's autonomy strategy.

Requirements

What you’ll need

8+ years of data engineering experience with at least 2 years in a lead or senior role, demonstrating progression in technical complexity and team leadership.
Hands-on experience building and operating medallion lakehouse architectures (Bronze / Silver / Gold).
Deep expertise in Apache Spark / PySpark with production experience on Azure Databricks at scale.
Strong proficiency with streaming platforms (Apache Kafka and/or Azure Event Hub) for real-time IoT data.
Cloud data architecture skills (Azure preferred; AWS/GCP a plus) with experience designing scalable, cost-effective data lakes and warehouses using cloud-native services.
Data modeling and schema design expertise for both transactional and analytical workloads, including dimensional modeling and data vault methodologies.
Proven experience building data pipelines for GenAI or ML applications: RAG systems, embedding pipelines, and document ingestion.
MLOps familiarity including model versioning, feature stores, and monitoring/observability for data and ML systems.
Demonstrated ability to lead technical design reviews, mentor engineers, and drive architectural decisions with stakeholder buy-in.
Proficiency in CI/CD using GitHub Actions for automating data pipeline deployments.
Experience with LangChain, LangGraph, or other agentic AI orchestration frameworks.
Expertise in real-time data processing frameworks (Apache Spark Streaming, Structured Streaming).
Knowledge of MLOps practices and experience building data pipelines for AI model deployment.
Experience with time-series databases and IoT data modeling patterns.
Familiarity with containerization (Docker) and orchestration (Kubernetes) for AI workloads.
Strong background in data quality implementation for AI training data.
Experience working with distributed teams and cross-functional collaboration.
Knowledge of data security and governance practices for AI systems.
Experience working on analytics projects with Agile and Scrum methodologies.
**US PERSON REQUIREMENTS**:
Due to compliance with U.S. export control laws and regulations, the candidate must be a U.S. Person, which is defined as a U.S. citizen, a U.S. permanent resident, or a person with protected status in the U.S. under asylum or refugee status, or must have the ability to obtain an export authorization.

Benefits

Comp & perks

In addition to a competitive salary, leading-edge work, and developing solutions side-by-side with dedicated experts in their fields, Honeywell employees are eligible for a comprehensive benefits package.
This package includes employer-subsidized Medical, Dental, Vision, and Life Insurance; Short-Term and Long-Term Disability; 401(k) match, Flexible Spending Accounts, Health Savings Accounts, EAP, and Educational Assistance; Parental Leave, Paid Time Off (for vacation, personal business, sick time, and parental leave), and 12 Paid Holidays.