Design, implement, and optimize ETL pipelines using Databricks and AWS S3 to support analytics, ML, BI, and automation (a brief illustrative sketch follows this list)
Build and maintain data architectures for structured and unstructured data, ensuring data quality, lineage, and security
Integrate data from multiple sources, including external APIs and on-premises systems, to create a unified data environment
Collaborate with Data Scientists and ML Engineers to deliver datasets and features for model training, validation, and inference
Develop and operationalize ML/GenAI pipelines, automating data preprocessing, feature engineering, model deployment, and monitoring (e.g., Databricks MLflow)
Support deployment and maintenance of GenAI models and LLMs in production environments
Provide clean, reliable data sources for reporting and dashboarding via QlikView and enable self-service BI
Partner with Automation Specialists to design and implement data-driven automated workflows using MuleSoft
Implement data governance, security, and compliance best practices and document data flows, pipelines, and architectures
Collaborate across teams (data science, BI, business, IT) to align data engineering efforts with strategic objectives
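For illustration only: a minimal sketch of the kind of Databricks ETL and MLflow tracking work described in the first and fifth responsibilities above, assuming a Databricks notebook with S3 access and the mlflow library available. The bucket, column, and table names (example-bucket, event_id, event_ts, analytics.events_clean) are hypothetical placeholders, not actual project values.

```python
# Hypothetical sketch: read raw JSON from S3, apply a basic data-quality
# transformation, write a Delta table, and log a pipeline metric to MLflow.
import mlflow
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()  # provided automatically on Databricks

raw = spark.read.json("s3://example-bucket/raw/events/")      # placeholder S3 path
clean = (
    raw.dropDuplicates(["event_id"])                          # basic data-quality step
       .withColumn("event_date", F.to_date("event_ts"))       # derive a partition/reporting column
)
clean.write.format("delta").mode("overwrite").saveAsTable("analytics.events_clean")

with mlflow.start_run(run_name="events_etl"):                 # track the pipeline run
    mlflow.log_metric("row_count", clean.count())             # record output size for monitoring
```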
Requirements
Bachelor’s or Master’s degree in Computer Science, Engineering, Information Systems, or a related field
3+ years of proven experience as a Data Engineer or in a similar role
Expertise in Databricks and AWS S3
Strong programming skills in Python (preferred for ML/automation), SQL, and/or Scala
Experience building data pipelines for analytics, ML, BI, and automation use cases
Familiarity with ML frameworks (scikit-learn, TensorFlow, PyTorch) and MLOps tools (Databricks MLflow, AWS SageMaker)
Familiarity with GenAI libraries (HuggingFace, LangChain) and LLM deployment (see the sketch after this list)
Experience supporting BI/reporting solutions, preferably with QlikView
Hands-on experience with automation/integration platforms such as MuleSoft is a strong plus
Understanding of data governance, security, quality, and compliance
Excellent communication, collaboration, and problem-solving skills
Nice to have: experience deploying GenAI/LLM models at scale; API development; DevOps/CI/CD for data solutions; relevant AWS, Databricks, or QlikView certifications
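For illustration only: a minimal sketch of loading and calling an open LLM with the HuggingFace transformers pipeline API, as referenced in the GenAI requirement above. The checkpoint name and prompt are arbitrary examples; production LLM deployment would add serving, scaling, and monitoring concerns not shown here.

```python
# Minimal, illustrative use of the HuggingFace transformers pipeline API.
# The model name is an arbitrary example; any compatible causal-LM checkpoint
# could be substituted.
from transformers import pipeline

generator = pipeline("text-generation", model="distilgpt2")
output = generator("Summarize last quarter's sales trends:", max_new_tokens=50)
print(output[0]["generated_text"])
```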