
Mid Data Engineer
Lean Tech
Employment Type: Full-time
Location Type: Remote
Location: United States
About the role
- Design, develop, and manage robust dimensional models for master and reference data within the Databricks lakehouse, utilizing SQL notebooks and views to ensure data integrity (see the sketch after this list).
- Build and optimize robust batch and streaming data pipelines to integrate data into the data lakehouse, ensuring reliable and efficient data flows for master data model consumption.
- Develop and manage the transport of master data from the Databricks lakehouse to the AWS RDS (PostgreSQL) data hub, building and extending REST API endpoints to ensure seamless data consumption by internal applications.
- Implement and enforce data quality standards through rigorous modeling discipline and governance processes, partnering with data stewards to research and resolve data exceptions.
- Conduct comprehensive source system analysis, data profiling, and data mapping to inform data modeling and integration strategies.
- Collaborate with cross-functional teams to develop an in-depth understanding of master data and align on data strategies, processes, policies, and controls.
- Share master data management best practices and provide training and recommendations to data stewards and other stakeholders across the organization.
- Create and maintain comprehensive documentation for master data flows, lineage, and standards to enhance organizational data literacy.
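To make the dimensional-modeling responsibility above concrete, here is a minimal sketch of how a master-data view might be defined in a Databricks notebook. The schema, table, and column names (bronze.customers_raw, gold.dim_customer, and so on) are illustrative assumptions, not details from this posting.

```python
# Minimal sketch: defining a master-data dimensional view in a Databricks notebook.
# All object names below are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # provided automatically inside Databricks

spark.sql("""
    CREATE OR REPLACE VIEW gold.dim_customer AS
    SELECT
        customer_id                        AS customer_key,   -- business key
        trim(upper(customer_name))         AS customer_name,  -- standardized name
        coalesce(country_code, 'UNKNOWN')  AS country_code,   -- reference-data conformance
        current_timestamp()                AS refreshed_at
    FROM bronze.customers_raw
    WHERE customer_id IS NOT NULL                             -- basic data-quality gate
""")
```

In a SQL-first environment the view definition could live directly in a SQL notebook; the Python wrapper is shown only to keep the examples in this posting in one language.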
Requirements
- A minimum of 4 years of professional experience in data engineering, with at least 2 years specifically focused on master and reference data.
- Advanced expertise in Master Data Management (MDM) principles and best practices, with proven experience designing and implementing robust dimensional data models and architectural MDM solutions.
- Proficiency in a SQL-first development environment, with expert-level SQL skills for data modeling and transformations.
- Hands-on experience with modern cloud data platforms, particularly Databricks (for SQL notebooks, views, and dimensional modeling) and AWS RDS for PostgreSQL.
- Experience building and optimizing robust batch and streaming data pipelines.
- Proficiency in building and extending REST API endpoints on a PostgreSQL database to facilitate data synchronization and consumption by internal applications (a sketch follows this list).
- Working knowledge of Python and Spark, utilized selectively for workloads where they are a better fit than SQL.
- Demonstrated ability in source system analysis, data profiling, and data mapping to ensure data integrity.
- Working knowledge of Agile/Scrum methodologies and familiarity with lifecycle management tools such as Jira and Confluence.
- Familiarity with software engineering best practices, including code reviews and CI/CD workflows (e.g., GitHub), and experience collaborating with platform teams on deployments.
- Strong business acumen, analytical skills, and intellectual curiosity with a high attention to detail.
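As an illustration of the REST API requirement above, the sketch below exposes a single read endpoint over the PostgreSQL data hub. FastAPI, psycopg2, the dim_customer table, and the environment-variable connection settings are all assumptions made for the example; the posting does not name a specific framework or schema.

```python
# Hypothetical read endpoint serving master data from the PostgreSQL (AWS RDS) hub.
import os

import psycopg2
from fastapi import FastAPI, HTTPException

app = FastAPI()

def get_connection():
    # Connection details come from environment variables in this sketch.
    return psycopg2.connect(
        host=os.environ["PGHOST"],
        dbname=os.environ["PGDATABASE"],
        user=os.environ["PGUSER"],
        password=os.environ["PGPASSWORD"],
    )

@app.get("/customers/{customer_id}")
def get_customer(customer_id: str):
    conn = get_connection()
    try:
        with conn.cursor() as cur:
            cur.execute(
                "SELECT customer_key, customer_name, country_code "
                "FROM dim_customer WHERE customer_key = %s",
                (customer_id,),
            )
            row = cur.fetchone()
    finally:
        conn.close()
    if row is None:
        raise HTTPException(status_code=404, detail="Customer not found")
    return {"customer_key": row[0], "customer_name": row[1], "country_code": row[2]}
```

Assuming the file is named app.py, running `uvicorn app:app` would serve GET /customers/{customer_id} for internal consumers.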
Benefits
- Collaborative work environment
- Professional development opportunities with international customers
- Career path and mentorship programs that support advancement to new levels.
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
SQL, Master Data Management (MDM), dimensional modeling, data modeling, data transformations, batch data pipelines, streaming data pipelines, REST API, PostgreSQL, Python
Soft skills
analytical skills, attention to detail, business acumen, intellectual curiosity, collaboration, training, data governance, problem-solving, communication, cross-functional teamwork