Design and implement a comprehensive data lake architecture using modern cloud-native technologies
Build scalable ETL/ELT pipelines for real-time and batch data processing across all data sources
Establish data ingestion frameworks to collect data from application APIs, third-party services, and databases
Architect automated data quality monitoring, validation, and alerting systems
Create robust data warehousing solutions optimized for analytics and business intelligence
Implement DataOps practices with automated testing and deployment pipelines (CI/CD for data)
Develop and maintain Python-based data processing frameworks and utilities
Build automated data pipeline orchestration using Apache Airflow or similar tools
Create streaming data processing solutions using Apache Kafka, Kinesis, or Pub/Sub
Implement infrastructure as code for all data platform components (Terraform, CloudFormation)
Establish feature stores and data models that support both operational and analytical workloads
Optimize data storage costs and query performance across the entire platform
Collaborate with product and business teams to identify key metrics, KPIs, and analytical requirements
Build automated reporting dashboards and self-service business intelligence tools
Support predictive modeling initiatives and A/B testing frameworks
Requirements
Minimum of 5 years of hands-on data engineering experience with increasing responsibility
Preferred: 7+ years in data engineering, analytics engineering, or data platform roles
Proven track record of building data systems from scratch or leading data infrastructure transformations
Experience working as a solo data engineer or in small, autonomous data teams
Python proficiency required - demonstrated experience building data pipelines, ETL frameworks, and automation scripts
SQL expertise - advanced knowledge of complex queries, performance optimization, and data modeling
Strong experience with cloud platforms (AWS, GCP, or Azure) and their data services
Proficiency with data lake technologies (Delta Lake, Apache Iceberg, or Apache Hudi)
Experience with data orchestration tools (Apache Airflow, Prefect, Dagster, or similar)
Knowledge of streaming data technologies (Apache Kafka, Kinesis, Pub/Sub)
Familiarity with data warehouse technologies (Snowflake, BigQuery, Redshift, Databricks)
Understanding of containerization (Docker, Kubernetes) and infrastructure as code
Experience with version control systems (Git) and collaborative development workflows
Benefits
Compensation includes a highly competitive salary, generous equity, medical, dental, and vision coverage paid 100% by the company, 401(k) benefits, and other travel-related perks
$500 Annual Experience Stipend (can be used at any of our client partners)
ATS Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
data engineering, ETL, ELT, Python, SQL, data modeling, data quality monitoring, DataOps, automated testing, predictive modeling