Salary
💰 $70,500 - $200,200 per year
Tech Stack
Airflow, Apache, AWS, Azure, Cloud, Distributed Systems, Docker, Kafka, Kubernetes, Python, Spark, SQL, Terraform
About the role
- Design and implement comprehensive Lakehouse architecture solutions using technologies like Databricks, Snowflake, or equivalent platforms
- Build and maintain real-time and batch data processing systems using Apache Spark, Kafka, and similar technologies (see the streaming sketch after this list)
- Architect scalable data pipelines that handle structured, semi-structured, and unstructured data to deliver AI-ready data
- Develop data transformation workflows using tools like DBT, Airflow, or Databricks (see the DAG sketch after this list)
- Lead the technical strategy for data lake and data warehouse integration, ensuring optimal performance and cost efficiency
- Implement data governance frameworks, including data quality monitoring, lineage tracking, data time travel, and security protocols
- Implement a centralized data catalog system and enhance data discovery using technologies like Elasticsearch or OpenSearch
- Establish monitoring and alerting systems for data pipeline health using technologies like Apache Superset
- Drive adoption of modern data engineering best practices including Infrastructure as Code, CI/CD, and automated testing
- Collaborate with data scientists, analysts, and business stakeholders to translate requirements into robust technical solutions
- Mentor a team of 3-5 data engineers
- Foster a collaborative team culture focused on continuous learning and innovation
Requirements
- Master’s degree in Computer Science, Engineering, or a related technical field
- 3+ years of hands-on experience with Lakehouse architectures (Databricks, Snowflake, or similar)
- 7+ years of overall data engineering experience with large-scale distributed systems
- Experience with streaming data technologies (Kafka)
- Familiarity with data cataloging tools (Apache Atlas or DataHub)
- Familiarity with high-performance data service frameworks (e.g., Apache Arrow Flight)
- Industry certifications in cloud platforms or big data technologies
- Expert-level proficiency in Python and SQL for data transformation and pipeline development (a short transformation sketch follows this list)
- Strong experience with Apache Spark for big data processing and analytics
- Hands-on experience with cloud platforms (AWS or Azure) and their data services
- Proficiency with Infrastructure as Code tools (Terraform, CloudFormation)
- Experience with containerization (Docker, Kubernetes) and orchestration platforms
- Knowledge of data modeling techniques for both analytical and operational workloads
- Understanding of data governance, security, and compliance requirements
- Knowledge in the pharmaceutical or life sciences domain
Benefits
- Health insurance
- 401(k)
- Pension
- Vacation benefits
- Eligibility for medical, dental, vision and prescription drug benefits
- Flexible benefits (e.g., healthcare and/or dependent day care flexible spending accounts)
- Life insurance and death benefits
- Certain time off and leave of absence benefits
- Well-being benefits (e.g., employee assistance program, fitness benefits, and employee clubs and activities)
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
Lakehouse architecture, Databricks, Snowflake, Apache Spark, Kafka, DBT, Airflow, Python, SQL, Terraform
Soft skills
leadership, collaboration, mentoring, communication, team culture, continuous learning, innovation
Certifications
cloud platform certifications, big data technology certifications