
Data Engineer
People Data Labs
full-time
Location Type: Remote
Location: United States
Salary
💰 $160,000 - $180,000 per year
About the role
- Building infrastructure for ingesting, transforming, and loading an exponentially increasing volume of data from a variety of sources using Spark, SQL, AWS, and Databricks.
- Building an organic entity resolution framework capable of correctly merging hundreds of billions of individual entities into a number of clean, consumable datasets.
- Developing CI/CD pipelines and anomaly detection systems capable of continuously improving the quality of data we're pushing into production.
- Dreaming up solutions to largely undefined data engineering and data science problems.
Requirements
- 4-6+ years of industry experience with clear examples of strategic technical problem-solving and implementation
- Strong software development fundamentals.
- Experience with Python
- Expertise with Apache Spark (Java, Scala, and/or Python-based)
- Experience with SQL
- Experience building scalable data processing systems (e.g., cleaning, transformation) from the ground up.
- Experience using developer-oriented data pipeline and workflow orchestration tools (e.g., Airflow (preferred), dbt, Dagster, or similar)
- Knowledge of modern data design and storage patterns (e.g., incremental updating, partitioning and segmentation, rebuilds and backfills)
- Experience working in Databricks (including Delta Live Tables, data lakehouse patterns, etc.)
- Experience with cloud computing services (AWS (preferred), GCP, Azure or similar)
- Experience with data warehousing (e.g., Databricks, Snowflake, Redshift, BigQuery, or similar)
- Understanding of modern data storage formats and tools (e.g., Parquet, ORC, Avro, Delta Lake)
Benefits
- Stock
- Competitive Salaries
- Unlimited paid time off
- Medical, dental, & vision insurance
- Health, fitness, and office stipends
- The permanent ability to work wherever and however you want
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
Python, Apache Spark, SQL, CI/CD pipelines, data processing systems, data pipeline orchestration, data warehousing, data storage formats, anomaly detection, entity resolution
Soft skills
strategic problem-solving, implementation, communication