Salary
💰 $162,000 - $190,000 per year
Tech Stack
AWS · Cloud · ETL · Kafka · MongoDB · NoSQL · PySpark · Python · Spark
About the role
- Create and maintain data pipelines and foundational datasets to support product/business needs
- Design and build database architectures for massive, complex datasets, balancing performance against computational load and cost
- Develop data quality audits at scale, implementing alerting as necessary
- Create scalable dashboards and reports to support business objectives and enable data-driven decision-making
- Troubleshoot and resolve complex issues in production environments
- Work closely with product managers and other stakeholders to define and implement new features
- Improve data ingestion and processing capabilities and ensure the reliability and accuracy of Checkr's background check services
Requirements
- 7+ years of development experience in data engineering (5+ years writing PySpark)
- Experience building large-scale (hundreds of terabytes to petabytes) data processing pipelines, both batch and streaming
- Experience with ETL/ELT, stream and batch processing of data at scale
- Strong proficiency in PySpark and Python
- Strong understanding of database systems, data modeling, relational databases, and NoSQL (such as MongoDB)
- Experience with big data technologies such as Kafka, Spark, Iceberg, data lakes, and the AWS stack (EKS, EMR, Serverless, Glue, Athena, S3, etc.)
- Knowledge of security best practices and data privacy concerns
- Strong problem-solving skills and attention to detail
- Nice to have: Experience/knowledge of data processing platforms such as Databricks or Snowflake
- Nice to have: An understanding of graph and vector data stores