Salary
💰 $106,200 - $200,000 per year
Tech Stack
Amazon Redshift, Apache, AWS, Azure, Cloud, ETL, Google Cloud Platform, Hadoop, Kafka, Python, Scala, Spark, SQL, Terraform
About the role
- Affinity stitches together billions of data points to build a professional relationship intelligence graph.
- This role is part of the AI Insights team, which extracts and retrieves information from billions of structured and unstructured data points.
- Collaborate with machine learning engineers, software engineers, and product managers to shape Affinity's CRM platform.
- Design scalable and reliable data pipelines to consume, integrate and analyze large volumes of complex data from different sources.
- Help define the data roadmap and use data to shape product development.
- Build and maintain frameworks for measuring and monitoring data quality and integrity.
- Establish and optimize CI/CD processes, test frameworks, and infrastructure-as-code tooling.
- Build and implement robust data solutions using Spark, Python, Databricks, Kafka, and the AWS ecosystem (S3, Redshift, EMR, Athena, Glue); a minimal illustrative sketch follows this list.
- Identify skill and process gaps and introduce practices that improve team effectiveness.
- Articulate trade-offs of different approaches to building ETL pipelines and storage solutions.
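
For a flavor of the work, here is a minimal, purely illustrative sketch of the kind of pipeline described above: Spark Structured Streaming consuming events from Kafka and landing them in S3 as Parquet for downstream Redshift/Athena use. The broker address, topic name, schema fields, and S3 paths are all hypothetical placeholders, not details from the posting.

```python
# Minimal sketch of a Spark streaming pipeline of the kind described above.
# All paths, topics, and schema fields are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("relationship-events").getOrCreate()

# Hypothetical schema for incoming relationship events.
event_schema = StructType([
    StructField("person_id", StringType()),
    StructField("company_id", StringType()),
    StructField("interaction_type", StringType()),
    StructField("occurred_at", TimestampType()),
])

# Consume raw events from Kafka (broker address and topic are assumptions).
raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "interaction-events")
    .load()
)

# Parse the JSON payload and drop malformed records.
events = (
    raw.select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
    .filter(F.col("person_id").isNotNull())
)

# Land cleaned events in S3 as Parquet, partitioned by day, ready for
# downstream Redshift/Athena consumption (bucket names are placeholders).
query = (
    events.withColumn("event_date", F.to_date("occurred_at"))
    .writeStream
    .format("parquet")
    .option("path", "s3://example-bucket/events/")
    .option("checkpointLocation", "s3://example-bucket/checkpoints/events/")
    .partitionBy("event_date")
    .start()
)
query.awaitTermination()
```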
Requirements
- 5+ years of experience as a Data Engineer or Data Platform Engineer, working on complex, sometimes ambiguous engineering projects across team boundaries.
- Proficiency in data modeling, data warehousing, and ETL pipeline development.
- Proven hands-on experience building scalable data platforms and reliable data pipelines using Spark and Databricks, and familiarity with Hadoop, AWS SQS, AWS Kinesis, Kafka, or similar technologies.
- Comfortable working with large datasets and with high-scale ingestion, transformation, and distributed-processing tools such as Apache Spark (Scala or Python).
- Strong proficiency in SQL.
- Familiar with industry-standard database and analytics technologies, including data warehouses and data lakes.
- Experience with cloud and data platforms such as AWS, GCP, Azure, Databricks, or related technologies.
- Familiar with CI/CD processes and test frameworks.
- Comfortable partnering with product and machine learning teams on large, strategic data projects.
- Nice to have: Hands-on experience with both relational and non-relational databases/data stores, including vector databases (e.g., Weaviate, Milvus), graph databases, and text search engines (e.g., OpenSearch, Vespa), with a focus on indexing and query optimization.
- Nice to have: Experience with Infrastructure as Code (IaC) tools, such as Terraform.
- Nice to have: Experience implementing data consistency measures using validation and monitoring tools (see the illustrative sketch after this list).
- Please include your favorite programming language at the very end of your resume, outside of your skills section, with the word '#filter' next to it.
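
Similarly illustrative, here is a minimal sketch of the kind of data consistency check referenced above: a batch quality gate that fails a pipeline run when completeness or uniqueness degrade. The table path, column names, and thresholds are assumptions, not requirements from the posting.

```python
# Minimal sketch of a batch data-quality gate; paths, columns, and
# thresholds are hypothetical, not part of the posting.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("dq-checks").getOrCreate()

df = spark.read.parquet("s3://example-bucket/events/")  # placeholder path

total = df.count()
null_ids = df.filter(F.col("person_id").isNull()).count()
dupes = total - df.dropDuplicates(["person_id", "occurred_at"]).count()

# Fail the run if completeness or uniqueness degrade past example thresholds.
if total == 0 or null_ids / total > 0.01 or dupes / total > 0.05:
    raise ValueError(
        f"Data quality gate failed: total={total}, null_ids={null_ids}, dupes={dupes}"
    )
```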