Reddit, Inc.

Senior Data Engineer, ML Platform

full-time

Origin: 🇺🇸 United States

Salary

💰 $190,800 - $267,100 per year

Job Level

Senior

Tech Stack

Airflow, Kafka, Spark, SQL

About the role

  • Lead development of data pipelines and workflows for large-scale ML models at Reddit.
  • Design and implement scalable and secure data processing pipelines and storage environments that prepare our source-of-truth datasets for our models.
  • Ensure data is cleansed, mapped, transformed, and otherwise optimized for storage and use according to business and technical requirements.
  • Build effective data pipelines and workflows to streamline data ingestion, processing, and distribution tasks.
  • Set up and operate data workflow management tools for SQL code versioning, dependency tracing, etc.
  • Load transformed data into storage and reporting structures in destinations including data warehouses, reporting systems, and analytics applications.
  • Monitor and troubleshoot issues with the data environment to maintain high availability and performance.
  • Support monitoring and observability across training datasets and model metrics, and implement diagnostic tools for investigating metric movements.
  • Maintain effective documentation of data procedures, systems, and architectures to ensure clarity and enable easy collaboration.

Requirements

  • 5+ years of experience in Data Engineering or ML Infrastructure
  • Experience with large-scale data transforms to prepare graph data
  • Experience with graph databases and Spark and Kafka pipelines
  • Experience working with Airflow and MLflow
  • Experience with storage frameworks like BigQuery (BQ), Parquet, and Iceberg
  • Awareness of ML models and architectures is a huge plus.
  • Strong focus on scalability, reliability, performance, and ease of use.
  • Strong organizational and communication skills