Salary
💰 $190,800 - $267,100 per year
Tech Stack
Airflow, Kafka, Spark, SQL
About the role
- Lead development of data pipelines and workflows for large-scale ML models at Reddit.
- Design and implement scalable and secure data processing pipelines and storage environments that prepare our source of truth datasets for our models.
- Ensure data is cleansed, mapped, transformed, and otherwise optimized for storage and use according to business and technical requirements.
- Build effective data pipelines and workflows to streamline data ingestion, processing, and distribution tasks (a minimal illustrative sketch follows this list).
- Set up and operate data workflow management tools for SQL code versioning, dependency tracing, etc.
- Load transformed data into storage and reporting structures in destinations including data warehouses, reporting systems, and analytics applications.
- Monitor and troubleshoot issues with the data environment to maintain high availability and performance.
- Support monitoring and observability across training datasets and model metrics, and implement diagnostic tools for metric movements.
- Maintain effective documentation of data procedures, systems, and architectures to ensure clarity and enable easy collaboration.
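
For illustration only, a minimal sketch of the ingest → transform → load orchestration pattern described above, written as an Airflow 2.x DAG. The DAG id, task names, and placeholder callables are hypothetical examples, not Reddit's actual pipelines; only Airflow itself is taken from the posting's tech stack.

```python
# Minimal sketch of an ingest -> transform -> load pipeline (Airflow 2.x).
# All names and logic below are illustrative placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def ingest(**context):
    # Pull raw events from an upstream source (placeholder).
    print("ingesting raw events")


def transform(**context):
    # Cleanse, map, and transform records into the target schema (placeholder).
    print("transforming records")


def load(**context):
    # Load transformed data into warehouse / reporting structures (placeholder).
    print("loading into warehouse")


with DAG(
    dag_id="example_ingest_transform_load",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    ingest_task = PythonOperator(task_id="ingest", python_callable=ingest)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    load_task = PythonOperator(task_id="load", python_callable=load)

    # Run the three stages in sequence.
    ingest_task >> transform_task >> load_task
```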
Requirements
- 5+ years of experience in Data Engineering or ML Infrastructure
- Experience with large-scale data transforms to prepare graph data
- Experience with graph databases, Spark, and Kafka pipelines
- Experience working with Airflow and MLflow
- Experience with storage frameworks like BigQuery, Parquet, and Iceberg
- Awareness of ML models and architectures is a huge plus.
- Strong focus on scalability, reliability, performance, and ease of use.
- Strong organizational & communication skills