Rula

Staff Data Engineer, AI

full-time

Origin: 🇺🇸 United States

Job Level

Lead

Tech Stack

Airflow, Amazon Redshift, AWS, Azure, BigQuery, Cloud, ETL, Google Cloud Platform, Python, Ray, Spark, SQL, Terraform

About the role

  • We believe that mental health is just as important as physical health. We recognize that mental health issues can be complex and multifaceted, and we are dedicated to treating the whole person, not just the symptoms.
  • We aim to create a world where mental health is no longer stigmatized or marginalized, but rather is embraced as an integral part of one's overall well-being.
  • We believe that by providing quality care that is both evidence-based and compassionate, we can empower individuals to take charge of their mental health and achieve their full potential. We are passionate about making a positive impact on the lives of those struggling with mental health issues and we strive to be a force for positive change in the field of mental healthcare.
  • We’re shaping the future of mental health care with AI-enabled experiences that enhance, not replace, the human connection at the core of therapy. Our north star is clinically grounded, responsible AI designed to bring greater transparency, personalization, and continuous support across the therapy journey. Our work transforms therapy into an experience that’s more connected and accessible. As we expand our portfolio of AI experiences, we’re scaling our team to drive innovation and set a new standard for mental health care.
  • As a Data Engineer, you will help build and maintain the data pipelines that pull information from our central storage system to train machine learning models and AI tools, enabling a variety of use cases that support our providers and improve patient outcomes. You will be part of a collaborative group that values open discussions and quick adjustments to meet changing needs, working alongside data experts and other specialists to turn raw information into useful resources for our mission. This role sits within our data team, which is part of the overall engineering organization and a close partner to our ML team. Your daily work, which includes designing reliable flows of information, testing for accuracy, and solving unexpected challenges, will directly support innovations that help more individuals get the mental health support they deserve. If you enjoy turning complex data into something that makes a real difference in people's lives, this is your chance to contribute to meaningful advancements in health care.

Requirements

  • 8+ years of data pipeline development – specifically building and maintaining scalable ETL/ELT pipelines for ML/AI training workflows, using tools like AWS Glue, dbt, Dagster, Spark, or Ray for distributed processing of large-scale structured and unstructured data from data lakes. Strong proficiency in Spark, Python, and SQL for feature engineering, data transformation, and ensuring high-quality, versioned datasets suitable for model training and inference.
  • 8+ years of cloud infrastructure and data warehousing experience, at least 4 of which focused on AWS. Proficiency in AWS services such as Redshift, S3, Glue, IAM, EMR, and SageMaker for supporting ML/AI pipelines. Candidates may bring additional experience from other cloud environments (e.g., GCP services like BigQuery, GCS, Dataflow, or AI Platform; Azure services like Synapse Analytics, Blob Storage, Databricks, or Machine Learning Studio) to complement their AWS expertise. Experience optimizing data warehouses (e.g., Redshift, Snowflake, BigQuery) and managing data lakes (e.g., S3, GCS, Azure Blob) for large-scale, versioned ML training datasets, with a focus on partitioning, access controls, and integration with distributed processing frameworks like Spark.
  • Experience implementing scalable data validation, quality checks, and error-handling mechanisms tailored for ML/AI pipelines, including bias detection, anomaly identification, and dataset integrity checks to ensure high-fidelity training data. Familiarity with data governance practices, such as metadata management, lineage tracking for reproducible models, and compliance with regulations like CPRA or HIPAA in data lake environments.
  • Experience optimizing data pipelines and queries and managing large datasets for efficiency and scalability. Knowledge of best practices for high-throughput systems.
  • Experience with data security measures (encryption, role-based access control, data masking). Understanding of compliance standards (e.g., HIPAA, SOC 2) and their application in data engineering.
  • Strong ability to work cross-functionally with data analysts, data scientists, and stakeholders. Effective communication skills to explain technical concepts to non-technical audiences. Adaptability to thrive in a fast-paced startup environment.