
Machine Learning Data Engineer
Kalibri Labs
full-time
Location Type: Remote
Location: United States
Salary
💰 $120,000 - $160,000 per year
About the role
- Design, build, and maintain production data pipelines for multi-phase algorithmic workflows using Python and an orchestration framework such as Prefect, Airflow, or Jenkins.
- Build and optimize advanced SQL transformations in Snowflake, including window functions, CTEs, stored procedures, UDFs, and semi-structured data processing.
- Build and maintain dbt models for data transformation, identity resolution, and slowly changing dimension (SCD Type 2) tracking across 80+ models and multiple pipeline stages.
- Build and maintain feature engineering pipelines that feed ML models including CatBoost gradient boosting, Prophet time-series decomposition, LightGBM regression, and PuLP linear programming solvers.
- Operationalize ML model outputs by integrating predicted ADRs, occupancy forecasts, and optimization results into downstream production tables and Parquet file outputs.
- Integrate and reconcile data from multiple heterogeneous sources including hotel property management systems, rate shop providers, mapping APIs, and market forecast data.
- Work with PySpark for large-scale daily distribution processing, managing partitioning strategies, memory tuning, and efficient Parquet I/O across millions of records.
- Implement and monitor data quality frameworks such as dbt tests and Monte Carlo.
- Manage CI/CD pipelines using Bitbucket Pipelines for automated testing, linting (SQLFluff), and deployment of dbt projects and Python applications.
- Containerize pipeline components with Docker for consistent execution across development and production environments.
- Implement robust retry logic, error handling, and fallback strategies across pipeline phases to ensure reliable daily and monthly production runs.
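The retry, error-handling, and fallback behavior described in the last responsibility can be sketched in plain Python (a minimal illustration under assumed requirements, not Kalibri Labs' actual implementation; all function and parameter names here are hypothetical):

```python
import time


def with_retries(fn, retries=3, delay_seconds=1, fallback=None):
    """Run fn, retrying on failure; invoke a fallback once retries are exhausted."""
    for attempt in range(1, retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == retries:
                if fallback is not None:
                    return fallback()  # fallback strategy: e.g., serve cached data
                raise
            time.sleep(delay_seconds)  # simple fixed back-off between attempts


# Hypothetical pipeline phase: try the primary rate-shop provider,
# fall back to cached rates if every attempt fails.
def load_rate_shop_data():
    raise ConnectionError("provider unavailable")


def load_cached_rates():
    return {"source": "cache"}


result = with_retries(load_rate_shop_data, retries=2, delay_seconds=0,
                      fallback=load_cached_rates)
```

Orchestrators like Prefect and Airflow provide retry settings natively; a helper like this is only needed for finer-grained fallback logic inside a task.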
Requirements
- Master's degree or PhD in Computer Science, Data Science, Statistics, Mathematics, or a related quantitative field (or Bachelor's degree with equivalent experience).
- 3–5 years of professional experience as an ML Engineer, Quantitative Engineer, or Research Scientist.
- Strong proficiency in Python for data pipeline development, scripting, and automation.
- Deep experience with SQL and cloud data warehouses, particularly Snowflake (stored procedures, UDFs, semi-structured data, performance tuning).
- Hands-on experience with workflow orchestration tools such as Prefect, Airflow, or similar (e.g., Dagster, Luigi).
- Proficiency with dbt (dbt Core or dbt Cloud) for SQL-based data transformation and testing.
- Experience working with PySpark or similar distributed computing frameworks for large-scale data processing.
- Strong understanding of data modeling, ETL/ELT patterns, and data warehouse design principles.
- Proficiency with Git version control and collaborative development workflows (Bitbucket preferred).
- Demonstrated ability to operationalize ML models — not just train them — including feature pipelines, model serving, and output validation.
- Excellent cross-functional collaboration skills with proven ability to work alongside data scientists, analysts, and product managers.
Benefits
- Fully remote work, with a thriving company culture
- Robust medical, dental, and vision plans through Blue Cross Blue Shield, including a $0 cost plan for employees and subsidized coverage for dependents
- 401k plan with employer match
- Flexible Paid Time Off
- $250 new hire allowance for home office setup
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
Python, SQL, dbt, PySpark, CatBoost, Prophet, LightGBM, Monte Carlo, Docker, ETL/ELT
Soft Skills
cross-functional collaboration, problem-solving, communication, error handling, reliability, automation, data quality monitoring, feature engineering, data modeling, performance tuning
Certifications
Master's degree, PhD, Bachelor's degree