
Machine Learning Data Engineer
Kalibri Labs
full-time
Location Type: Remote
Location: United States
Salary
💰 $120,000 - $160,000 per year
About the role
- Design, build, and maintain production data pipelines for multi-phase algorithmic workflows using Python and an orchestration framework such as Prefect, Airflow, or Jenkins.
- Build and optimize advanced SQL transformations in Snowflake, including window functions, CTEs, stored procedures, UDFs, and semi-structured data processing.
- Build and maintain dbt models for data transformation, identity resolution, and slowly changing dimension (SCD Type 2) tracking across 80+ models and multiple pipeline stages.
- Build and maintain feature engineering pipelines that feed ML models including CatBoost gradient boosting, Prophet time-series decomposition, LightGBM regression, and PuLP linear programming solvers.
- Operationalize ML model outputs by integrating predicted ADRs, occupancy forecasts, and optimization results into downstream production tables and Parquet file outputs.
- Integrate and reconcile data from multiple heterogeneous sources including hotel property management systems, rate shop providers, mapping APIs, and market forecast data.
- Work with PySpark for large-scale daily distribution processing, managing partitioning strategies, memory tuning, and efficient Parquet I/O across millions of records.
- Implement and monitor data quality frameworks such as dbt tests and Monte Carlo.
- Manage CI/CD pipelines using Bitbucket Pipelines for automated testing, linting (SQLFluff), and deployment of dbt projects and Python applications.
- Containerize pipeline components with Docker for consistent execution across development and production environments.
- Implement robust retry logic, error handling, and fallback strategies across pipeline phases to ensure reliable daily and monthly production runs.
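The retry, error-handling, and fallback behavior described in the last responsibility can be sketched in plain Python (a minimal illustration under assumed requirements, not Kalibri Labs' actual implementation; all function and parameter names here are hypothetical):

```python
import time


def with_retries(fn, retries=3, delay_seconds=1, fallback=None):
    """Run fn, retrying on failure; invoke a fallback once retries are exhausted."""
    for attempt in range(1, retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == retries:
                if fallback is not None:
                    return fallback()  # fallback strategy: e.g., serve cached data
                raise
            time.sleep(delay_seconds)  # simple fixed back-off between attempts


# Hypothetical pipeline phase: try the primary rate-shop provider,
# fall back to cached rates if every attempt fails.
def load_rate_shop_data():
    raise ConnectionError("provider unavailable")


def load_cached_rates():
    return {"source": "cache"}


result = with_retries(load_rate_shop_data, retries=2, delay_seconds=0,
                      fallback=load_cached_rates)
```

Orchestrators like Prefect and Airflow provide retry settings natively; a helper like this is only needed for finer-grained fallback logic inside a task.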
Requirements
- Master's degree or PhD in Computer Science, Data Science, Statistics, Mathematics, or a related quantitative field (or Bachelor's degree with equivalent experience).
- 3–5 years of professional experience as an ML Engineer, Quantitative Engineer, or Research Scientist.
- Strong proficiency in Python for data pipeline development, scripting, and automation.
- Deep experience with SQL and cloud data warehouses, particularly Snowflake (stored procedures, UDFs, semi-structured data, performance tuning).
- Hands-on experience with workflow orchestration tools such as Prefect, Airflow, or similar (e.g., Dagster, Luigi).
- Proficiency with dbt (dbt Core or dbt Cloud) for SQL-based data transformation and testing.
- Experience working with PySpark or similar distributed computing frameworks for large-scale data processing.
- Strong understanding of data modeling, ETL/ELT patterns, and data warehouse design principles.
- Proficiency with Git version control and collaborative development workflows (Bitbucket preferred).
- Demonstrated ability to operationalize ML models — not just train them — including feature pipelines, model serving, and output validation.
- Excellent cross-functional collaboration skills with proven ability to work alongside data scientists, analysts, and product managers.
Benefits
- Fully remote work, with a thriving company culture
- Robust medical, dental, and vision plans through Blue Cross Blue Shield, including a $0 cost plan for employees and subsidized coverage for dependents
- 401k plan with employer match
- Flexible Paid Time Off
- $250 new hire allowance for home office setup
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
Python, SQL, dbt, PySpark, CatBoost, Prophet, LightGBM, Monte Carlo, Docker, ETL/ELT
Soft Skills
cross-functional collaboration, problem-solving, communication, error handling, reliability, automation, data quality monitoring, feature engineering, data modeling, performance tuning
Certifications
Master's degree, PhD, Bachelor's degree