Kalibri Labs

Machine Learning Data Engineer

Full-time

Location Type: Remote

Location: United States

Salary

$120,000 - $160,000 per year

About the role

  • Design, build, and maintain production data pipelines in Python, using orchestration frameworks such as Prefect, Airflow, or Jenkins to run multi-phase algorithmic workflows.
  • Build and optimize advanced SQL transformations in Snowflake, including window functions, CTEs, stored procedures, UDFs, and semi-structured data processing.
  • Build and maintain dbt models for data transformation, identity resolution, and slowly changing dimension (SCD Type 2) tracking across 80+ models and multiple pipeline stages.
  • Build and maintain feature engineering pipelines that feed ML models including CatBoost gradient boosting, Prophet time-series decomposition, LightGBM regression, and PuLP linear programming solvers.
  • Operationalize ML model outputs by integrating predicted ADRs, occupancy forecasts, and optimization results into downstream production tables and Parquet file outputs.
  • Integrate and reconcile data from multiple heterogeneous sources including hotel property management systems, rate shop providers, mapping APIs, and market forecast data.
  • Work with PySpark for large-scale daily distribution processing, managing partitioning strategies, memory tuning, and efficient Parquet I/O across millions of records.
  • Implement and monitor data quality frameworks such as dbt tests and Monte Carlo.
  • Manage CI/CD pipelines using Bitbucket Pipelines for automated testing, linting (SQLFluff), and deployment of dbt projects and Python applications.
  • Containerize pipeline components with Docker for consistent execution across development and production environments.
  • Implement robust retry logic, error handling, and fallback strategies across pipeline phases to ensure reliable daily and monthly production runs.

Requirements

  • Master's degree or PhD in Computer Science, Data Science, Statistics, Mathematics, or a related quantitative field (or Bachelor's degree with equivalent experience).
  • 3–5 years of professional experience as an ML Engineer, Quantitative Engineer, or Research Scientist.
  • Strong proficiency in Python for data pipeline development, scripting, and automation.
  • Deep experience with SQL and cloud data warehouses, particularly Snowflake (stored procedures, UDFs, semi-structured data, performance tuning).
  • Hands-on experience with workflow orchestration tools such as Prefect, Airflow, or similar (e.g., Dagster, Luigi).
  • Proficiency with dbt (dbt Core or dbt Cloud) for SQL-based data transformation and testing.
  • Experience working with PySpark or similar distributed computing frameworks for large-scale data processing.
  • Strong understanding of data modeling, ETL/ELT patterns, and data warehouse design principles.
  • Proficiency with Git version control and collaborative development workflows (Bitbucket preferred).
  • Demonstrated ability to operationalize ML models — not just train them — including feature pipelines, model serving, and output validation.
  • Excellent cross-functional collaboration skills with proven ability to work alongside data scientists, analysts, and product managers.

Benefits

  • Fully remote work, with a thriving company culture
  • Robust medical, dental, and vision plans through Blue Cross Blue Shield, including a $0 cost plan for employees and subsidized coverage for dependents
  • 401k plan with employer match
  • Flexible Paid Time Off
  • $250 new hire allowance for home office setup

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
Python, SQL, dbt, PySpark, CatBoost, Prophet, LightGBM, Monte Carlo, Docker, ETL/ELT
Soft Skills
cross-functional collaboration, problem-solving, communication, error handling, reliability, automation, data quality monitoring, feature engineering, data modeling, performance tuning
Certifications
Master's degree, PhD, Bachelor's degree