Chalice AI

Senior Machine Learning Engineer

Full-time

Location Type: Hybrid

Location: New York City • New York • 🇺🇸 United States

Salary

💰 $180,000 - $200,000 per year

Job Level

Senior

Tech Stack

AWS, Cloud, EC2, Grafana, Prometheus, PySpark, Python, PyTorch, Ray, Unity

About the role

  • Architect, train, and maintain scalable neural network systems for audience modeling and bid optimization using PyTorch and Ray distributed training (Ray Train, Ray Tune, DDP)
  • Build and optimize multi-GPU training pipelines on Databricks, including hyperparameter search with ASHA scheduling and early stopping
  • Develop feature engineering pipelines using PySpark, including embedding layers (EmbeddingBag, Embedding) for categorical and behavioral features
  • Implement model comparison workflows with champion/challenger evaluation on holdout data
  • Build resilient training and batch inference workflows with a focus on automation, reproducibility, and checkpoint recovery
  • Implement robust model monitoring and observability solutions (MLflow, Prometheus, Grafana, Datadog) to track drift, performance metrics (AUC, AUPRC, F1), and system health
  • Manage model versioning, experiment tracking, and artifact persistence using MLflow and Unity Catalog
  • Work closely with engineering teams to integrate model outputs into production systems and optimize data flows for fault tolerance
  • Partner with product stakeholders to align ML efforts with business impact, KPIs, and product strategy across AI Audiences, AI Allocator, CPA Algo, and Curate AI
  • Lead technical design reviews, contribute to internal Python packages, and enforce engineering best practices (testing, CI/CD, modularity)
  • Stay current on ML infrastructure advancements (distributed training, inference optimization, model serving patterns) and help guide adoption internally
  • Document system architectures, create runbooks, and enable team members to adopt and extend the ML framework

Requirements

  • Master's Degree or PhD in Computer Science, Statistics, Machine Learning, or related discipline with 5-10 years of industry experience
  • Strong proficiency in PyTorch for neural network development, including custom architectures with embedding layers, MLP backbones, and binary classification heads
  • Production experience with Databricks including Delta Lake, Unity Catalog, Asset Bundles, and cluster management
  • Strong grasp of MLOps best practices: experiment tracking (MLflow), model versioning, model serving, monitoring, and reproducibility
  • Expert-level Python and PySpark skills for data processing and feature engineering at scale
  • Experience building and maintaining batch inference pipelines with schema versioning and artifact management
  • Familiarity with cloud platforms (AWS: S3, EC2) and data warehousing (Snowflake)
  • Experience with CI/CD workflows including build automation, testing, and packaging using GitHub Actions and Make
  • Excellent collaboration and communication skills; ability to work effectively in a cross-functional environment with Data Science, Product, and Engineering teams

Benefits

  • Medical, Dental, and Vision coverage
  • 401(k) options
  • Unlimited PTO
  • 11 Company Holidays
  • Office-wide closure between Christmas Eve and New Year's

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard skills
neural network systems, PyTorch, Ray distributed training, multi-GPU training pipelines, feature engineering, PySpark, model monitoring, experiment tracking, model versioning, Python
Soft skills
collaboration, communication, leadership, cross-functional teamwork, technical design reviews
Certifications
Master's Degree, PhD