Humana

Decision Intelligence Engineer – Next Best Action

Full-time

Location Type: Remote

Location: United States

Salary

$129,300 - $177,800 per year

About the role

  • Design, implement, and evaluate RL algorithms suited to long-horizon, sparse-reward healthcare decisioning, including policy gradient methods (PPO, A3C), value-based approaches (DQN, Q-learning), and offline RL methods (CQL, Decision Transformer).
  • Define and maintain the member state representation and action space, evolving both as new programs and data sources are onboarded.
  • Apply the Bellman equation, reward shaping, and constraint mapping to encode clinical eligibility, suppression rules, and program-specific objectives directly into the learning objective.
  • Manage exploration-exploitation tradeoffs appropriate for a production healthcare environment where poorly explored actions have real member impact.
  • Build simulation and backtesting environments to evaluate policy quality before production promotion, using historical member journey data.
  • Diagnose and remediate common RL failure modes: policy collapse, credit assignment errors across long member journeys, and distributional shift between training and serving populations.
  • Own the nightly Databricks training workflow: feature engineering from Gold Activity History and Gold Patient Profile, state vector normalization, distributed RL training via Ray RLlib, and batch scoring of all 8M eligible members.
  • Collaborate with the Data Engineering team (Decisioning Team 2) to ensure training inputs are correctly joined, reward signals are accurately computed from disposition outcomes, and the feature pipeline is reproducible and auditable.
  • Write production-quality PySpark feature engineering jobs; maintain data lineage through Databricks Unity Catalog.
  • Manage model artifacts, versioning, and lifecycle in the MLflow Model Registry; ensure rollback capability is maintained at all times.
  • Apply multi-agent RL concepts (MARL via PettingZoo) where member household or population-level coordination is required.
  • Implement constraint mapping to enforce hard business rules — member caps, cooldown periods, clinical eligibility — as constraints within the RL objective rather than downstream filters.
  • Collaborate with the Rules Engine team to ensure Drools eligibility guards and RL policy priorities are correctly aligned and do not conflict.
  • Partner with Decisioning Team 1 (Decision Engine, Rules Engine) to ensure model outputs integrate cleanly with the real-time decisioning hot path and that scored recommendations cached in Redis are correctly structured and interpreted.
  • Document model behavior, known limitations, and failure modes for clinical and compliance stakeholders; support explainability requirements for member-facing decisions.
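
Several of the responsibilities above (Bellman backups, reward shaping, and constraint mapping) can be illustrated with a minimal, self-contained sketch. All names here — the states, actions, and outreach cap — are hypothetical simplifications for illustration, not the production schema:

```python
# Illustrative tabular Q-learning with reward shaping and a hard-constraint
# action mask. Hypothetical states/actions; not the production member model.
STATES = ["low_engagement", "high_engagement"]
ACTIONS = ["send_outreach", "schedule_call", "no_action"]
ALPHA, GAMMA = 0.1, 0.9

# Q-table initialized to zero for every (state, action) pair.
Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}

def allowed_actions(state, outreach_count, cap=3):
    """Constraint mapping: encode a member outreach cap as a hard action
    mask inside the learning objective, not as a downstream filter."""
    if outreach_count >= cap:
        return [a for a in ACTIONS if a != "send_outreach"]
    return list(ACTIONS)

def shaped_reward(base_reward, potential_next, potential_curr):
    """Potential-based reward shaping: r' = r + gamma * phi(s') - phi(s),
    which provably preserves the optimal policy."""
    return base_reward + GAMMA * potential_next - potential_curr

def q_update(state, action, reward, next_state, next_allowed):
    """One Bellman backup:
    Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)),
    where the max ranges only over actions allowed by the constraints."""
    best_next = max(Q[(next_state, a)] for a in next_allowed)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
```

Masking infeasible actions before the max is one common way to keep hard business rules (caps, cooldowns, eligibility) inside the objective; the production system would operate on learned state vectors rather than a tabular Q.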

Requirements

  • 8+ years of software engineering experience building and operating large-scale production systems, with emphasis on data-intensive platforms, recommendation systems, or optimization engines serving millions of users.
  • 3+ years of hands-on experience implementing reinforcement learning or deep learning systems in production, including policy gradient methods (PPO, A3C), value-based approaches (DQN, Q-learning), or offline RL algorithms (CQL, Decision Transformer).
  • Deep familiarity with the Bellman equation, reward shaping, exploration-exploitation tradeoff, and constraint mapping in real-world RL systems.
  • Demonstrated ability to diagnose RL-specific failure modes: policy collapse, credit assignment issues, and distributional shift across large populations.
  • Proficiency in Python 3.x; experience with PyTorch or TensorFlow for policy network implementation.
  • Experience with Ray RLlib for distributed RL training at scale.
  • Experience with Databricks, PySpark, and Delta Lake for large-scale ML pipelines processing tens of millions of records.
  • Experience with MLflow for experiment tracking, model registry, and artifact management.
  • Track record of shipping ML systems that operate reliably under production load — not just research or prototype work.
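
The exploration-exploitation tradeoff named above can be sketched with a simple epsilon-greedy policy. This is a generic illustration under assumed names; in the production setting the candidate actions would already be filtered by eligibility and suppression rules:

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Explore with probability epsilon, otherwise exploit the
    best-known action.

    q_values: dict mapping action -> estimated value.
    Keeping epsilon small limits exposure to poorly explored actions,
    which matters when each action has real member impact.
    """
    if rng.random() < epsilon:
        return rng.choice(sorted(q_values))   # uniform exploration
    return max(q_values, key=q_values.get)    # greedy exploitation
```

With `epsilon=0` this is purely greedy; raising `epsilon` trades current value for information about under-sampled actions.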

Benefits

  • Medical, dental, and vision benefits
  • 401(k) retirement savings plan
  • Time off (including paid time off, company and personal holidays, volunteer time off, and paid parental and caregiver leave)
  • Short-term and long-term disability
  • Life insurance and many other opportunities

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
reinforcement learning, policy gradient methods, value-based approaches, offline RL algorithms, Bellman equation, reward shaping, exploration-exploitation tradeoff, constraint mapping, Python 3.x, PySpark
Soft Skills
collaboration, diagnostic ability, problem-solving, documentation, explainability