Validate AI- and ML-powered healthcare solutions across the full development lifecycle to ensure data quality, model performance, reliability, and safe deployment in production environments
Design and execute data-driven and automated test strategies, including model evaluation, prompt regression testing, dataset profiling, and end-to-end pipeline validation
Partner with data science, engineering, product, and security teams to define measurable quality gates and deliver compliant, explainable, and dependable AI experiences that drive client value
Develop and execute test strategies for ML- and generative AI-powered applications
Design and maintain evaluation frameworks for Large Language Models (LLMs), including automated scoring and LLM-as-a-judge methodologies
Develop prompt regression test suites to detect performance degradation across model and prompt versions
Evaluate generative AI systems for hallucination risk, factual consistency, grounding accuracy, and safety compliance
Conduct model evaluation, regression testing, and drift monitoring in development and production environments
Build dashboards and monitoring tools to detect degraded evaluation scores, drift, or safety risks and support proactive triage
Design and implement proactive AI-driven alerting and recommendation systems embedded within dashboards and user workflows
Automate dashboard metric generation and refresh pipelines using Python and data workflows
Partner with cross-functional teams to define AI quality standards, acceptance criteria, and release gates
Investigate defects, analyze root causes, and recommend corrective actions to improve reliability and performance

Requirements

Relevant degree preferred
2 or more years of relevant experience required
Experience validating ML or generative AI-based applications, including model evaluation and data quality assessment, required
Proficiency in Python, SQL, and test automation frameworks
Experience evaluating LLM systems, including prompt regression testing and automated or human-in-the-loop judging methodologies
Familiarity with RAG evaluation concepts, including retrieval quality, context relevance, faithfulness, and safety testing
Experience designing AI evaluation metrics, including ranking, calibration, and reliability measures
Experience building model monitoring dashboards and production health reporting
Understanding of Agile methodologies and CI/CD practices
Strong analytical, documentation, and communication skills
Self-starter who thrives in a fast-paced, iterative environment and drives quality initiatives end-to-end amid ambiguity and shifting priorities

Benefits

Comprehensive benefits plan

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

Python, SQL, test automation frameworks, model evaluation, data quality assessment, prompt regression testing, AI evaluation metrics, monitoring dashboards, CI/CD practices, Agile methodologies

Soft Skills

analytical skills, documentation skills, communication skills, self-starter, adaptability, quality initiatives, problem-solving, collaboration, attention to detail, time management