Support the Agentforce baselining program, using synthetic and automated tooling to continuously measure and improve performance.
Analyze evaluation results independently, identifying root causes, surfacing trends, and translating insights into actionable recommendations for models, implementations, and processes.
Maintain and evolve evaluation frameworks, scoring rubrics, and guidelines to ensure consistent, defensible, and scalable assessments.
Deliver clear, influential reporting and business reviews that inform stakeholders and drive product and operational decisions.
Define, monitor, and interpret key evaluation metrics, proactively identifying risks, regressions, and improvement opportunities.
Enable internal partners on evaluation processes and findings, building trust and shared understanding across teams.
Strengthen the evaluation feedback loop across automated testing, LLM-judge prompts, and golden datasets to continuously improve testing sophistication.
Perform targeted evaluations for new features and urgent initiatives, ensuring quality and market readiness.
Audit and refine the utterance repository to keep testing relevant, high quality, and aligned with evolving product capabilities.
Synthesize customer and internal feedback into actionable insights, helping shape product direction and operational improvements.
Advocate for tooling, process, and workflow improvements that increase evaluation efficiency, scalability, and reliability.
Proactively surface risks and partner on mitigations, ensuring issues are addressed before they impact customers.

Requirements

1+ years of professional experience working in Salesforce environments (program, analyst, operations, or product context).
Demonstrated ability to take ownership of tasks and drive outcomes independently.
Strong analytical mindset: comfortable reviewing conversational AI outputs, identifying failure patterns, conducting root cause analysis, and translating findings into actionable recommendations.
Operational rigor and attention to detail: able to execute repeatable evaluation workflows accurately and consistently in a fast-paced, ambiguous environment.
Clear written communication skills: able to document findings, produce internal documentation, and communicate insights concisely for cross-functional audiences.
Comfort working with data: proficiency in spreadsheets (e.g., Google Sheets), reporting, and basic dashboard interpretation to derive insights and track trends.
High reading comprehension and critical thinking: able to evaluate nuanced generative AI responses against quality standards and expected behaviors.
Tool fluency: able to work confidently in Salesforce reporting environments (Agentforce, Tableau, Testing Center, Observability) or to ramp up quickly on similar tools.
Curiosity and learning agility: resourceful in exploring new tools, understanding evolving AI behaviors, and continuously improving evaluation approaches.
Execution reliability: responsive, accountable, and dependable in delivering accurate outputs and supporting operational needs.

Benefits

time off programs
medical
dental
vision
mental health support
paid parental leave
life and disability insurance
401(k)
employee stock purchase program

Applicant Tracking System Keywords

Hard Skills & Tools

root cause analysis
data analysis
evaluation frameworks
scoring rubrics
key evaluation metrics
automated testing
conversational AI
dashboard interpretation
reporting
quality standards

Soft Skills

analytical mindset
clear written communication
attention to detail
curiosity
learning agility
execution reliability
ownership
trust building
critical thinking
influential reporting