About the role
- Own the test strategy across FE, BE, and LLM/agent workflows; define the test pyramid and coverage goals.
- Build and maintain automation for:
  - UI/E2E flows (critical paths, multi‑step agentic scenarios).
  - API/contract testing (REST/GraphQL), schema/compatibility checks.
  - Data & integration tests across services and external partners.
- LLM Quality & Evals:
  - Design golden datasets and rubrics for LLM output evaluation (pass/fail thresholds, human‑in‑the‑loop review).
  - Set up regression/eval harnesses for prompts, tools, and agents; track drift and hallucination rates.
  - Implement guardrails and adversarial/red‑team tests for safety, privacy, bias, and toxicity.
- Non‑functional testing: baseline and monitor performance, reliability, and resilience for key user journeys.
- CI/CD ownership: integrate tests into pipelines, gate releases, reduce flakiness, and surface quality KPIs (e.g., failure rate, MTTR, coverage).
- Test data & environments: manage fixtures, synthetic data, and stable seeds for deterministic runs.
- Quality operations: triage defects, drive root‑cause analysis, and communicate crisply and proactively with PM, Design, and Engineering.
- Documentation: maintain clear test plans, playbooks, and release criteria.
Requirements
- Nice to have:
  - Experience with agent frameworks, tool calling, retrieval/RAG, or tracing.
  - Familiarity with evaluation/observability tools for LLM apps (e.g., prompt testing frameworks, tracing dashboards).
  - Performance testing (e.g., k6/JMeter) and basic security/safety testing practices.