Tech Stack
AWS, Azure, Cloud, Google Cloud Platform, Python
About the role
- Validate AI-driven features and ensure model reliability across enterprise-grade applications.
- Collaborate with product, data science, and engineering teams to understand AI use cases and define test strategies.
- Design and execute test cases for LLM-based features such as summarization, classification, conversational flows, and content generation.
- Validate model outputs for factual accuracy, tone, relevance, and compliance with domain-specific standards.
- Develop automated test scripts and frameworks for prompt-response validation and regression testing.
- Evaluate ML models using metrics like precision, recall, F1-score, and domain-specific thresholds.
- Conduct integration testing across cloud platforms (e.g., AWS, Azure, GCP) and data environments (e.g., Snowflake, Databricks).
- Document test results, anomalies, and improvement suggestions in structured formats.
- Ensure ethical AI practices and compliance with data governance policies.
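As an illustration of the metric-based evaluation listed above (precision, recall, F1-score, and domain-specific thresholds), a minimal sketch in plain Python might look like the following. The function names, sample labels, and threshold values are hypothetical and chosen for illustration only; they are not taken from this role's actual tooling.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    precision: float
    recall: float
    f1: float


def binary_metrics(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for a single positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return Metrics(precision, recall, f1)


def failed_thresholds(metrics, thresholds):
    """Return the names of metrics that fall below their minimum threshold."""
    return [name for name, minimum in thresholds.items()
            if getattr(metrics, name) < minimum]


# Hypothetical ground-truth labels, model predictions, and minimum thresholds.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 1, 0, 0]
metrics = binary_metrics(y_true, y_pred)
failures = failed_thresholds(metrics, {"precision": 0.7, "recall": 0.8, "f1": 0.7})
print(metrics, failures)
```

In a regression suite, a non-empty `failures` list could be used to fail the run, so that a model update that drops below an agreed threshold is caught before release.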
Requirements
- 5+ years in software testing, QA, or data validation roles.
- 2–3 years of hands-on experience testing Generative AI models (e.g., GPT, LLaMA, Claude).
- Strong proficiency in Python and experience with ML testing tools or frameworks.
- Familiarity with cloud platforms (AWS, Azure, GCP).
- Experience with data engineering environments (Snowflake, Databricks).
- Understanding of ML workflows including time series, regression, classification, and neural networks.
- Experience evaluating models using metrics like precision, recall, F1-score, and domain-specific thresholds.
- Excellent communication skills and ability to drive cross-functional discussions with ownership.
- Preferred: exposure to domain-specific AI applications (pharma, healthcare, finance).
- Preferred: experience with prompt engineering and LLM evaluation frameworks.
- Preferred: knowledge of ethical AI principles and regulatory compliance (e.g., HIPAA, GDPR).
ATS Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
software testing, data validation, Generative AI testing, Python, ML testing tools, ML workflows, model evaluation metrics, automated test scripts, regression testing, prompt engineering
Soft skills
communication skills, cross-functional collaboration, ownership