Red Hat

Software Engineer – AI Testing and Workflow Validation / Prompt Engineer – Feature Development

full-time

Origin: 🇺🇸 United States • Massachusetts, North Carolina

Salary

💰 $189,600 - $312,730 per year

Job Level

Mid-Level, Senior

Tech Stack

Cloud, Kubernetes, Linux, Open Source, Python

About the role

  • Design, iterate, and optimize prompts for AI-assisted code generation and workflow execution.
  • Collaborate with product owners, developers, and QA to translate feature requirements into effective prompting strategies and evaluation benchmarks.
  • Validate AI-generated outputs through structured test cases, regression testing, and performance evaluation.
  • Build or adapt internal tooling to streamline prompt testing, versioning, and deployment.
  • Capture insights from prompt performance, including failure modes, and iterate toward more consistent, high-quality results.
  • Document prompt patterns, metrics, and best practices to support knowledge sharing across the team.
  • Work closely with engineering, QA, and product teams to ensure AI-driven features are accurate, reliable, and aligned with user and technical requirements.

Requirements

  • Bachelor’s degree in Computer Science, Engineering, or related field, or equivalent practical experience.
  • Hands-on experience working with large language models (e.g., GPT, LLaMA, Claude) for code generation or workflow automation.
  • Strong coding skills in Python or a similar language, with the ability to integrate AI outputs into backend or frontend systems.
  • Familiarity with testing methodologies and frameworks, including unit, integration, and regression testing.
  • Excellent analytical, problem-solving, and communication skills.
  • (Nice-to-Have) Prior experience with AI-assisted development tools or IDE integrations.
  • (Nice-to-Have) Knowledge of prompt-engineering best practices, including bias mitigation and output evaluation metrics.
  • (Nice-to-Have) Familiarity with model performance metrics such as functional correctness, ROUGE, or BLEU.