Synchrony

VP, AI Reliability & Performance Architect

The VP, AI Reliability & Performance Architect is responsible for reliability and performance in an AWS-based AI ecosystem, leading investigations of and improvements to agent workflows and system reliability.

Posted 5/1/2026 • Full-time • Hyderabad • 🇮🇳 India • Lead

Tech Stack

Tools & technologies
AWS, Cloud, Python, Ray, Splunk, Terraform

About the role

Key responsibilities & impact
  • Ensure the production-grade reliability, accuracy, and performance of our AWS-based agentic AI ecosystem
  • Lead investigations of complex agent/AI workflow failures using logs, metrics, and traces
  • Improve the quality and performance of Retrieval-Augmented Generation (RAG) and agent workflows
  • Establish and oversee evaluation approaches for models, RAG, and agents
  • Partner with InfoSec/AppSec to review architectures and ensure designs follow enterprise security patterns
  • Work with Governance teams to implement and monitor guardrails and controls across the AI platform
  • Drive 'Design for Reliability' patterns across both Platform and Agent Building teams
  • Translate reliability risks, performance trends, and operational metrics into clear business language for senior leaders, risk, and product owners
  • Coach DevLeads and architects on debugging agent behaviors, strengthening observability pipelines, improving orchestration, and hardening production deployments

Requirements

What you’ll need
  • Bachelor's degree in Computer Science, Engineering, Information Systems, or related field (or equivalent experience)
  • 10–14 years of IT experience, including meaningful roles in application development, platform engineering, SRE/operations, and/or architecture; or, in lieu of a degree, 12–16 years of IT experience in the same kinds of roles
  • Strong experience operating and improving reliability of cloud-native systems (AWS preferred; comparable cloud experience acceptable)
  • Experience supporting AI/ML systems is beneficial, but not mandatory if you demonstrate strong troubleshooting ability
  • Strong ability to script/build tooling in Python (or similar language) for reliability automation, analysis, testing, and operational workflows
  • Hands-on experience with observability practices and tools (CloudWatch/X-Ray/Splunk/New Relic or similar)
  • Experience with Infrastructure-as-Code (Terraform preferred; similar tools acceptable)
  • Working knowledge of identity and security patterns (OAuth2, SSO/federation, IAM roles/policies/SCP concepts)
  • Proven ability to lead through influence, drive standards/guardrails, and align multiple agile teams in a matrixed environment

Benefits

Comp & perks
  • Best-in-class employee benefits and programs that support work-life integration and overall well-being
  • Career advancement and upskilling opportunities, with a focus on Advancing Diverse Talent to take up leadership roles

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
AWS, Python, Infrastructure-as-Code, Terraform, AI/ML systems, Reliability automation, Observability practices, CloudWatch, X-Ray, Splunk
Soft Skills
leadership, influence, communication, coaching, collaboration, troubleshooting, alignment, organizational skills, problem-solving, agile methodologies