PwC

LLM Observability Engineering Manager

full-time

Location Type: Office

Location: San Francisco • California, Illinois, Minnesota, New York • 🇺🇸 United States

Salary

💰 $99,000 - $232,000 per year

Job Level

Mid-Level, Senior

Tech Stack

AWS, Azure, Cloud, Go, Google Cloud Platform, Java, JavaScript, Node.js, Python

About the role

  • Architect and implement observability solutions for large language models (LLMs) that provide real-time insights into critical workflows
  • Enhance DataDog integrations for improved monitoring capabilities
  • Lead teams in developing quality deliverables while cultivating meaningful client relationships
  • Work with cross-functional teams to achieve project goals and drive automation
  • Mentor team members to develop their technical skills and use reviews to deepen expertise
  • Take ownership of projects including planning, budgeting, execution, and completion
  • Partner with team leadership to ensure collective ownership of quality, timelines, and deliverables
  • Lead root-cause investigations for LLM incidents and manage security monitoring and compliance reporting
  • Contribute to open-source integrations and engage with DataDog and MLOps communities

Requirements

  • Bachelor's Degree in Mathematics, Engineering, or Computer Science
  • At least 5 years of hands-on experience with DataDog
  • Master's Degree preferred
  • Experience in architecting DataDog integrations
  • Developing and maintaining observability for LLM platforms
  • Working with cross-functional teams for automation
  • Leading root-cause investigations for LLM incidents
  • Contributing to open-source DataDog integrations
  • Publishing or presenting in the LLM observability field
  • Engaging with DataDog and MLOps communities
  • Experience with additional observability platforms
  • Specialization in architecting observability across large-scale LLM or GenAI systems
  • Proficient in instrumenting Python, Node.js, Java, or Go applications
  • Demonstrated experience with cloud-native infrastructure (AWS, Azure, GCP)
  • Extensive understanding of LLM architectures, embeddings, and evaluation metrics
  • Ability to implement and manage DataDog security monitoring and compliance reporting
  • Willingness to travel up to 40%

Benefits

  • medical
  • dental
  • vision
  • 401(k)
  • holiday pay
  • vacation
  • personal and family sick leave
  • annual discretionary bonus
  • professional development opportunities

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard skills
DataDog, Python, Node.js, Java, Go, cloud-native infrastructure, observability, LLM architectures, security monitoring, compliance reporting
Soft skills
leadership, mentoring, client relationship management, cross-functional collaboration, project ownership, communication, team development, problem-solving, budgeting, planning
Certifications
Bachelor's Degree in Mathematics, Bachelor's Degree in Engineering, Bachelor's Degree in Computer Science, Master's Degree (preferred)