Principal AI Engineer

Salesforce

AI Platform Engineer building the next generation of the ML/AI platform that powers autonomous AI agents. Collaborates on platform infrastructure and agent systems engineering at enterprise scale.

Posted 5/29/2026 • full-time • San Francisco • California, Illinois, New York, Washington • 🇺🇸 United States • Lead • 💰 $172,500 - $313,700 per year

Tech Stack

Tools & technologies
AWS, Docker, Grafana, Kubernetes, Open Source, Python, Terraform

About the role

Key responsibilities & impact
  • Design and build agent harness infrastructure: the scaffolding that wraps LLM calls, manages tool use, handles retries, enforces policy, and feeds results back into iterative improvement loops.
  • Implement agentic loop patterns with multi-turn reasoning, tool orchestration, memory management, and structured output handling as reusable platform primitives
  • Build the agent flywheel: automated pipelines that collect agent traces, surface regressions, route failures to evaluation, and close the loop from production signal back to prompt/model improvement
  • Own the end-to-end lifecycle from agent experiment to production deployment, including versioning, rollout controls, and rollback mechanisms
  • Build sandboxed execution environments for agent tools, isolating code execution, API calls, and file system access so agents can act without an unconstrained blast radius
  • Design tiered autonomy models: define which actions agents can take automatically, which require human approval, and which are off-limits, enforced at the infrastructure layer
  • Implement replay and dry-run capabilities so new agent versions can be tested against real traces before going live
  • Implement evaluation frameworks for agent behavior using a combination of vendor, open-source, or in-house tools, covering task success, tool selection accuracy, trajectory evaluation, hallucination rates, latency, and cost
  • Build and maintain eval datasets, golden trace libraries, and regression test suites that run automatically on every agent code change
  • Instrument agent traces end-to-end: LLM calls, tool invocations, intermediate reasoning, final outputs — surfaced in Grafana or equivalent observability tooling
  • Define and track agent quality metrics over time; own the signal that tells the team whether agents are getting better or worse
  • Drive continuous quality, latency, and cost improvements across deployed agents by closing the loop between production traces, evaluations, and agent design. Optimization techniques include prompt tuning, tool-calling optimizations, context engineering, right-sizing model selection per task, and exploring distillation or fine-tuning (SFT, DPO, RLHF) on curated trace data
  • Validate every optimization through A/B tests, shadow deployments, and replay against golden traces, with the eval suite gating rollout so wins are real and regressions are caught before they reach users
  • Build and optimize CI/CD pipelines (GitHub Actions, ArgoCD) that cover not just code deployment but agent evaluation gates — no agent ships without passing its eval suite
  • Automate Docker and package builds, security scanning, and agent integration tests as first-class pipeline steps
  • Design self-healing CI patterns where agent-based automation can diagnose and fix common pipeline failures
  • Build internal tools and developer self-service interfaces that let ML engineers and data scientists iterate on agents without platform team involvement
  • Maintain a comprehensive view of how all platform components (infrastructure, agent harnesses, evaluation pipelines, observability) work together
  • Create architecture diagrams and drive long-term platform vision; own the "how does this scale to 10x" conversation
  • Establish alerting (Grafana, PagerDuty) for both traditional platform health and agent-specific signals (error rates, tool call failures, eval score drift)
  • Ensure all agent infrastructure adheres to security best practices: sandboxed execution, auditable traces, access controls on every tool
  • Participate in security reviews; own compliance for agent workloads

Requirements

What you’ll need
  • 9+ years as a Platform Engineer, ML Infrastructure Engineer, or Software Engineer
  • Demonstrated experience building agent harness infrastructure using agentic loops, tool orchestration, structured output handling, and multi-turn conversation management
  • Hands-on experience with agent evaluation frameworks like Braintrust, LangSmith, or equivalent, including building eval datasets, running automated regression suites, and tracking quality metrics over time
  • Strong understanding of sandboxing and safe agent execution, including isolation patterns, tiered autonomy, and blast radius controls
  • Experience with context engineering as it relates to agent orchestration
  • Strong Python engineering skills for building scalable tools, automation, and platform components
  • Deep expertise in AWS
  • Extensive experience with CI/CD tooling, especially GitHub Actions and ArgoCD
  • Proficiency in infrastructure-as-code (Terraform)
  • Experience with containerization (Docker) and orchestration (Kubernetes)
  • Experience with AgentOps concepts and production multi-agent systems
  • Strong problem-solving skills and ability to manage multiple priorities across a complex platform

Benefits

Comp & perks
  • time off programs
  • medical
  • dental
  • vision
  • mental health support
  • paid parental leave
  • life and disability insurance
  • 401(k)
  • employee stock purchasing program

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
Python, AWS, CI/CD, GitHub Actions, ArgoCD, Terraform, Docker, Kubernetes, agent evaluation frameworks, context engineering
Soft Skills
problem-solving, multi-priority management