Principal AI Engineer
Salesforce
AI Platform Engineer building the next generation of the ML/AI platform that powers autonomous AI agents. Collaborating on platform infrastructure and agent systems engineering at enterprise scale.
Posted 5/29/2026 • full-time • San Francisco • California, Illinois, New York, Washington • 🇺🇸 United States • Lead • 💰 $172,500 - $313,700 per year
Tech Stack
Tools & technologies: AWS, Docker, Grafana, Kubernetes, Open Source, Python, Terraform
About the role
Key responsibilities & impact
- Design and build agent harness infrastructure: the scaffolding that wraps LLM calls, manages tool use, handles retries, enforces policy, and feeds results back into iterative improvement loops
- Implement agentic loop patterns with multi-turn reasoning, tool orchestration, memory management, and structured output handling as reusable platform primitives
- Build the agent flywheel: automated pipelines that collect agent traces, surface regressions, route failures to evaluation, and close the loop from production signal back to prompt/model improvement
- Own the end-to-end lifecycle from agent experiment to production deployment, including versioning, rollout controls, and rollback mechanisms
- Build sandboxed execution environments for agent tools that isolate code execution, API calls, and file system access so agents can act without an unconstrained blast radius
- Design tiered autonomy models: define which actions agents can take automatically, which require human approval, and which are off-limits, with these tiers enforced at the infrastructure layer
- Implement replay and dry-run capabilities so new agent versions can be tested against real traces before going live
- Implement evaluation frameworks for agent behavior using a combination of vendor, open-source, and in-house tools, covering task success, tool selection accuracy, trajectory evaluation, hallucination rates, latency, and cost
- Build and maintain eval datasets, golden trace libraries, and regression test suites that run automatically on every agent code change
- Instrument agent traces end-to-end: LLM calls, tool invocations, intermediate reasoning, final outputs — surfaced in Grafana or equivalent observability tooling
- Define and track agent quality metrics over time; own the signal that tells the team whether agents are getting better or worse
- Drive continuous quality, latency, and cost improvements across deployed agents by closing the loop between production traces, evaluations, and agent design. Optimization techniques may include prompt tuning, tool-calling optimizations, context engineering, right-sizing model selection per task, and distillation or fine-tuning (SFT, DPO, RLHF) on curated trace data
- Validate every optimization through A/B tests, shadow deployments, and replay against golden traces, with the eval suite gating rollout so wins are real and regressions are caught before they reach users
- Build and optimize CI/CD pipelines (GitHub Actions, ArgoCD) that cover not just code deployment but agent evaluation gates — no agent ships without passing its eval suite
- Automate Docker and package builds, security scanning, and agent integration tests as first-class pipeline steps
- Design self-healing CI patterns where agent-based automation can diagnose and fix common pipeline failures
- Build internal tools and developer self-service interfaces that let ML engineers and data scientists iterate on agents without platform team involvement
- Maintain a comprehensive view of how all platform components (infrastructure, agent harnesses, evaluation pipelines, observability) work together
- Create architecture diagrams and drive long-term platform vision; own the "how does this scale to 10x" conversation
- Establish alerting (Grafana, PagerDuty) for both traditional platform health and agent-specific signals (error rates, tool call failures, eval score drift)
- Ensure all agent infrastructure adheres to security best practices: sandboxed execution, auditable traces, access controls on every tool
- Participate in security reviews; own compliance for agent workloads
Requirements
What you’ll need
- 9+ years as a Platform Engineer, ML Infrastructure Engineer, or Software Engineer
- Demonstrated experience building agent harness infrastructure, including agentic loops, tool orchestration, structured output handling, and multi-turn conversation management
- Hands-on experience with agent evaluation frameworks like Braintrust, LangSmith, or equivalent, including building eval datasets, running automated regression suites, and tracking quality metrics over time
- Strong understanding of sandboxing and safe agent execution, such as isolation patterns, tiered autonomy, and blast radius controls
- Experience with context engineering as it relates to agent orchestration
- Strong Python engineering skills for building scalable tools, automation, and platform components
- Deep expertise in AWS
- Extensive experience with CI/CD tooling, especially GitHub Actions and ArgoCD
- Proficiency in infrastructure-as-code (Terraform)
- Experience with containerization (Docker) and orchestration (Kubernetes)
- Experience with AgentOps concepts and production multi-agent systems
- Strong problem-solving skills and ability to manage multiple priorities across a complex platform
Benefits
Comp & perks
- time off programs
- medical
- dental
- vision
- mental health support
- paid parental leave
- life and disability insurance
- 401(k)
- employee stock purchasing program
ATS Keywords
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
Python, AWS, CI/CD, GitHub Actions, ArgoCD, Terraform, Docker, Kubernetes, agent evaluation frameworks, context engineering
Soft Skills
problem-solving, multi-priority management