Define the end-to-end architecture for AgentForce’s model serving, inference orchestration, and agentic reasoning loops.
Make high-stakes technical decisions regarding "build vs. buy," model sizing, context window management, and retrieval-augmented generation (RAG) strategies.
Architect scalable pipelines for continuous learning (RLHF/RLAIF) that integrate seamlessly with production traffic without compromising latency or stability.
Design systems for multi-turn agent state management, memory persistence, and tool invocation (function calling).
Own the end-to-end architectural design of AgentForce AI capabilities from product requirements through model design, system implementation, and production rollout.
Translate product use cases (e.g., agent experiences, workflows, UI features) into concrete system architectures, including APIs, service contracts, and model interaction patterns.
Define reference architectures for AI-powered applications (web, backend services, agent runtimes) that standardize how products integrate with AgentForce models.
Translate abstract research concepts into concrete engineering specifications.
Collaborate with scientists to optimize models for deployment (quantization, distillation, pruning) without sacrificing reasoning capabilities.
Mentor Principal Scientists and Staff Engineers on system design principles and architectural patterns.

Requirements

PhD or Master’s in Computer Science, AI, Machine Learning, or Distributed Systems
10+ years of technical experience, with a specific focus on deploying ML models at scale
Proven experience acting as an Architect or Principal-level technical lead for large-scale AI or data platforms
Experience designing and building production-grade AI-powered applications or platforms
Experience defining public/internal APIs, SDKs, and service interfaces for ML/AI capabilities consumed by product teams
Familiarity with frontend–backend–model interaction patterns for low-latency user-facing AI experiences
Deep understanding of Transformer architectures, attention mechanisms, and the math behind LLMs (not just API usage)
Experience with high-performance inference serving (e.g., vLLM, TensorRT-LLM, TGI, Triton) and optimization techniques (quantization, LoRA adapters, paged attention)
Strong background in designing distributed systems, microservices, and event-driven architectures (Kafka, gRPC, Kubernetes)
Advanced proficiency in Python; familiarity with C++ or CUDA is a strong plus

Benefits

time off programs
medical
dental
vision
mental health support
paid parental leave
life and disability insurance
401(k)
employee stock purchasing program

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

model serving
inference orchestration
agentic reasoning
continuous learning
multi-turn agent state management
memory persistence
function calling
Transformer architectures
high-performance inference serving
distributed systems

Soft Skills

technical decision making
mentoring
collaboration
system design principles
architectural patterns

Certifications

PhD in Computer Science
Master’s in Computer Science