Tech Stack
Cloud, Distributed Systems, Docker, Go, Grafana, Kubernetes, Prometheus, Python, Ray, Redis, TypeScript
About the role
- Build and manage directed workflows (DAGs, state machines, LangGraph flows)
- Define how data and context move between AI models, APIs, and humans in the loop
- Design coordination strategies for multiple AI agents with specialized roles
- Implement arbitration logic to merge outputs, resolve conflicts, and dynamically route tasks
- Connect AI systems with vector databases, APIs, cloud platforms, and external data sources
- Handle orchestration across distributed environments (Kubernetes, serverless)
- Implement retries, fallbacks, and guardrails to keep workflows stable
- Ensure systems degrade gracefully when AI outputs are uncertain or incorrect
- Tune orchestration for cost, latency, and accuracy
- Build observability dashboards, logging, and metrics to measure workflow success
Requirements
- Proficiency in Python, TypeScript, or Go
- Experience with LangGraph, LangChain, or Ray
- Strong knowledge of Docker, Kubernetes, CI/CD pipelines, and observability tools (Prometheus, Grafana)
- Familiarity with LLMs, RAG systems, embeddings, and multi-agent patterns
- Experience with vector databases (FAISS, Pinecone, Weaviate) and caching systems (Redis, Memcached)
- Able to work onsite at the Palo Alto, CA office three days a week (hybrid requirement)
- Will you require any visa sponsorship in the future? If yes, please provide details. (application question)