Apollo GraphQL

Staff Software Engineer, AI Runtime

full-time

Origin: 🇺🇸 United States

Salary

💰 $182,750 - $232,000 per year

Job Level

Lead

Tech Stack

Apollo, Cloud, Distributed Systems

About the role

  • Help power the future of agentic AI workflows by turning MCP Server into an enterprise-grade service.
  • Architect MCP Gateway—a new layer that will route requests across tools, enforce policies, and provide the runtime foundation for scalable multi-agent systems.
  • Tackle challenges in scalability, performance, and developer experience to ensure our platform feels seamless and enterprise-ready.
  • The Graph DX AI Runtime Team builds MCP Server and Gateway—the backbone of agent-to-tool communication and the routing layer that keeps everything flowing.

Requirements

  • Expertise in agent-to-tool orchestration, routing, and coordination in scalable, fault-tolerant systems.
  • Strong background in distributed systems, server architecture, and high-performance backend development.
  • Proven experience with protocol design, message routing, and building server-side orchestration frameworks that enable scalable, reliable multi-tool agent workflows.
  • Experience building and maintaining robust runtime infrastructure that supports AI-driven workflows and enables reliable agent-to-tool interactions.
  • Hands-on experience with observability, monitoring, and debugging frameworks for complex systems.
  • Passion for clean, maintainable code, high system reliability, and scalable architecture.
  • Experience in strategic system design, making architectural trade-offs, and planning for long-term scalability and maintainability.
  • Strong technical leadership and mentorship, including guiding junior engineers and driving engineering best practices across teams.
  • Ability to influence cross-team architecture decisions and align engineering efforts with product and business objectives.
  • Production ownership experience: leading incident response, debugging, and performance optimization in high-impact backend systems.

Bonus Points

  • Exposure to AI/ML-enabled developer tooling or autonomous system orchestration.
  • Familiarity with cloud-native architectures, containerization, or orchestration frameworks.
  • Experience with performance optimization and cost-efficient scaling of high-throughput distributed systems.