Tech Stack
Cloud, Go, Node.js, Python, Terraform
About the role
- Own infrastructure and tooling behind Datadog’s public API surface spanning more than 7,000 endpoints
- Lead technical direction across multiple teams spanning API routing, rate limiting, frameworks, reliability tooling, and infrastructure automation
- Design and evolve edge-layer systems that handle traffic before it reaches backend services, including routing, rate limiting, audit logging, WAF, and other protective layers
- Contribute to internal frameworks in Go and Python used by thousands of engineers
- Build automation tools and lead efforts to provision services in new data centers, working across teams with strong opinions
- Support reliability for over 7,000 public APIs, including ownership of load testing infrastructure and a Terraform provider
- Define and implement best practices for testing, observability, and resilience across the API stack
- Mentor engineers and shape the long-term architecture and direction of the API Platform group
- Collaborate on cross-company initiatives such as MCP servers, A2A protocol, and long-term reliability improvements
Requirements
- You have 10 or more years of engineering experience with a strong background in backend systems and platform infrastructure
- You bring expertise in site reliability engineering, backend architecture, or framework development
- You have strong software development experience and are comfortable working in a Go-heavy codebase
- You are comfortable navigating ambiguity and driving work across multiple teams with differing priorities
- You enjoy mentoring, guiding teams through complex infrastructure problems, and leading high-impact projects end to end