
Distributed Systems Engineer – Data Platform, Analytics and Alerts
Cloudflare
full-time
Location Type: Hybrid
Location: Atlanta • Colorado, District of Columbia, Texas, Washington • 🇺🇸 United States
Job Level
Mid-Level, Senior
Tech Stack
Distributed Systems, Docker, Go, Grafana, GraphQL, Kafka, Kubernetes, Linux, Prometheus, SaltStack, SQL, Terraform
About the role
- Develop and enhance our customer-facing APIs, focusing on performance, reliability, and an intuitive user experience.
- Design, build, and maintain our near real-time alerting platform, from data processing and anomaly detection to reliable notification delivery.
- Optimise the performance of complex analytical queries that power our APIs and dashboards, working closely with the database platform team.
- Create intuitive and powerful tools that allow customers to explore their data and configure meaningful alerts based on logs and metrics.
- Scale our API and alerting infrastructure to support a growing number of internal and external use cases.
- Collaborate with front-end engineers and product managers to define API contracts and deliver a seamless data experience for our users.
- Ensure the operational health of our APIs and alerting systems by developing comprehensive monitoring and participating in an on-call rotation (with the flexibility to be on-call outside of standard working hours as needed).
Requirements
- 3+ years of experience working in software development covering distributed systems and scalable APIs.
- Strong programming skills (Go is preferable), with a deep understanding of software development best practices for building performant, customer-facing services.
- Hands-on experience with modern observability stacks, including Prometheus and Grafana, and a strong understanding of handling high-cardinality metrics at scale.
- Strong knowledge of SQL, including extensive experience with complex query optimisation.
- A solid foundation in computer science, including algorithms, data structures, distributed systems, and concurrency.
- Strong analytical and problem-solving skills, with a willingness to debug, troubleshoot, and learn about complex problems at high scale.
- Ability to work collaboratively in a team environment and communicate effectively with other teams across Cloudflare.
- Experience developing and scaling APIs, particularly GraphQL, is a strong plus.
- Experience with data streaming technologies (e.g., Kafka, Flink) for real-time processing is a plus.
- Experience with Infrastructure as Code tools like Salt or Terraform is a plus.
- Experience with Linux container technologies, such as Docker and Kubernetes, is a plus.
Benefits
- Competitive salary
- Flexible working hours
- Professional development opportunities
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
Go, SQL, GraphQL, data processing, anomaly detection, complex query optimisation, algorithms, data structures, distributed systems, concurrency
Soft skills
analytical skills, problem-solving skills, collaboration, communication