
Distributed Systems Engineer – Data Platform, Logs and Audit Logs
Cloudflare
full-time
Location Type: Hybrid
Location: Atlanta • Colorado, Texas, Virginia • 🇺🇸 United States
Job Level
Mid-Level, Senior
Tech Stack
Distributed Systems, Docker, Go, Grafana, Kafka, Kubernetes, Linux, Prometheus, SaltStack, Splunk, SQL, Terraform
About the role
- Design, build, and operate a robust logging platform that ensures reliable logging and secure data transfer to a wide array of customer destinations and third-party integrations.
- Develop and maintain high-performance data connectors and integrations for our log-shipping products, focusing on usability, scalability and data integrity.
- Create and manage systems for handling comprehensive audit logs, ensuring they are delivered securely and adhere to strict compliance and performance standards.
- Scale and optimise the data delivery pipeline to handle massive data volumes with low latency, identifying and removing bottlenecks in data processing and routing.
- Work closely with Product and other engineering teams to define requirements for a new logging platform and integrations.
- Maintain the operational health of our log delivery platform through comprehensive monitoring and participation in an on-call rotation (with flexibility for out-of-hours technical issue resolution).
- Collaborate on the architectural evolution of our data egress platform, researching and implementing new technologies to improve efficiency and reliability.
Requirements
- 3+ years of experience working in software development covering distributed systems and data pipelines.
- Strong programming skills (Go is preferable), with a deep understanding of software development best practices for building resilient, high-throughput systems.
- Hands-on experience with modern observability stacks, including Prometheus and Grafana, and a strong understanding of handling high-cardinality metrics at scale.
- Strong knowledge of SQL, including experience with query optimisation.
- A solid foundation in computer science, including algorithms, data structures, distributed systems, and concurrency.
- Strong analytical and problem-solving skills, with a willingness to debug, troubleshoot, and learn about complex problems at high scale.
- Ability to work collaboratively in a team environment and communicate effectively with other teams across Cloudflare.
- Experience with data streaming technologies (e.g., Kafka, Flink) is a strong plus.
- Experience with various logging platforms or SIEMs (e.g., Splunk, Datadog, Sumo Logic) and storage destinations (e.g., S3, R2, GCS) is a plus.
- Experience with Infrastructure as Code tools like Salt or Terraform is a plus.
- Experience with Linux container technologies, such as Docker and Kubernetes, is a plus.
Benefits
- Competitive salary
- Flexible work arrangements
- Professional development opportunities
- Paid time off
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
Go, SQL, data pipelines, distributed systems, observability stacks, high-cardinality metrics, data streaming technologies, Infrastructure as Code, Linux container technologies, algorithms
Soft skills
analytical skills, problem-solving skills, collaboration, communication