Twilio

Software Engineer

full-time

Origin: 🇮🇪 Ireland

Job Level

Mid-Level, Senior

Tech Stack

AWS, Cloud, Distributed Systems, Go, Grafana, Java, Kafka, Kubernetes, Prometheus, Python

About the role

  • Lead the end-to-end architecture and delivery of key observability platform components, focusing on reliability, scalability, and usability.
  • Drive consistency and quality across all observability signals—logs, metrics, traces, and continuous profiling—building intuitive workflows for engineers.
  • Serve as a technical advisor and mentor across the platform org, guiding design decisions and aligning cross-team efforts with long-term architectural goals.
  • Go deep in one or more problem areas (e.g., high-cardinality telemetry, distributed tracing correlation, compute cost insights), while ensuring the platform scales horizontally.
  • Collaborate with product teams, SREs, and developer experience groups to understand telemetry needs and integrate observability into core engineering workflows.
  • Design and build developer-friendly tooling and APIs to support incident response, performance analysis, and platform debugging at scale.
  • Leverage and optionally contribute to open-source standards like OpenTelemetry to ensure interoperability and extensibility.
  • Champion a pragmatic approach to observability—balancing performance, cost, and user value across diverse engineering teams.

Requirements

  • Proven expertise in building and scaling observability systems (e.g., logging platforms, metrics pipelines, tracing infrastructure, or profiling tools).
  • Lead technical execution for major components of Twilio’s observability overhaul, including the shift to centralized S3-based data lakes, OpenTelemetry instrumentation, and ClickHouse-backed query engines.
  • Deep proficiency in at least one modern programming language (e.g., Go, Python, Java).
  • Familiarity with high-cardinality data challenges and telemetry correlation techniques.
  • Experience designing high-scale telemetry systems (e.g., Prometheus, ClickHouse, OpenTelemetry, Kafka, or equivalent).
  • Solid understanding of distributed systems and the challenges of observability in complex, microservice-based environments.
  • Experience with AWS, Kubernetes, and infrastructure-as-code tools.
  • Provide architectural guidance and thought leadership across teams, helping to establish clear telemetry standards, efficient usage patterns, and scalable platform abstractions.
  • Ability to make forward-looking technical decisions and lead others through ambiguity.