Twilio

Site Reliability Engineer

full-time

Location: California • 🇺🇸 United States

Salary

💰 $152,500 - $224,200 per year

Job Level

Senior · Lead

Tech Stack

Airflow, Apache, AWS, Cloud, Distributed Systems, EC2, Go, Grafana, Java, Kafka, Kubernetes, Prometheus, Python, Terraform

About the role

  • Design, build, and maintain infrastructure and scalable frameworks to support data ingestion, processing, and analysis.
  • Collaborate with stakeholders, analysts, and product teams to understand business requirements and translate them into technical solutions.
  • Architect and implement data streaming solutions using modern data technologies such as Kafka, AWS MSK, Terraform, Hive, Hudi, Presto, and Airflow, and cloud-based services such as AWS EKS, Lake Formation, Glue, and Athena.
  • Design and implement frameworks and solutions for performance, reliability, and cost-efficiency.
  • Ensure data quality, integrity, and security throughout the data lifecycle.
  • Stay current with emerging technologies and best practices in big data technologies.
  • Mentor early-career engineers and contribute to a culture of continuous learning and improvement.

Requirements

  • Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
  • 8+ years of experience in Site Reliability Engineering, DevOps, or Software Engineering roles with a focus on infrastructure or backend systems.
  • Strong production experience, including operational management, scaling, partitioning strategies, and tuning for performance and reliability.
  • Hands-on experience with Kubernetes (preferably EKS), including deploying and managing stateful services and operators in Kubernetes environments.
  • Deep understanding of AWS cloud services, particularly those relevant to data infrastructure (e.g., EC2, EBS, S3, IAM, MSK, CloudWatch, VPC, ALB/NLB).
  • Proficiency in infrastructure-as-code tools, such as Terraform or CloudFormation, for managing and automating infrastructure.
  • Expertise in observability tools (e.g., Prometheus, Grafana, OpenTelemetry, Datadog) to monitor distributed systems and set up alerting for reliability and latency.
  • Proficient in at least one programming language (e.g., Go, Python, Java, or similar) for building automation, tooling, and contributing to platform services.
  • Experience designing and implementing incident response processes, SLOs/SLIs, and runbooks, and participating in on-call rotations.
  • Strong understanding of distributed systems principles, including consensus, durability, throughput, and availability tradeoffs.
  • Proven track record of driving reliability improvements in high-scale, data-intensive systems and collaborating with platform and data engineering teams.
  • Excellent problem-solving and analytical skills.
  • Strong verbal and written communication skills, with the ability to work effectively in a cross-functional team environment.
CACI International Inc

Senior Infrastructure – DevOps Engineer

Senior · full-time · $99k–$207k / year · Virginia · 🇺🇸 United States
Posted: 55 minutes ago · Source: caci.wd1.myworkdayjobs.com
Ansible, Chef, Cyber Security, Kubernetes, Node.js, Puppet, TypeScript
Circle

Senior Site Reliability Engineer

Senior · full-time · $130k–$140k / year · California · 🇺🇸 United States
Posted: 2 hours ago · Source: boards.greenhouse.io
AWS, Kubernetes, MySQL, Postgres, Redis
General Dynamics Information Technology

Azure DevOps Engineer – FedRAMP Healthcare Modernization

Mid · Senior · full-time · $73k–$95k / year · 🇺🇸 United States
Posted: 3 hours ago · Source: gdit.wd5.myworkdayjobs.com
Azure, Cyber Security, Python, Terraform
Coates Group

Senior DevOps Engineer

Senior · full-time · $125k–$140k / year · Illinois · 🇺🇸 United States
Posted: 9 hours ago · Source: jobs.lever.co
AWS, Cloud, Docker, IoT, Linux, Microservices, Python
Eduphoria! Inc.

AWS DevOps Engineer

Mid · Senior · full-time · $110k–$125k / year · Florida, Illinois, Kansas, Maryland, North Carolina, Ohio, Tennessee, Texas, Virginia · 🇺🇸 United States
Posted: 23 hours ago · Source: eduphoria.applytojob.com
AWS, Azure, Cloud, EC2, Linux, MySQL, .NET, SQL, Terraform