Salary
💰 $184,500 - $271,300 per year
Tech Stack
AWS, Cloud, Distributed Systems, Go, Java, Kafka, Kubernetes, NoSQL, Python, Scala, Spark, SQL, Terraform
About the role
- Twilio is remote-first, and this role is remote (not eligible in CA, CT, NJ, NY, PA, or WA).
- Part of Engineering, the L5 Machine Learning & Data Engineer will lead the design, build, and operation of the internal ML-and-data platform that powers every customer interaction. You will architect cloud-native pipelines, model-serving infrastructure, and developer tooling that allow Twilio’s product teams to iterate rapidly and safely at scale, advancing our mission to unlock the imagination of builders.
Responsibilities
- Architect and evolve Twilio’s end-to-end ML and real-time data platforms for reliability, security, and cost efficiency.
- Design scalable feature stores, streaming and batch pipelines, and low-latency model-serving layers on AWS.
- Implement MLOps best practices (automated testing, CI/CD, monitoring, and rollback) for hundreds of daily deployments.
- Own system design reviews, threat modeling, and performance tuning for high-volume communications workloads.
- Lead cross-functional engineering efforts, breaking down complex initiatives into executable roadmaps.
- Mentor staff and senior engineers, raising the technical bar through code reviews and pair programming.
- Partner with Product, Security, and Compliance to meet stringent privacy and governance requirements (HIPAA, SOC 2, GDPR).
- Champion a culture of experimentation, data-driven decision-making, and continuous improvement.
Requirements
- Bachelor’s or higher in Computer Science, Engineering, Mathematics, or equivalent practical experience.
- 7+ years building and operating production data or machine-learning systems at scale.
- Expert fluency in Python and one compiled language (Java, Scala, Go, or C++).
- Hands-on mastery of distributed data frameworks (Spark/Flink), SQL/NoSQL stores, and streaming platforms (Kafka/Kinesis).
- Demonstrated success designing cloud-native architectures on AWS, including Terraform-managed infrastructure.
- Deep knowledge of container orchestration (Kubernetes/EKS), service-mesh networking, and autoscaling strategies.
- Practical experience implementing MLOps tooling such as MLflow, Kubeflow, SageMaker, or Vertex AI.
- Strong grasp of model-lifecycle concerns—feature engineering, offline/online parity, A/B testing, drift detection, and retraining.
- Proven ability to lead technical projects end-to-end and influence without authority across multiple teams.
- Exceptional written and verbal communication skills, with a bias toward clarity and action.