Robinhood

Staff Software Engineer, Reliability

full-time

Location Type: Hybrid

Location: Menlo Park • California • 🇺🇸 United States

Salary

💰 $217,000 - $255,000 per year

Job Level

Lead

Tech Stack

AWS, Cloud, Distributed Systems, Go, Kubernetes, Linux, Python, Unix

About the role

  • Ensure the reliability, scalability, performance and security of the systems powering millions of users.
  • Partner closely with development teams in a hybrid role that combines software engineering and systems operations, working on applications including brokerage, crypto, and money.
  • Work on Service Level Agreements (SLAs) and Service Level Objectives (SLOs), incident metrics (MTTD and MTTR), Production Readiness Reviews (PRRs), monitoring, canary deployments, and shift-left testing, including pre-production, integration, and load testing.
  • Design, build, and maintain large-scale systems that power Robinhood’s platform, infrastructure, and core services.
  • Write and review high-quality code, create capacity and scaling plans, and debug complex, real-time issues in mission-critical systems used by millions of customers.
  • Lead by example, mentoring teammates, promoting best practices, and fostering a culture focused on operational excellence and system resilience.
  • Take ownership of system reliability by participating in on-call rotations, proactively addressing potential issues, and driving long-term improvements to reduce downtime.
  • Collaborate with industry-leading engineers to develop scalable tools and infrastructure that meet Robinhood’s growing demands.
  • As a founding engineer on a newly formed Reliability team, build the roadmap and centralized tooling, and keep the team properly focused.
  • Drive innovation by optimizing infrastructure for reliability and cost-efficiency.

Requirements

  • 8+ years of experience designing, building, and maintaining large-scale distributed systems.
  • Proficiency in programming languages such as Python, Go, or C++.
  • Expertise in operating systems (Linux/Unix), networking, and troubleshooting sophisticated production issues in high-availability environments.
  • A track record of mentoring team members, fostering collaboration, and contributing to a culture of continuous improvement.
  • Experience building and owning pre-production and staging environments for internal software engineers (bonus)
  • Experience running on Elastic Kubernetes Service (EKS) on AWS or another cloud provider (bonus)
  • Experience working with Observability systems with a goal of reducing incident metrics such as Mean-Time-To-Detect (MTTD) and Mean-Time-To-Resolve (MTTR) (bonus)
  • Experience working with large infrastructure components such as compute, storage, networking, and/or developer infrastructure (bonus)

Benefits

  • Challenging, high-impact work to grow your career
  • Performance-driven compensation with multipliers for outsized impact, bonus programs, equity ownership, and 401(k) matching
  • In addition to the base pay range listed above, this role is also eligible for bonus opportunities, equity, and benefits
  • Best-in-class benefits to fuel your work, including 100% paid health insurance for employees with 90% coverage for dependents
  • Lifestyle wallet - a highly flexible benefits spending account for wellness, learning, and more
  • Employer-paid life & disability insurance, fertility benefits, and mental health benefits
  • Time off to recharge including company holidays, paid time off, sick time, parental leave, and more!
  • Exceptional office experience with catered meals, events, and comfortable workspaces.

ATS Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard skills
Python, Go, C++, Linux, Unix, networking, troubleshooting, Elastic Kubernetes Service, AWS, observability
Soft skills
mentoring, collaboration, continuous improvement, leadership, ownership, operational excellence, system resilience