Ensure the reliability, scalability, performance and security of the systems powering millions of users.
In a hybrid role combining software engineering and systems operations, partner closely with development teams on applications including brokerage, crypto, and money.
Work on Service Level Agreements (SLAs) and Service Level Objectives (SLOs), incident metrics (MTTD and MTTR), Production Readiness Reviews (PRRs), monitoring, canary releases, and shift-left testing, including pre-production, integration, and load testing.
Design, build, and maintain large-scale systems that power Robinhood’s platform, infrastructure, and core services.
Write and review high-quality code, create capacity and scaling plans, and debug complex, real-time issues in mission-critical systems used by millions of customers.
Lead by example, mentoring teammates, promoting best practices, and fostering a culture focused on operational excellence and system resilience.
Take ownership of system reliability by participating in on-call rotations, proactively addressing potential issues, and driving long-term improvements to reduce downtime.
Collaborate with industry-leading engineers to develop scalable tools and infrastructure that meet Robinhood’s growing demands.
As a founding engineer on a newly formed Reliability team, build the roadmap and centralized tooling, and ensure proper focus for the team.
Drive innovation by optimizing infrastructure for reliability and cost-efficiency.
Requirements
8+ years of experience designing, building, and maintaining large-scale distributed systems
Proficiency in programming languages such as Python, Go, or C++
Expertise in operating systems (Linux/Unix), networking, and troubleshooting sophisticated production issues in high-availability environments.
A track record of mentoring team members, fostering collaboration, and contributing to a culture of continuous improvement.
Experience building and owning pre-production and staging environments for internal software engineers (bonus)
Experience running on Elastic Kubernetes Service (EKS) on AWS or another cloud provider (bonus)
Experience working with observability systems with the goal of reducing incident metrics such as Mean Time to Detect (MTTD) and Mean Time to Resolve (MTTR) (bonus)
Experience working with large infrastructure components such as compute, storage, networking, and/or developer infrastructure (bonus)
Benefits
Challenging, high-impact work to grow your career
Performance-driven compensation with multipliers for outsized impact, bonus programs, equity ownership, and 401(k) matching
In addition to the base pay range listed below, this role is also eligible for bonus opportunities + equity + benefits
Best-in-class benefits to fuel your work, including 100% paid health insurance for employees with 90% coverage for dependents
Lifestyle wallet - a highly flexible benefits spending account for wellness, learning, and more
Employer-paid life & disability insurance, fertility benefits, and mental health benefits
Time off to recharge including company holidays, paid time off, sick time, parental leave, and more!
Exceptional office experience with catered meals, events, and comfortable workspaces.