
Senior Site Reliability Engineer, Database Excellence
GitLab
full-time
Location Type: Remote
Location: California • New York • United States
Salary
💰 $124,300 - $266,400 per year
About the role
- Automate operational tasks across all environments, from package updates and configuration changes to provisioning of user-facing services, so manual effort becomes the exception, not the rule.
- Design and maintain PostgreSQL database infrastructure components that allow GitLab.com to scale reliably while supporting hundreds of thousands of concurrent users.
- Respond to production incidents and platform emergencies, working with peer SREs to diagnose and resolve database-related issues quickly and thoroughly.
- Build observability systems that monitor database health, predict capacity needs based on usage patterns, and alert on symptoms rather than outages.
- Develop and ship database performance solutions in collaboration with product and engineering teams, including query optimization, migration reviews, and infrastructure recommendations.
- Create self-service tools and automation, using Terraform, Ansible, Chef, and GitLab ChatOps, that empower engineering teams to manage their own database interactions safely.
- Document decisions, learnings, and operational procedures so that knowledge becomes repeatable action and, eventually, automation.
- Participate in regularly scheduled on-call rotations to ensure GitLab.com remains operational during off-hours and weekends when necessary.
Requirements
- Hands-on experience running PostgreSQL in high-growth, large production environments, including both self-managed infrastructure and database-as-a-service platforms.
- Expertise with infrastructure automation and configuration management tools such as Ansible, Terraform, Chef, or Puppet to automate operational tasks and drive system reliability.
- Solid understanding of SQL, PL/pgSQL, data modeling, and data structure design; ability to analyze PostgreSQL internals to troubleshoot and optimize systems.
- Experience working in large-scale, distributed SaaS production environments where you've managed reliability, performance, and scalability challenges.
- Strong written communication skills and commitment to documentation; you thrive in remote, asynchronous environments and share knowledge effectively across your team.
- Proactive, hands-on approach where you identify issues, take ownership of solutions, and contribute improvements to infrastructure and code.
- Capability to mentor junior team members and develop deep expertise in your domain areas, then share that knowledge to help others grow.
- Backend engineering experience with languages such as Ruby or Go, and/or familiarity with OLAP databases like ClickHouse.
- Familiarity with Kubernetes and operators for managing database infrastructure and stateful services in containerized environments.
Benefits
- Benefits to support your health, finances, and well-being
- Flexible Paid Time Off
- Team Member Resource Groups
- Equity Compensation & Employee Stock Purchase Plan
- Growth and Development Fund
- Parental leave
- Home office support
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
PostgreSQL • SQL • PL/pgSQL • data modeling • data structure design • query optimization • infrastructure automation • configuration management • backend engineering • OLAP databases
Soft Skills
strong written communication • commitment to documentation • proactive approach • ownership of solutions • mentoring • knowledge sharing • collaboration • problem-solving • adaptability • teamwork