Staff Database Reliability Engineer

Scribe

Staff Database Reliability Engineer managing data infrastructure and leading database initiatives at Scribe, ensuring operational excellence and driving observability across database systems.

Posted 5/7/2026 • Full-time • Remote • California, United States • Lead • $225,000 - $250,000 per year

Tech Stack

Tools & technologies

Amazon Redshift, AWS, BigQuery, Django, Go, Kafka, Postgres, Python, RabbitMQ, Redis, SQL, Terraform

About the role

Key responsibilities & impact

Own the data tier end-to-end
Design schemas and access patterns that scale, tune Aurora for latency and throughput, and set the standards for how engineers interact with our databases
Review migrations for safety at scale — locks, backfills, concurrent index builds, NOT VALID constraints
Catch N+1 patterns and missing select_related/prefetch_related in review
Establish conventions for QuerySet usage and physical schema design (indexes, constraints, partitioning)
Scale review through automation, not heroics — author AGENTS.md files and DNA scaffolding that encode our conventions, configure AI review bots (Claude Code, Cursor, etc.) to flag risky migrations and ORM anti-patterns, and iterate on those configs as new failure modes emerge
Capacity planning as traffic and engineering throughput grow
Zero-downtime schema migrations and cutovers
Multi-AZ resilience within a single region — Aurora writer/reader placement, failover behavior and RTO/RPO, ElastiCache and OpenSearch AZ topology, RabbitMQ survivability across AZs
Backups, PITR, failover testing, retention
Own the CDC pipeline (Aurora → DMS → S3 Parquet → Snowflake)
DMS task design and tuning, replication slot hygiene on the Postgres side
Schema evolution as Django migrations roll through — so a column rename doesn't silently break the warehouse at 6 AM
Parquet layout and partitioning, reliability of the Snowflake handoff
Automated checks that flag migrations likely to break downstream consumers
Drive observability across three complementary tools: pganalyze, CloudWatch, and Honeycomb

Requirements

What you’ll need

Deep PostgreSQL expertise - EXPLAIN (ANALYZE, BUFFERS), MVCC, bloat, lock contention, vacuum/autovacuum. Aurora Serverless v2 / Limitless Database experience strongly preferred (storage model, reader/writer split, ACU scaling)
Strong ORM fluency (Django, SQLAlchemy, ActiveRecord, or similar) - able to predict the SQL a query will generate, spot N+1 problems on sight, and control eager loading (joins vs. batched IN queries), column projection, aggregations, and subqueries
Single-region multi-AZ design - practical understanding of what it does and doesn't protect against
Production CDC experience, ideally AWS DMS - comfortable with logical replication, slot hygiene, schema evolution, and Parquet-based data lakes feeding Snowflake (or BigQuery/Redshift)
Hands-on with pganalyze (or Datadog DBM / Performance Insights / pg_stat_statements pipelines), CloudWatch (custom metrics, composite alarms, Logs Insights), and Honeycomb (or another high-cardinality tracing tool) - comfortable with OpenTelemetry and opinionated about what makes a trace useful
Real experience making AI coding and review tools useful for a team - writing AGENTS.md files, configuring review agents, versioning and iterating on prompts and configs
OpenSearch at scale - sizing, sharding, JVM tuning, rolling upgrades, snapshots
Production Redis - persistence tradeoffs, cluster mode, hot keys, thundering herds
At least one production message broker (SQS, RabbitMQ, Kafka) - delivery semantics, idempotency, failure modes
Strong automation and IaC background - real code (Python, Go, or similar) and Terraform
Track record leading cross-team initiatives, writing design docs that hold up, influencing without authority
Comfortable in a high-growth environment where the right answer for 50 engineers isn't the right answer for 100
Pragmatic outlook during incidents - focused on preventing the next one

Benefits

Comp & perks

Some of the nicest and smartest teammates you’ll ever work with
Competitive salaries
Comprehensive healthcare benefits
Exciting and motivating equity
Flexible PTO
401k
Parental Leave
Commuter Benefits (SF office employees)
WFH Stipend
