Staff Database Reliability Engineer

Scribe

Scribe is hiring a Staff Database Reliability Engineer to manage its data infrastructure and lead database initiatives, ensuring operational excellence and driving observability across database systems.

Posted 5/7/2026 • Full-time • Remote • California • 🇺🇸 United States • Lead • 💰 $225,000 - $250,000 per year

Tech Stack

Tools & technologies
Amazon Redshift, AWS, BigQuery, Django, Go, Kafka, Postgres, Python, RabbitMQ, Redis, SQL, Terraform

About the role

Key responsibilities & impact
  • Own the data tier end-to-end
  • Design schemas and access patterns that scale, tune Aurora for latency and throughput, and set the standards for how engineers interact with our databases
  • Review migrations for safety at scale — locks, backfills, concurrent index builds, NOT VALID constraints
  • Catch N+1 patterns and missing select_related/prefetch_related in review
  • Establish conventions for QuerySet usage and physical schema design (indexes, constraints, partitioning)
  • Scale review through automation, not heroics — author AGENTS.md files and DNA scaffolding that encode our conventions, configure AI review bots (Claude Code, Cursor, etc.) to flag risky migrations and ORM anti-patterns, and iterate on those configs as new failure modes emerge
  • Capacity planning as traffic and engineering throughput grow
  • Zero-downtime schema migrations and cutovers
  • Multi-AZ resilience within a single region — Aurora writer/reader placement, failover behavior and RTO/RPO, ElastiCache and OpenSearch AZ topology, RabbitMQ survivability across AZs
  • Backups, PITR, failover testing, retention
  • Own the CDC pipeline (Aurora → DMS → S3 Parquet → Snowflake)
  • DMS task design and tuning, replication slot hygiene on the Postgres side
  • Schema evolution as Django migrations roll through — so a column rename doesn't silently break the warehouse at 6 AM
  • Parquet layout and partitioning, reliability of the Snowflake handoff
  • Automated checks that flag migrations likely to break downstream consumers
  • Drive observability across three complementary tools: pganalyze, CloudWatch, Honeycomb

Requirements

What you’ll need
  • Deep PostgreSQL - EXPLAIN (ANALYZE, BUFFERS), MVCC, bloat, lock contention, vacuum/autovacuum. Aurora Serverless v2 / Limitless experience strongly preferred (storage model, reader/writer split, ACU scaling)
  • Strong ORM fluency (Django, SQLAlchemy, ActiveRecord, or similar) - predict the SQL a query will generate, spot N+1 problems on sight, and know how to control eager loading (joins vs. batched IN queries), column projection, aggregations, and subqueries
  • Single-region multi-AZ design - practical understanding of what it does and doesn't protect against
  • Production CDC experience, ideally AWS DMS - comfortable with logical replication, slot hygiene, schema evolution, and Parquet-based data lakes feeding Snowflake (or BigQuery/Redshift)
  • Hands-on with pganalyze (or Datadog DBM / Performance Insights / pg_stat_statements pipelines), CloudWatch (custom metrics, composite alarms, Logs Insights), and Honeycomb (or another high-cardinality tracing tool) - comfortable with OpenTelemetry and opinionated about what makes a trace useful
  • Real experience making AI coding and review tools useful for a team - writing AGENTS.md files, configuring review agents, versioning and iterating on prompts and configs
  • OpenSearch at scale - sizing, sharding, JVM tuning, rolling upgrades, snapshots
  • Production Redis - persistence tradeoffs, cluster mode, hot keys, thundering herds
  • At least one production message broker (SQS, RabbitMQ, Kafka) - delivery semantics, idempotency, failure modes
  • Strong automation and IaC background - real code (Python, Go, or similar) and Terraform
  • Track record leading cross-team initiatives, writing design docs that hold up, influencing without authority
  • Comfortable in a high-growth environment where the right answer for 50 engineers isn't the right answer for 100
  • Pragmatic outlook during incidents - focused on preventing the next one

Benefits

Comp & perks
  • Some of the nicest and smartest teammates you’ll ever work with
  • Competitive salaries
  • Comprehensive healthcare benefits
  • Exciting and motivating equity
  • Flexible PTO
  • 401k
  • Parental Leave
  • Commuter Benefits (SF office employees)
  • WFH Stipend
