Senior Site Reliability Engineer

Zello

Senior Site Reliability Engineer at Zello, responsible for maintaining data-tier reliability using MySQL, MongoDB, and more.

Posted 5/13/2026 • Full-time • Austin • Texas • 🇺🇸 United States • Senior

Tech Stack

Tools & technologies
AWS, Azure, Cassandra, Cloud, Docker, Elasticsearch, Go, Google Cloud Platform, Kubernetes, Linux, MongoDB, MySQL, Prometheus, Python

About the role

Key responsibilities & impact
  • Design, deploy, and operate highly available MySQL and MongoDB clusters across our cloud environments
  • Tune query performance, schema, and index strategy in partnership with application engineers, and push fixes upstream into the application when that's the right answer
  • Extend our observability stack — Prometheus, Loki, and Tempo
  • Participate in the Platform on-call rotation, lead incident response for data-tier issues, and write postmortems that drive durable change
  • Improve disaster recovery, security posture, and compliance for our database footprint — encryption, access control, audit logging, backup integrity
  • Evaluate and operate ScyllaDB/Cassandra and Elasticsearch where they fit the workload
  • Write the automation, tooling, and operators that take repetitive work off the team's plate
  • Use AI to compress incident response and root-cause analysis

Requirements

What you’ll need
  • 7+ years in SRE, DevOps, platform, infrastructure, or database reliability roles, with at least 3 years owning production databases
  • BSc in Computer Science or equivalent practical experience
  • You've operated highly available MySQL and MongoDB in production at scale: replication, sharding, backups, point-in-time recovery, and failover drills you've actually run, not just designed on paper
  • You diagnose database performance end-to-end: query plan, indexes, locking, OS, storage, network
  • You've shipped meaningful work on at least two of the following: bare-metal Linux, containerized workloads (Docker, Kubernetes, or similar), and a major cloud (GCP preferred; AWS or Azure equivalent is fine)
  • You instrument what you build. You've used Prometheus, OpenTelemetry, or comparable systems to close real incidents, and you've written the dashboard the next on-call engineer will actually open.
  • You write code that runs in production: Python, Go, Bash, or similar for automation, tooling, or operators. You don't hand off scripting to someone else.
  • You communicate clearly under pressure and after the fact. Your postmortems are blameless, specific, and lead to changes that stick.
  • You bring an opinion on managed vs. self-managed databases, and can defend the trade-off based on availability, cost, and operational burden.
  • ScyllaDB/Cassandra or Elasticsearch experience is a plus
  • You've used AI tooling (copilots, agents, or custom automation) to expedite incident response, root-cause analysis, or developer workflows.

Benefits

Comp & perks
  • competitive pay
  • equity with significant upside
  • sabbatical after every five years of service
  • flexible schedules and time off
  • free snacks in our break room
