
Senior Database Reliability Engineer, Architect
Alex Staff Agency
full-time
Location Type: Remote
Location: Georgia
About the role
- This position is open at a global product-led IT company specializing in infrastructure stability and security solutions. Their products are recognized as the industry standard in the Hosting and Enterprise segments, powering over 500,000 servers worldwide.
- In 2025, the company is evolving its data management strategy, shifting from traditional database administration to an Internal Database-as-a-Service (DBaaS) model. This role requires a visionary engineer to design resilient distributed systems, automate infrastructure through code, and transform databases into a reliable service for product teams. This is an ideal opportunity for those ready to handle petabytes of data and build high-scale platform solutions.
- **Key Challenges & Responsibilities:**
  - Designing and implementing a self-service platform (Terraform + Ansible) for deploying HA clusters (PostgreSQL, ClickHouse, MongoDB, Redis) in a heterogeneous environment (Bare Metal, OpenNebula, K8s, Public Clouds).
  - Managing rapidly growing ClickHouse analytics clusters (12+ clusters, tens of terabytes), focusing on sharding, ReplicatedMergeTree, and building reliable S3 backup pipelines under high load.
  - Maintaining and scaling infrastructure for Apache Airflow and Redash, ensuring the reliability of ETL pipelines and visualization tools.
  - Implementing SRE practices in data management: replacing manual incident response with automated self-healing mechanisms and defining SLOs/SLIs.
  - Migrating legacy solutions to modern cloud patterns and implementing Kubernetes operators for stateful workloads.
  - Serving as a technical authority for product teams to optimize data schemas and SQL queries for high-load systems.
- **Tech Stack:**
  - **DB:** PostgreSQL 15+ (Patroni, PgBouncer), ClickHouse (Sharded/Replicated), MongoDB, Redis, Kafka.
  - **Data & Analytics:** Apache Airflow, Redash.
  - **Infrastructure:** Hybrid Cloud (3+ private DCs, OpenNebula, K8s, Bare Metal, AWS, GCP, Azure, DO).
  - **IaC & CI/CD:** Terraform, Ansible, Python/Go, GitLab, Jenkins, Gerrit.
  - **Observability:** VictoriaMetrics, Grafana, Loki.
Requirements
- **Must have:**
  - 5+ years of PostgreSQL expertise: deep knowledge of MVCC, locking mechanics, expert-level Patroni/PgBouncer configuration, and experience with seamless major version upgrades under load.
  - ClickHouse mastery: experience operating large clusters, understanding ZooKeeper/ClickHouse Keeper, sharding, replication internals, and performance diagnostics at the data-part level.
  - Engineering mindset (SRE/DevOps): experience writing complex Terraform modules and Ansible roles; proficiency in Python or Go for automation is a major asset.
  - Hybrid environment experience: understanding the nuances of running DBs on Bare Metal vs. Kubernetes vs. Public Cloud, with the ability to optimize TCO and disk subsystem performance (NVMe, Network Storage).
  - Systems approach: understanding the full stack from network packets to business logic, including security standards (FIPS, Audit logs) and Disaster Recovery.
- **Nice to have:**
  - Experience building an Internal Developer Platform (IDP).
  - Experience operating databases in Kubernetes via operators (CloudNativePG, Altinity Operator).
  - Background working with Cloud or Hosting providers on similar services.
Benefits
- Fully remote work from any location worldwide and flexible working hours.
- Opportunity to impact architectural decisions for services used by thousands of companies globally.
- 24 days of vacation, 10 national holidays, and unlimited paid sick leave.
- Compensation for private medical insurance.
- Reimbursement for co-working spaces and gym/sports activities.
- Dedicated budget for education, training, and conferences.
- Reward program for innovative ideas that lead to company patents.
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
PostgreSQL, ClickHouse, MongoDB, Redis, Terraform, Ansible, Python, Go, Apache Airflow, Redash
Soft Skills
engineering mindset, SRE, DevOps, systems approach