Alex Staff Agency

Senior Database Reliability Engineer, Architect

Full-time

Location Type: Remote

Location: Georgia

About the role

  • This position is open at a global product-led IT company specializing in infrastructure stability and security solutions. Their products are recognized as the industry standard in the Hosting and Enterprise segments, powering over 500,000 servers worldwide.
  • In 2025, the company is evolving its data management strategy, shifting from traditional database administration to an Internal Database-as-a-Service (DBaaS) model. This role requires a visionary engineer to design resilient distributed systems, automate infrastructure through code, and transform databases into a reliable service for product teams. This is an ideal opportunity for those ready to handle petabytes of data and build high-scale platform solutions.
  • **Key Challenges & Responsibilities:**
  • - Designing and implementing a self-service platform (Terraform + Ansible) for deploying HA clusters (PostgreSQL, ClickHouse, MongoDB, Redis) in a heterogeneous environment (Bare Metal, OpenNebula, K8s, Public Clouds).
  • - Managing rapidly growing ClickHouse analytics clusters (12+ clusters, tens of terabytes), with a focus on sharding, ReplicatedMergeTree replication, and reliable S3 backup pipelines under high load.
  • - Maintaining and scaling infrastructure for Apache Airflow and Redash, ensuring the reliability of ETL pipelines and visualization tools.
  • - Implementing SRE practices in data management: replacing manual incident response with automated self-healing mechanisms and defining SLOs and SLIs.
  • - Migrating legacy solutions to modern cloud patterns and implementing Kubernetes operators for stateful workloads.
  • - Serving as a technical authority for product teams to optimize data schemas and SQL queries for high-load systems.
  • **Tech Stack:**
  • - **DB:** PostgreSQL 15+ (Patroni, PgBouncer), ClickHouse (Sharded/Replicated), MongoDB, Redis, Kafka.
  • - **Data & Analytics:** Apache Airflow, Redash.
  • - **Infrastructure:** Hybrid Cloud (3+ private DCs, OpenNebula, K8s, Bare Metal, AWS, GCP, Azure, DO).
  • - **IaC & CI/CD:** Terraform, Ansible, Python/Go, GitLab, Jenkins, Gerrit.
  • - **Observability:** VictoriaMetrics, Grafana, Loki.

Requirements

  • **Must have:**
  • - 5+ years of PostgreSQL expertise: deep knowledge of MVCC, locking mechanics, expert-level Patroni/PgBouncer configuration, and experience with seamless major version upgrades under load.
  • - ClickHouse mastery: experience operating large clusters, understanding ZooKeeper/ClickHouse Keeper, sharding, replication internals, and performance diagnostics at the data-part level.
  • - Engineering mindset (SRE/DevOps): experience writing complex Terraform modules and Ansible roles; proficiency in Python or Go for automation is a major asset.
  • - Hybrid environment experience: understanding the nuances of running DBs on Bare Metal vs. Kubernetes vs. Public Cloud, with the ability to optimize TCO and disk subsystem performance (NVMe, Network Storage).
  • - Systems approach: understanding the full stack from network packets to business logic, including security requirements (FIPS, audit logs) and disaster recovery.
  • **Nice to Have:**
  • - Experience building an Internal Developer Platform (IDP).
  • - Experience operating databases in Kubernetes via operators (CloudNativePG, Altinity Operator).
  • - Background working with Cloud or Hosting providers on similar services.

Benefits

  • - Fully remote work from any location worldwide and flexible working hours.
  • - Opportunity to impact architectural decisions for services used by thousands of companies globally.
  • - 24 days of vacation, 10 national holidays, and unlimited paid sick leave.
  • - Compensation for private medical insurance.
  • - Reimbursement for co-working spaces and gym/sports activities.
  • - Dedicated budget for education, training, and conferences.
  • - Reward program for innovative ideas that lead to company patents.
Applicant Tracking System Keywords

Hard Skills & Tools
PostgreSQL, ClickHouse, MongoDB, Redis, Terraform, Ansible, Python, Go, Apache Airflow, Redash
Soft Skills
engineering mindset, SRE, DevOps, systems approach