Tech Stack
Ansible, AWS, Azure, Cloud, Go, Google Cloud Platform, Grafana, Kubernetes, Open Source, Postgres, Prometheus, Python, Terraform
About the role
- Lead, mentor, and grow a distributed team of PostgreSQL DBaaS support engineers and SREs; define career paths, set goals, and provide regular feedback
- Oversee 24x7 support and incident management for the PostgreSQL DBaaS platform; manage escalations from enterprise customers
- Drive SRE best practices: SLAs, SLOs, SLIs, error budgets, incident retrospectives, and postmortems
- Ensure compliance with SOC 2, HIPAA, GDPR, and other regulatory frameworks
- Collaborate with Product and Engineering teams to influence roadmaps and drive platform improvements
- Own the reliability, scalability, and performance of PostgreSQL clusters in production
- Drive automation of provisioning, monitoring, backup/recovery, patching, and upgrades
- Partner with architecture teams to define best practices for schema design, indexing, performance tuning, and replication strategies
- Guide incident response, root cause analysis, and long-term remediation
- Develop dashboards, runbooks, and playbooks to enhance operational visibility and reduce mean time to recovery (MTTR)
- Track ticket metrics and ensure service level agreements and best practices are met
Requirements
- 7+ years of experience in PostgreSQL administration, support, or engineering
- At least 3 years in a leadership or management role
- Proven track record managing DBaaS platforms or large-scale PostgreSQL deployments
- Deep knowledge of high availability, replication, partitioning, and performance tuning in PostgreSQL
- Strong understanding of SRE principles, including monitoring, alerting, incident response, and service level objectives
- Experience with Kubernetes, container orchestration, and cloud providers (AWS, GCP, Azure)
- Familiarity with Terraform, Ansible, or similar automation tools
- Strong communication and stakeholder management skills
- Experience managing escalations from enterprise customers and meeting ticket metrics and SLAs
- Experience with SOC 2 compliance is a plus
- (Preferred) Prior experience managing 24x7 global support teams
- (Preferred) Knowledge of multi-tenant DBaaS architectures
- (Preferred) Experience with security, compliance, and audit frameworks (SOC 2, HIPAA, FedRAMP)
- (Preferred) Familiarity with observability stacks (Prometheus, Grafana, ELK, Datadog)
- (Preferred) Programming/scripting proficiency in Python, Go, or Bash