Zeta

Manager, Data Reliability Engineering

full-time

Origin: 🇮🇳 India

Job Level

Senior, Lead

Tech Stack

Airflow, Apache, AWS, Azure, Cloud, Google Cloud Platform, Jenkins, Kafka, Postgres, Python, Spark, Terraform

About the role

  • Leverage deep expertise in database management and optimization to ensure high performance and reliability of our data systems.
  • Identify bottlenecks and performance issues within data pipelines, and optimize query performance, data access, and overall data processing.
  • Design, deploy, and manage complex data systems in cloud environments (e.g., AWS, Azure, GCP) using tools such as Terraform, adhering to the AWS Well-Architected Framework.
  • Develop and implement complex automation solutions using tools such as Jenkins and scripts to streamline data operations and enhance efficiency.
  • Architect and manage enterprise HA and DR solutions to ensure business continuity and data availability.
  • Analyze and optimize database performance, identifying and resolving bottlenecks.
  • Ensure adherence to cloud security best practices and compliance standards, protecting sensitive data and systems.
  • Manage complex incidents, troubleshoot issues, and implement effective solutions to maintain data integrity and system reliability.
  • Demonstrate leadership skills, mentor junior team members, and foster a collaborative and communicative team environment.

Requirements

  • Bachelor’s degree in Computer Science or equivalent, with 8 to 11 years of hands-on experience in database management and optimization across relational databases, with PostgreSQL as the primary skill set.
  • Experience in cloud database administration (configuration, backup/restore, replication) for databases on the order of tens of terabytes in size.
  • Expertise in identifying and resolving performance issues within data pipelines.
  • Experience in developing and implementing complex automation solutions using tools like Jenkins, Terraform, and Python.
  • Experience in architecting and managing high availability and disaster recovery solutions for databases in the cloud across regions.
  • Expertise in performance monitoring and optimization of database systems.
  • In-depth knowledge of cloud security best practices and compliance standards.
  • Extensive experience in managing complex incidents and troubleshooting.
  • Leadership skills with the ability to mentor and guide junior team members.
  • Experience with modern data processing tools and frameworks (e.g., Apache Kafka, Apache Spark, Airflow, Debezium) is a plus.