Salary
💰 $160,000 - $200,000 per year
Tech Stack
AWS, Cloud, ETL, Google Cloud Platform, Grafana, Java, Postgres, Python, SQL
About the role
- Design, implement, and optimize ETL/ELT pipelines for large-scale PostgreSQL datasets (11TB+ production, 5TB staging)
- Build scalable ingestion workflows into ClickHouse Cloud using Iceberg tables on AWS S3 and AWS Glue
- Develop processes for anonymizing and preparing healthcare data in staging environments to support development and research without exposing PHI (see the first sketch after this list)
- Implement robust validation and reconciliation checks to ensure data quality and HIPAA-compliant handling (see the reconciliation sketch after this list)
- Develop and maintain schemas to support both OLTP (PostgreSQL) and OLAP (ClickHouse/Iceberg) workloads
- Optimize query performance for analytics while minimizing load on production databases
- Extend our data warehouse to enable ad-hoc analysis, BI tool integrations, and healthcare-specific reporting use cases
- Build tools and dashboards to monitor schema changes, query performance, and pipeline health across PostgreSQL, ClickHouse, and Glue/S3
- Implement alerting, logging, and performance tuning strategies for production and staging environments (see the monitoring sketch after this list)
- Collaborate with engineers and analysts to proactively identify bottlenecks and scalability improvements
- Integrate structured healthcare data flows between EHR systems, RCM platforms, and internal services
- Build APIs or connectors to surface analytical and operational data securely to downstream consumers
- Ensure interoperability across GCP (SQL) and AWS (ClickHouse, Glue, S3) platforms
- Manage and evolve our hybrid-cloud data infrastructure (GCP SQL + AWS Glue/ClickHouse)
- Enforce access management, encryption, and anonymization controls aligned with HIPAA and healthcare compliance standards
- Partner with security and compliance teams to implement best practices in sensitive data handling
- Work closely with analysts, scribe technology developers, and product engineers to capture data requirements
- Document schemas, pipelines, and workflows to ensure maintainability and cross-team understanding
- Mentor team members and advocate for data engineering best practices across the company
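To make the staging-anonymization responsibility concrete, here is a minimal sketch of extracting a PostgreSQL table, hashing direct identifiers, and landing the result as Parquet on S3. The table, column, and bucket names (patients, mrn, full_name, example-staging-lake) are hypothetical, and a salted SHA-256 hash stands in for whatever vetted de-identification method the team actually uses.

```python
# Sketch: extract from PostgreSQL, hash direct identifiers, write Parquet to S3.
# Table, column, and bucket names are hypothetical placeholders.
import hashlib
import os

import boto3
import psycopg2
import pyarrow as pa
import pyarrow.parquet as pq

SALT = os.environ["ANON_SALT"]  # keep the salt out of source control

def anonymize(value: str) -> str:
    # One-way salted hash; illustrative only, not a complete de-identification policy.
    return hashlib.sha256((SALT + value).encode()).hexdigest()

conn = psycopg2.connect(os.environ["PG_DSN"])
with conn, conn.cursor() as cur:
    cur.execute("SELECT id, mrn, full_name, visit_date FROM patients")
    rows = [
        {"id": r[0], "mrn": anonymize(r[1]),
         "full_name": anonymize(r[2]), "visit_date": r[3]}
        for r in cur
    ]

# Write a Parquet file and upload it to the S3 prefix the staging lake reads from.
pq.write_table(pa.Table.from_pylist(rows), "/tmp/patients_anon.parquet")
boto3.client("s3").upload_file(
    "/tmp/patients_anon.parquet",
    "example-staging-lake",
    "anon/patients/patients_anon.parquet",
)
```

In practice this step would run as a Glue job or an orchestrated task and append to the Iceberg table through the catalog rather than dropping loose Parquet files.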
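The reconciliation bullet might look like the following in its simplest form: compare per-day row counts between the PostgreSQL source and ClickHouse, using psycopg2 and the clickhouse-connect client. The encounters table and connection settings are placeholders.

```python
# Sketch: reconcile per-day row counts between PostgreSQL and ClickHouse.
# Table name and connection settings are placeholders.
import os

import clickhouse_connect
import psycopg2

pg = psycopg2.connect(os.environ["PG_DSN"])
with pg, pg.cursor() as cur:
    cur.execute(
        "SELECT visit_date, count(*) FROM encounters "
        "WHERE visit_date >= current_date - 7 GROUP BY visit_date"
    )
    pg_counts = dict(cur.fetchall())

ch = clickhouse_connect.get_client(
    host=os.environ["CH_HOST"],
    username="default",
    password=os.environ["CH_PASSWORD"],
)
ch_counts = dict(
    ch.query(
        "SELECT visit_date, count() FROM encounters "
        "WHERE visit_date >= today() - 7 GROUP BY visit_date"
    ).result_rows
)

# Flag any day where the two systems disagree.
mismatches = {
    d: (pg_counts.get(d), ch_counts.get(d))
    for d in pg_counts.keys() | ch_counts.keys()
    if pg_counts.get(d) != ch_counts.get(d)
}
if mismatches:
    raise SystemExit(f"Reconciliation failed for: {mismatches}")
print("Row counts match for the trailing 7 days.")
```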
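And for pipeline health, a small sketch of publishing a data-freshness gauge to a Prometheus Pushgateway that a Grafana panel or alert rule could watch. The gateway address, job name, metric name, and loaded_at column are assumptions.

```python
# Sketch: publish a data-freshness gauge that Grafana can alert on.
# Gateway address, job name, and the loaded_at column are placeholders.
import os

import psycopg2
from prometheus_client import CollectorRegistry, Gauge, push_to_gateway

registry = CollectorRegistry()
lag = Gauge(
    "pipeline_freshness_seconds",
    "Seconds since the newest row landed in the analytics table",
    registry=registry,
)

conn = psycopg2.connect(os.environ["PG_DSN"])
with conn, conn.cursor() as cur:
    cur.execute("SELECT extract(epoch FROM now() - max(loaded_at)) FROM encounters")
    lag.set(cur.fetchone()[0] or 0)

push_to_gateway(
    os.environ.get("PUSHGATEWAY", "localhost:9091"),
    job="etl_freshness",
    registry=registry,
)
```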
Requirements
- Bachelor’s degree in Computer Science, Engineering, Information Systems, or equivalent experience
- Proven experience in data engineering at scale (multi-TB datasets, OLTP + OLAP systems)
- Strong SQL expertise with PostgreSQL and experience tuning queries for high-volume transactional databases
- Hands-on experience with Python, Java, and SQL for data processing and pipeline orchestration
- Familiarity with ClickHouse or other analytical databases, and data lake formats (Iceberg, Parquet, ORC)
- Experience with AWS Glue (ETL jobs and the Data Catalog) and S3-based data lakes
- Understanding of cloud-native services in both Google Cloud (Cloud SQL) and AWS
- Knowledge of data anonymization and governance techniques for sensitive healthcare data (HIPAA familiarity a plus)
- Experience with monitoring/observability tools for data infrastructure (e.g., Grafana, dbt metrics, or custom solutions)
- Strong problem-solving and debugging skills; ability to balance technical rigor with business needs
- Effective communicator and collaborator across engineering, analytics, and product teams
- Offers Equity
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
ETL, ELT, PostgreSQL, ClickHouse, AWS Glue, Python, Java, SQL, data anonymization, data governance
Soft skills
problem-solving, debugging, effective communication, collaboration, mentoring
Certifications
Bachelor’s degree in Computer Science, Bachelor’s degree in Engineering, Bachelor’s degree in Information Systems