Zocdoc

Staff Infrastructure Engineer, Data

full-time

Location Type: Remote

Location: Remote • New York • 🇺🇸 United States

Salary

💰 $179,444 - $270,750 per year

Job Level

Lead

Tech Stack

Airflow, Amazon Redshift, Apache, AWS, BigQuery, Hadoop, Kafka, Spark, SQL, Terraform, Unity

About the role

  • Own platform reliability & operations for data infrastructure: capacity planning, incident response/on‑call, disaster recovery, and performance tuning across compute and storage (e.g., Databricks, Snowflake, Glue/Athena, Kinesis/Kafka).
  • Architect the lake / lakehouse layers (Delta Lake / Apache Iceberg / Hudi) with clear data contracts, schema evolution, compaction, and retention; build the controls that make it safe and fast.
  • Harden platform surfaces (networking, IAM, encryption, key management, VPC endpoints/PrivateLink, Lake Formation/Unity Catalog) for PHI/PII, auditing, and least‑privilege by default.
  • Establish observability for data systems (Datadog/CloudWatch): golden signals, lineage, SLIs/SLOs, and cost telemetry/FinOps.
  • Manage everything as Infrastructure‑as‑Code (Terraform/CDK), including EMR/EKS clusters, Snowflake roles/warehouses, secrets, and CI/CD for data platform changes.
  • Optimize warehouse usage (Snowflake preferred; BigQuery/Redshift welcome): warehouse sizing/queuing, clustering, pruning, caching, RBAC, cost controls.
  • Partner with data engineering & security to set platform standards (orchestration like Dagster/Airflow; governance via Unity Catalog/Collibra/Lake Formation; quality and metadata services).

Requirements

  • 8+ years in infrastructure/SRE/platform engineering (or hybrid data platform) with deep AWS expertise (networking, IAM, CDK, S3, EKS, EMR, Glue, Lambda; multi‑account patterns).
  • Expertise running distributed data processing at scale with Spark (Databricks or EMR/EKS), plus working knowledge of Hadoop/Hive/Presto/Trino.
  • Strong SQL fundamentals and data modeling; Snowflake (or BigQuery/Redshift) performance and cost optimization.
  • Proven leadership establishing SLOs, runbooks, incident response, and reliability tooling for data platforms.
  • Solid IaC (Terraform), CI/CD, and security‑by‑default mindset for PHI/PII.

Benefits

  • Flexible, hybrid work environment at our convenient Soho location
  • Unlimited Vacation
  • 100% paid employee health benefit options (including medical, dental, and vision)
  • Commuter Benefits
  • 401(k) with employer funded match
  • Corporate wellness programs with Headspace and Peloton
  • Sabbatical leave (for employees with 5+ years of service)
  • Competitive paid parental leave and fertility/family planning reimbursement
  • Cell phone reimbursement
  • Catered lunch every day, along with beverages and snacks
  • Employee Resource Groups and ZocClubs to promote shared community and belonging
  • Great Place to Work Certified

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard skills
capacity planning, incident response, disaster recovery, performance tuning, data modeling, SQL, Infrastructure-as-Code, CI/CD, data processing, security-by-default
Soft skills
leadership, collaboration, communication, problem-solving, organizational skills