GCP Data Engineer

Sutherland

full-time

Posted on: 1/29/2026

Location Type: Remote

Location: India

About the role

Design and implement real-time data ingestion pipelines using Pub/Sub and Kafka Streams for healthcare data formats (HL7, FHIR)
Build a robust Bronze layer as the single source of truth, storing raw, untransformed data in Cloud Storage
Develop streaming ingestion patterns using Dataflow for real-time data capture with minimal transformation
Implement batch loading processes using Dataproc for large-volume data from diverse sources (logs, databases, APIs)
Apply schema inference and basic data type adjustments while preserving raw data lineage
Design partitioning strategies in Cloud Storage for efficient historical data archival and retrieval
Establish data landing zone controls including audit logging, versioning, and immutability patterns
Create automated workflows using Cloud Composer for orchestrating ingestion pipelines
Implement data cataloging and metadata management for raw data assets

Requirements

5+ years of experience with GCP services
Strong expertise in Apache Kafka, Kafka Streams, and event-driven architectures
Proficiency in Python and/or Java for data pipeline development using the Apache Beam SDK
Experience with healthcare data standards (HL7, FHIR) and handling semi-structured data
Hands-on experience with streaming frameworks (Apache Beam, Dataflow) for near-real-time ingestion
Knowledge of file formats (JSON, Avro, Parquet) and compression for raw data storage
Understanding of CDC patterns, incremental loading, and data versioning strategies
Experience with Cloud Storage lifecycle management and cost optimization
