Sutherland

GCP Data Engineer

Employment Type: Full-time

Location Type: Remote

Location: India

About the role

  • Design and implement real-time data ingestion pipelines using Pub/Sub and Kafka Streams for healthcare data formats (HL7, FHIR)
  • Build a robust Bronze layer as the single source of truth, storing raw, untransformed data in Cloud Storage
  • Develop streaming ingestion patterns using Dataflow for real-time data capture with minimal transformation (see the ingestion sketch after this list)
  • Implement batch loading processes using Dataproc for large-volume data from diverse sources (logs, databases, APIs)
  • Apply schema inference and basic data type adjustments while preserving raw data lineage
  • Design partitioning strategies in Cloud Storage for efficient historical data archival and retrieval
  • Establish data landing zone controls including audit logging, versioning, and immutability patterns
  • Create automated workflows using Cloud Composer for orchestrating ingestion pipelines
  • Implement data catalog and metadata management for raw data assets
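
The streaming ingestion pattern above can be pictured with a minimal sketch, assuming an Apache Beam (Python SDK) pipeline run on Dataflow: it reads raw messages from a Pub/Sub subscription, decodes them without further transformation, and lands them in a Cloud Storage Bronze-layer path. The project, subscription, and bucket names are hypothetical placeholders, not values from this posting.

```python
import apache_beam as beam
from apache_beam.io import fileio
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.transforms import window

# Hypothetical resource names, used purely for illustration.
SUBSCRIPTION = "projects/example-project/subscriptions/hl7-ingest-sub"
BRONZE_PATH = "gs://example-bronze-bucket/raw/hl7/"


def run():
    # streaming=True tells the runner (e.g. Dataflow) to process the unbounded Pub/Sub source.
    options = PipelineOptions(streaming=True, save_main_session=True)

    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            # Pull raw message bytes from the Pub/Sub subscription.
            | "ReadFromPubSub" >> beam.io.ReadFromPubSub(subscription=SUBSCRIPTION)
            # Decode only; the Bronze layer keeps payloads otherwise untransformed.
            | "DecodeUtf8" >> beam.Map(lambda payload: payload.decode("utf-8"))
            # Fixed one-minute windows let the streaming file writer close files.
            | "WindowOneMinute" >> beam.WindowInto(window.FixedWindows(60))
            # Write each window of raw records to Cloud Storage as newline-delimited text.
            | "WriteRawToGCS" >> fileio.WriteToFiles(
                path=BRONZE_PATH,
                sink=lambda destination: fileio.TextSink(),
            )
        )


if __name__ == "__main__":
    run()
```

A production pipeline would typically also encode a date-based partition in the output path (for example raw/hl7/ingest_date=YYYY-MM-DD/) and add dead-letter handling, in line with the partitioning and landing-zone controls listed above.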

Requirements

  • 5+ years of experience with GCP services
  • Strong expertise in Apache Kafka, Kafka Streams, and event-driven architectures
  • Proficiency in Python and/or Java for data pipeline development using the Apache Beam SDK
  • Experience with healthcare data standards (HL7, FHIR) and handling semi-structured data
  • Hands-on experience with streaming frameworks (Apache Beam, Dataflow) for near-real-time ingestion
  • Knowledge of file formats and compression (JSON, Avro, Parquet) for raw data storage
  • Understanding of CDC patterns, incremental loading, and data versioning strategies
  • Experience with Cloud Storage lifecycle management and cost optimization (a configuration sketch follows this list)
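
To illustrate the lifecycle-management and versioning points above, here is a minimal sketch using the google-cloud-storage Python client, assuming a hypothetical raw-landing bucket; the bucket name, retention ages, and storage classes are illustrative assumptions rather than requirements from the posting.

```python
from google.cloud import storage

# Hypothetical bucket name; retention ages and storage classes are assumptions.
BRONZE_BUCKET = "example-bronze-bucket"


def configure_bronze_bucket():
    client = storage.Client()
    bucket = client.get_bucket(BRONZE_BUCKET)

    # Object versioning preserves prior generations of raw files (immutability pattern).
    bucket.versioning_enabled = True

    # Move aging raw objects to colder storage classes, then expire them.
    bucket.add_lifecycle_set_storage_class_rule("NEARLINE", age=30)
    bucket.add_lifecycle_set_storage_class_rule("COLDLINE", age=90)
    bucket.add_lifecycle_delete_rule(age=365)

    # Persist the updated configuration on the bucket.
    bucket.patch()


if __name__ == "__main__":
    configure_bronze_bucket()
```

The same rules can also be maintained declaratively as a JSON lifecycle policy applied with gsutil lifecycle set; the client-library form is shown here only to keep the sketch in Python.
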
Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
real-time data ingestion, Apache Kafka, Kafka Streams, Python, Java, Apache Beam, Dataflow, Dataproc, schema inference, data versioning