Innodata Inc.

Senior Data Engineer – Real-Time & Distributed Systems, GCP

Full-time

Location Type: Remote

Location: New Jersey, United States

About the role

  • Design, build, and optimize scalable data pipelines for batch and real-time processing
  • Develop and maintain event-driven architectures for high-throughput systems
  • Ensure data reliability, performance, and low-latency processing across distributed environments
  • Collaborate with data scientists and application teams to enable analytics and AI use cases
  • Implement best practices in performance tuning, monitoring, and cost optimization

Requirements

  • Advanced proficiency in Python for backend and large-scale data processing
  • Strong experience building and managing big data pipelines in production environments
  • Hands-on expertise with workflow orchestration tools such as Airflow or Google Cloud Composer
  • Proven experience in batch and streaming data processing using Apache Spark and Apache Beam (Dataflow)
  • Experience designing and operating event-driven systems using Pub/Sub
  • Strong understanding of distributed systems architecture and scalability patterns
  • Experience managing globally distributed, low-latency datasets
  • Hands-on experience with NoSQL databases and/or Google Cloud Spanner
  • Strong knowledge of system reliability, fault tolerance, and performance optimization

Hard Skills & Tools
Python, big data pipelines, batch processing, streaming data processing, Apache Spark, Apache Beam, event-driven systems, NoSQL databases, Google Cloud Spanner, performance optimization