Tech Stack
AWS, Azure, Cloud, Docker, Google Cloud Platform, Grafana, Kafka, Kotlin, Kubernetes, Prometheus, Python, Spark, Terraform
About the role
- Own the real‑time data streaming stack and design, build, and operate Kafka-based pipelines (Confluent Cloud/self‑managed)
- Architect topics, partitions, retention, and compaction; build ingestion with Kafka Connect/Debezium; implement stream processing (Kafka Streams/ksqlDB/Flink/Spark)
- Enforce schemas (Schema Registry: Avro/Protobuf), validation, PII handling, GDPR/CCPA compliance, and monitoring & alerting for lag, errors, and anomalies
- Ensure high availability, disaster recovery, and security (ACLs/RBAC, encryption); manage CI/CD and IaC (Terraform/Helm) and perform cost/throughput tuning
- Collaborate with Data/Analytics/Marketing/Product teams to define events, SLAs, and interfaces; provide best‑practice guidance and enablement
- Continuously reduce latency/cost, improve reliability, and evaluate new streaming tools/patterns
- Measure and report on consumer lag, uptime/MTTR, throughput & latency, schema evolution impact, data quality error rate, and cost per GB/event
Requirements
- 6+ years of experience as a Data Engineer or Data Integration Engineer
- 3+ years of hands-on production experience with Kafka (Confluent Platform/Cloud), including Kafka Connect, Schema Registry, and Kafka Streams/ksqlDB or alternatives (Flink/Spark Structured Streaming)
- Proficiency in Python, with experience in building services, tooling, and test frameworks
- Strong understanding of event modeling, idempotency, DLQs, backpressure handling, and data formats (Avro, Protobuf, JSON)
- Excellent communication skills (English C1 level); proven experience in agency, consulting, or client-facing environments
- 2+ years working with AWS, Azure, or GCP (preferred)
- Strong knowledge of Docker and Kubernetes, CI/CD pipelines, and Git (preferred)
- Experience with monitoring and observability tools (Prometheus, Grafana, Datadog) and incident response processes (preferred)
- Familiarity with GitLab for pipeline automation (preferred)
- Experience with Kotlin for Kafka Streams development (≈20% of the workload) (preferred)
- Working knowledge of Jupyter Notebook for exploratory analysis (preferred)
- Experience with Terraform or CloudFormation for infrastructure setup and management (preferred)
- Strong understanding of relational databases and big data management practices (preferred)
- Hands-on experience with mParticle (preferred)
- Familiarity with Tealium (a strong plus)