Maintain and operate Apache Kafka infrastructure in a high-scale, multi-cloud environment, ensuring high availability, reliability, and performance
Manage and evolve tools, libraries, and frameworks that support Kafka producers and consumers, improving developer experience and promoting consistency across teams
Automate operational tasks such as provisioning, scaling, alerting, and recovery to reduce toil and minimize operational overhead
Enhance system reliability through robust monitoring, failover mechanisms, self-healing workflows, and capacity planning
Implement self-service capabilities and platform abstractions to reduce time-to-market for teams integrating with Kafka
Collaborate with internal teams to support event-driven architecture design and troubleshoot platform-related issues
Contribute to infrastructure as code using Terraform and manage Kubernetes-based deployments for platform components
Requirements
At least 2 years of relevant experience building applications from scratch, with proficiency in an object-oriented or functional programming language (e.g., Java, Golang, Clojure, Python, Ruby)
Basic knowledge of Apache Kafka and its ecosystem (e.g., Kafka Connect, Schema Registry)
Good understanding of message brokers such as RabbitMQ (RMQ)
Hands-on experience with Terraform for infrastructure provisioning and automation
Working knowledge of Kubernetes (K8s) for deploying and managing containerized workloads
Familiarity with distributed systems concepts and multi-cloud architectures
Comfortable with observability tools (e.g., Prometheus, Grafana) and CI/CD pipelines