
Site Reliability Engineer, Core Streaming
Yelp
full-time
Location Type: Remote
Location: California • Illinois • United States
Salary
💰 $141,000 - $216,000 per year
About the role
- Design, deploy, and maintain large-scale Kafka event streaming infrastructure across hybrid and multi-cloud environments.
- Collaborate with engineers to enable new features, ensure data pipeline reliability, and advise on best practices for real-time data processing.
- Execute and automate Kafka cluster upgrades, migrations, and major version rollouts with minimal impact on critical services.
- Build or enhance self-service capabilities and automation for cluster operations, scaling, and incident recovery.
- Troubleshoot complex issues affecting data flow, performance, or stability, and drive root cause analyses.
- Participate in on-call rotations.
Requirements
- Strong hands-on experience designing and implementing large-scale Kafka event streaming capabilities in production, across hybrid or multi-cloud and Linux environments, including upgrades and migrations between platforms or versions.
- In-depth knowledge of event streaming/data-in-motion design principles, architecture, and operational nuances.
- Programming proficiency in Java, Python, or similar modern languages for tooling, integration, and automation.
- Familiarity with Kafka Client APIs (Producer, Consumer, Streams), as well as sizing and capacity planning for high-throughput clusters.
- Experience designing and optimizing real-time data streaming solutions with technologies like Apache Flink.
- Knowledge of automating infrastructure and operational tasks (configuration management, IaC, scripting, or related).
- Problem-solving mindset with an eagerness to learn, take initiative, and advocate for infrastructure best practices in a fast-paced environment.
- A bachelor's degree or equivalent work experience is required.
Benefits
- There may be flexibility with the range included in this posting should a candidate be leveled higher or lower than the posted range.
- This opportunity has the option to be fully remote in all locations across the US.
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
Kafka, event streaming, Linux, Java, Python, Apache Flink, configuration management, Infrastructure as Code (IaC), scripting, data pipeline reliability
Soft Skills
problem-solving, eagerness to learn, initiative, advocacy for best practices