
Senior Infrastructure Engineer – Data Streaming, Kafka, Redis
SentinelOne
Full-time
Location Type: Remote
Location: Remote • 🇺🇸 United States
Salary
💰 $128,000 - $176,000 per year
Job Level
Senior
Tech Stack
AWS, Cassandra, Cloud, Distributed Systems, Flux, Go, Google Cloud Platform, Jenkins, Kafka, Kubernetes, Python, Redis, Terraform
About the role
- Operate and maintain distributed data services—including Kafka and Redis—running at massive scale across Kubernetes clusters and multi-cloud environments.
- Unlock complete cloud portability for SentinelOne’s services by building a highly automated, self-service infrastructure that can run seamlessly across AWS, GCP, and air-gapped on-prem environments.
- Manage data infrastructure supporting 5+ PB/day ingestion, ensuring low-latency, high-throughput, and cost-effective operation at global scale.
- Consolidate and optimize multi-tenant Kafka clusters to reduce cost, improve resilience, and streamline operations.
- Drive Redis and Kafka lifecycle automation using GitOps principles (ArgoCD, Terraform), reducing operational toil and minimizing pager fatigue.
- Define and implement standards for observability, HA, backup, and DR of stateful workloads in Kubernetes.
- Partner with FinOps and engineering stakeholders to continuously optimize performance, cost, and operational overhead across data platform components.
- Own the end-to-end platform experience for mission-critical open-source systems such as Kafka, Redis, and Cassandra, serving hundreds of product teams.
Requirements
- 5+ years of experience in infrastructure/platform engineering, with a proven track record of operating stateful distributed systems at scale.
- Deep hands-on experience with Kafka and Redis running in Kubernetes, including performance tuning, scaling, partitioning, persistence, and operator-based lifecycle management.
- Strong understanding of Kubernetes internals and best practices for managing both stateless and stateful workloads in production environments.
- Hands-on experience with GitOps and IaC: ArgoCD and/or Flux
- Terraform/Terragrunt experience is desired
- CI/CD: GitHub Actions / Jenkins
- Python/Golang knowledge (ability to automate day-to-day operations, read and understand the comments, and provide improvement suggestions)
- Understanding of SRE principles, including SLAs/SLOs and incident response (the role includes on-call duty)
- US Citizenship and the ability to work in a government-regulated environment.
Benefits
- Medical, Vision, Dental, 401(k), Commuter, Health and Dependent FSA
- Unlimited PTO
- Industry-leading gender-neutral parental leave
- Paid Company Holidays
- Paid Sick Time
- Employee stock purchase program
- Disability and life insurance
- Employee assistance program
- Gym membership reimbursement
- Cell phone reimbursement
- Numerous company-sponsored events, including regular happy hours and team-building events
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
Kafka, Redis, Kubernetes, Terraform, GitOps, CI/CD, Python, Golang, Cassandra, IaC
Soft skills
performance optimization, collaboration, problem-solving, communication, incident response