Salary
💰 $209,800 - $246,800 per year
Tech Stack
Airflow, AWS, Cloud, DynamoDB, EC2, Go, Kafka, Kubernetes, MySQL, Postgres, Python, Redis, Spark, Terraform
About the role
- This role leads Platform Infrastructure, building and operating the core systems that power Abnormal's AI-driven detection and prevention at cloud scale.
- Lead foundational efforts across multiple areas of Platform Infrastructure; guide a high-performing team; shape the roadmap for a self-service infrastructure platform.
- Team mission: Build and evolve the core infrastructure—compute, orchestration, and data platform—that powers Abnormal’s AI/ML products at scale.
- What you will do: Shape core areas of Platform Infrastructure, spanning compute (EC2/EKS, autoscaling, container runtime), orchestration (Kubernetes, workload APIs, multi-cluster, policy/quotas), and data platform (streaming, batch, storage, data tooling); design architecture and roadmap; partner with product/ML workflows; raise operational excellence; act as technical lead; champion AI-native software development; own cost-conscious engineering; instill platform product practices.
- Must haves: proven experience with data-intensive backend systems; 5+ years as a Senior/Staff engineer; change agent; depth in two of three areas (Compute, Orchestration, Data Platform); hands-on with our stack; strong IaC, observability, and SRE fundamentals.
- Nice to haves: multi-tenant or regulated platforms experience; feature stores; cross-org migrations.
- How we work: product mindset; automation first; measured outcomes.
Requirements
- Proven experience building and scaling data-intensive, distributed backend systems in high-growth environments.
- 5+ years as a Senior/Staff engineer building platforms, tools, or infrastructure that materially increase engineering velocity and reliability.
- A strong track record as a change agent—reshaping infra strategy and shipping impactful, self-service platform offerings in startup settings.
- Depth in at least two of the following three areas: Compute, Orchestration, Data Platform.
- Hands-on with our stack: Python, Golang, Terraform/Terragrunt, PostgreSQL, Kafka, Redis, OpenSearch, AWS, Kubernetes. Strong IaC, observability, and SRE fundamentals.