Software Engineer, Infrastructure

Whatnot

full-time

Posted on: 1/20/2026

Location Type: Hybrid

Location: Kraków • Poland

Job Level

Mid-Level, Senior

Tech Stack

AWS, Cloud, Distributed Systems, Google Cloud Platform, Kubernetes

About the role

build distributed systems, services, and frameworks that improve the reliability of the entire platform
focus on making reliability a built-in property of our systems as scale, traffic, and complexity continue to grow
design, build, and operate reliability-focused components, services, and frameworks
shape the standards and practices that guide how software is built and run across Whatnot
partner closely with product, platform, and infrastructure teams to embed reliability concerns into system design, development workflows, and runtime behavior
design and operate traffic control mechanisms, including circuit breakers, rate limiting, backpressure, and graceful degradation
build and evolve load testing frameworks that validate system behavior under sustained, burst, and peak event traffic patterns
build chaos and resilience testing frameworks to proactively surface failure modes and validate recovery behavior
define and implement SLOs, SLIs, and error budgets that guide engineering teams toward the right reliability tradeoffs
develop reliability tooling and services that improve incident detection, response, and automated mitigation
review service architectures and designs with a focus on failure modes, scalability limits, and operational safety
participate in incident response and drive post-incident follow-ups that reduce repeated failure patterns through systemic fixes

Requirements

5+ years of experience as a software engineer working on large-scale distributed systems
Strong fundamentals in designing, building, and operating shared production services and frameworks
Experience with traffic control mechanisms such as circuit breakers and rate limiting
Experience building or operating load testing and chaos testing frameworks
Hands-on experience with observability, monitoring, and debugging production systems
Experience working with SLOs, error budgets, and incident response processes
Comfortable in cloud-native environments such as AWS or GCP, with Kubernetes and infrastructure as code
Strong collaborator with clear written and verbal communication skills
Bonus: experience with high-traffic, real-time, or event-driven systems

Benefits

flexibility to work from home or from one of our global office hubs
in-person time for planning, problem-solving, and connection

Applicant Tracking System Keywords

Hard Skills & Tools

distributed systems, reliability engineering, traffic control mechanisms, load testing frameworks, chaos testing frameworks, observability, monitoring, debugging, SLOs, error budgets

Soft Skills

collaboration, communication