
Software Engineer, Infrastructure
Whatnot
full-time
Location Type: Hybrid
Location: Kraków • Poland
About the role
- build distributed systems, services, and frameworks that improve the reliability of the entire platform
- focus on making reliability a built-in property of our systems as scale, traffic, and complexity continue to grow
- design, build, and operate reliability-focused components, services, and frameworks
- shape the standards and practices that guide how software is built and run across Whatnot
- partner closely with product, platform, and infrastructure teams to embed reliability concerns into system design, development workflows, and runtime behavior
- design and operate traffic control mechanisms, including circuit breakers, rate limiting, backpressure, and graceful degradation
- build and evolve load testing frameworks that validate system behavior under sustained, burst, and peak event traffic patterns
- build chaos and resilience testing frameworks to proactively surface failure modes and validate recovery behavior
- define and implement SLOs, SLIs, and error budgets that guide engineering teams toward the right reliability tradeoffs
- develop reliability tooling and services that improve incident detection, response, and automated mitigation
- review service architectures and designs with a focus on failure modes, scalability limits, and operational safety
- participate in incident response and drive post-incident follow-ups that reduce repeated failure patterns through systemic fixes
Requirements
- 5+ years of experience as a software engineer working on large-scale distributed systems
- Strong fundamentals in designing, building, and operating shared production services and frameworks
- Experience with traffic control mechanisms such as circuit breakers and rate limiting
- Experience building or operating load testing and chaos testing frameworks
- Hands-on experience with observability, monitoring, and debugging production systems
- Experience working with SLOs, error budgets, and incident response processes
- Comfortable in cloud-native environments such as AWS or GCP, with Kubernetes and infrastructure as code
- Strong collaborator with clear written and verbal communication skills
- Bonus: experience with high-traffic, real-time, or event-driven systems
Benefits
- flexibility to work from home or from one of our global office hubs
- in-person time for planning, problem-solving, and connection
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
distributed systems, reliability engineering, traffic control mechanisms, load testing frameworks, chaos testing frameworks, observability, monitoring, debugging, SLOs, error budgets
Soft Skills
collaboration, communication