Whatnot

Software Engineer, Infrastructure

full-time

Location Type: Hybrid

Location: Kraków, Poland

About the role

  • build distributed systems, services, and frameworks that improve the reliability of the entire platform
  • focus on making reliability a built-in property of our systems as scale, traffic, and complexity continue to grow
  • design, build, and operate reliability-focused components, services, and frameworks
  • shape the standards and practices that guide how software is built and run across Whatnot
  • partner closely with product, platform, and infrastructure teams to embed reliability concerns into system design, development workflows, and runtime behavior
  • design and operate traffic control mechanisms, including circuit breakers, rate limiting, backpressure, and graceful degradation
  • build and evolve load testing frameworks that validate system behavior under sustained, burst, and peak event traffic patterns
  • build chaos and resilience testing frameworks to proactively surface failure modes and validate recovery behavior
  • define and implement SLOs, SLIs, and error budgets that guide engineering teams toward the right reliability tradeoffs
  • develop reliability tooling and services that improve incident detection, response, and automated mitigation
  • review service architectures and designs with a focus on failure modes, scalability limits, and operational safety
  • participate in incident response and drive post-incident follow-ups that reduce repeated failure patterns through systemic fixes

Requirements

  • 5+ years of experience as a software engineer working on large-scale distributed systems
  • Strong fundamentals in designing, building, and operating shared production services and frameworks
  • Experience with traffic control mechanisms such as circuit breakers and rate limiting
  • Experience building or operating load testing and chaos testing frameworks
  • Hands-on experience with observability, monitoring, and debugging production systems
  • Experience working with SLOs, error budgets, and incident response processes
  • Comfortable in cloud-native environments such as AWS or GCP, with Kubernetes and infrastructure as code
  • Strong collaborator with clear written and verbal communication skills
  • Bonus: experience with high-traffic, real-time, or event-driven systems

Benefits

  • flexibility to work from home or from one of our global office hubs
  • in-person time for planning, problem-solving, and connection