Salary
💰 €85,000 - €115,000 per year
Tech Stack
AWS, Cloud, Distributed Systems, Kotlin, Kubernetes, MySQL, Python, React, Vue.js
About the role
- Ensure the reliability, availability, and operational efficiency of Affirm’s distributed systems.
- Work on incident response, observability, automation, and system resilience in cloud-native environments.
- Own and deliver quarterly goals for your team, leading engineers through ambiguity to solve open-ended problems.
- Support product development lifecycle by collaborating with product management, design, and analytics.
- Proactively identify project, process, technology or business issues and lead in solving them.
- Support operations and availability by creating and monitoring metrics, escalating when needed, and supporting on-call efforts.
- Foster a culture of quality and ownership by setting or improving code review and design standards and advocating beyond your team.
- Help develop talent on your team by providing feedback and guidance and leading by example.
- Participate in the on-call rotation; this is a requirement of the role.
Requirements
- You have a total of 5+ years of experience as a software engineer.
- Experienced in designing, developing and launching backend systems at scale using technologies like Python, Kotlin, AWS, and MySQL.
- Leverage Kubernetes expertise for cloud compute orchestration and troubleshooting.
- Manage traffic comprehensively to ensure scalable, reliable, and high-performance platform operations (especially in AWS environments).
- Experience shipping web apps using declarative UI frameworks like React or Vue.
- Ensure high availability (HA) and resilience of critical services through runbooks, incident response strategies, and post-mortem analysis.
- Partner with Observability and Reliability teams to proactively detect and mitigate potential outages.
- Experience defining a technical plan for the delivery of a significant feature or system component with an elegant, simple and extensible design.
- Write high quality code that is easily understood and used by others.
- Automate operational tasks, deployments, failover processes, and scaling strategies to reduce manual intervention.
- Proficient at making significant changes in a large code base, with experience developing tools and practices that enable safe changes.
- Partner closely with Storage & Replication, Cloud, CI/CD, and Security teams to ensure high operational standards.
- Strong verbal and written communication skills that support effective collaboration with our global engineering team.