Tech Stack
Tools & technologies
AWS, Azure, Cloud, Go, Google Cloud Platform, Grafana, Java, JavaScript, Node.js, Prometheus, Python
About the role
Key responsibilities & impact
- Ensure the resilience and availability of Kohl’s systems and applications
- Collaborate closely with development teams
- Contribute to architectural designs
- Conduct risk assessments and design for failure
- Implement robust monitoring and failover mechanisms
- Drive error budget and Service Level Objective (SLO) adoption across products
- Drive incident response efforts, perform root cause analysis and implement preventative measures to enhance system reliability
- Establish consistent practices that elevate Kohl’s operational excellence through automation and process improvements
- Follow the software lifecycle and drive reliability, observability, and efficiency across product teams within an assigned domain
- Identify repeated toil and find opportunities for automation and risk reduction
- Participate in an on-call rotation to respond to production incidents, and conduct blameless retros and root-cause analyses (RCAs) to drive a culture of continuous improvement
- Proactively identify failures before they cause outages, using chaos engineering and related techniques such as edge-case analysis, failure-mode analysis, and design review
- Advise on capacity planning and provide continuous assessments of system behavior and resource consumption
- Work with product managers to identify and prioritize work on reliability best practices (e.g., leveraging SLIs, SLOs, and error budgets)
- Mentor and assist engineers on the team
Requirements
What you’ll need
- Bachelor's Degree or equivalent in MIS, Computer Science or related field
- 4+ years of experience in software development
- Strong programming skills in one or more languages (Java, Python, Go or Node.js)
- In-depth knowledge of systems architecture, operating system internals and network fundamentals
- In-depth knowledge of application design patterns, event-driven architecture, database schemas, and testing strategies
- Experience with multi-region application troubleshooting and performance tuning
- Working experience with one cloud platform (GCP, AWS, or Azure)
- Working experience with monitoring techniques and tools (e.g., CloudWatch, Grafana, Prometheus, OpenTelemetry, Tracing)
Benefits
Comp & perks
- Health insurance
- Retirement plans
- Paid time off
- Flexible work arrangements
- Professional development
ATS Keywords
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
Java, Python, Go, Node.js, systems architecture, operating system internals, network fundamentals, application design patterns, event-driven architecture, database schemas
Soft Skills
collaboration, mentoring, incident response, root cause analysis, process improvement, automation, capacity planning, continuous improvement, communication, leadership
Certifications
Bachelor's Degree in MIS, Bachelor's Degree in Computer Science
