Senior Infrastructure – Reliability Engineer
KWI
Senior Engineer managing Linux/UNIX systems and VMware infrastructure at KWI for retail clients. Driving operational excellence through automation and incident management.
Tech Stack
Tools & technologies
Ansible, DNS, Docker, Java, Linux, MySQL, Python, TCP/IP, Terraform, Unix, VMware
About the role
Key responsibilities & impact
- Support and operate our Linux/UNIX systems, VMware infrastructure, CI/CD pipelines, MySQL databases, and containerized workloads that serve our retail clients 24×7.
- Own incidents end-to-end: triage alerts, drive root-cause analysis across the application, database, and network layers, and write the post-incident docs that stop recurrence.
- Tune and operate MySQL at production scale: query analysis, replication topology, backup and recovery, and schema changes against live workloads.
- Containerize and template services using Docker and infrastructure-as-code patterns to make deployments repeatable, declarative, and boring.
- Improve observability across the fleet — metrics, logs, traces, and dashboards — so problems are seen before customers feel them.
- Use modern AI-augmented engineering tools (Claude Code, MCP-based workflows, agentic automation) as a daily multiplier — to operate faster and extend what one engineer can deliver.
- Document and mentor. Runbooks, design docs, and onboarding material aren't an afterthought here — they're how the team scales.
Requirements
What you’ll need
- 5+ years operating production Linux/UNIX (RHEL, CentOS/Rocky, Debian/Ubuntu) at meaningful scale.
- Strong MySQL operational experience — replication, performance tuning, backups, recovery, and schema migrations.
- Hands-on VMware/vSphere experience in production environments.
- Java application-tier troubleshooting experience — comfortable reading thread dumps, GC logs, and heap behavior.
- Solid DevOps fundamentals: Git, CI/CD pipelines, Ansible (or similar configuration management), Terraform (or similar IaC), and Docker.
- Networking literacy: TCP/IP, DNS, TLS, HTTP/S, load balancing, basic firewalling. You can read a tcpdump and a cert chain.
- Comfortable scripting in Bash. Python is not required, but you should have a working understanding of programming fundamentals and be able to read, modify, and write straightforward code.
- Strong troubleshooting instincts and the temperament to lead under pressure.
- Real day-to-day experience using AI-augmented engineering tools (Claude, Cursor, Copilot, MCP servers, agentic workflows) — not just demos.
- Experience with Datadog or comparable observability platforms.
Benefits
Comp & perks
- Full Medical, Dental and Vision
- Annual bonus eligible
- Free gym in the building
- Generous PTO policy
- Summer Fridays, all year round
- Tuition Reimbursement
- Discount from building café
- 401(K) with a 50% company match (up to 6% of employee contribution)
- Employee Referral Program
- One volunteer day each year
ATS Keywords
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
Linux, UNIX, MySQL, VMware, Docker, Git, CI/CD, Ansible, Terraform, Bash
Soft Skills
troubleshooting, mentoring, documentation, incident management, root-cause analysis, pressure management, observability improvement