Senior Infrastructure – Reliability Engineer

KWI

Senior engineer managing Linux/UNIX systems and VMware infrastructure at KWI for retail clients, driving operational excellence through automation and incident management.

Posted 5/29/2026 • Full-time • Melville, New York • 🇺🇸 United States • Senior • 💰 $180,000 per year

Tech Stack

Tools & technologies
Ansible, DNS, Docker, Java, Linux, MySQL, Python, TCP/IP, Terraform, Unix, VMware

About the role

Key responsibilities & impact
  • Support and operate our Linux/UNIX systems, VMware infrastructure, CI/CD pipelines, MySQL databases, and containerized workloads that serve our retail clients 24×7.
  • Own incidents end-to-end: triage alerts, drive root-cause analysis across the application, database, and network layers, and write the post-incident docs that stop recurrence.
  • Tune and operate MySQL at production scale: query analysis, replication topology, backup and recovery, and schema changes against live workloads.
  • Containerize and template services using Docker and infrastructure-as-code patterns to make deployments repeatable, declarative, and boring.
  • Improve observability across the fleet — metrics, logs, traces, and dashboards — so problems are seen before customers feel them.
  • Use modern AI-augmented engineering tools (Claude Code, MCP-based workflows, agentic automation) as a daily multiplier — to operate faster and extend what one engineer can deliver.
  • Document and mentor. Runbooks, design docs, and onboarding material aren't an afterthought here — they're how the team scales.

Requirements

What you’ll need
  • 5+ years operating production Linux/UNIX (RHEL, CentOS/Rocky, Debian/Ubuntu) at meaningful scale.
  • Strong MySQL operational experience — replication, performance tuning, backups, recovery, and schema migrations.
  • Hands-on VMware/vSphere experience in production environments.
  • Java application-tier troubleshooting experience — comfortable reading thread dumps, GC logs, and heap behavior.
  • Solid DevOps fundamentals: Git, CI/CD pipelines, Ansible (or similar configuration management), Terraform (or similar IaC), and Docker.
  • Networking literacy: TCP/IP, DNS, TLS, HTTP/S, load balancing, basic firewalling. You can read a tcpdump and a cert chain.
  • Comfortable scripting in Bash. Python is not required, but you should have a working understanding of programming fundamentals and be able to read, modify, and write straightforward code.
  • Strong troubleshooting instincts and the temperament to lead under pressure.
  • Real day-to-day experience using AI-augmented engineering tools (Claude, Cursor, Copilot, MCP servers, agentic workflows) — not just demos.
  • Experience with Datadog or comparable observability platforms.

Benefits

Comp & perks
  • Full Medical, Dental and Vision
  • Annual bonus eligible
  • Free gym in the building
  • Generous PTO policy
  • Summer Fridays... all year round
  • Tuition Reimbursement
  • Discount at the building café
  • 401(k) with a 50% company match (up to 6% of employee contribution)
  • Employee Referral Program
  • One volunteer day each year
