Salary
💰 $136,000 - $177,000 per year
About the role
- Triage, diagnose, and troubleshoot problems with DDN storage systems in customer production environments.
- Take ownership of customer cases, validate technical aspects, and drive them to resolution with root cause analysis.
- Serve as primary escalation point for high-severity support cases, coordinating with customers and internal teams to minimize customer impact.
- Develop and contribute to AI-powered debugging, log analysis, and system pattern recognition tools to accelerate resolution.
- Collaborate with engineering, support, and services teams to coordinate feature development and software patches addressing field issues.
- Review product documentation and provide feedback for internal and external customer needs; propose product quality and usability improvements.
- Conduct product and feature training sessions for Technical Support & Professional Services teams.
- Participate in PIER (Post Incident & Escalations Review) discussions and executive briefings; produce executive-ready RCAs.
- Provide on-call assistance outside regular hours during critical customer issues.
Requirements
- 10+ years of hands-on experience in software development and enterprise-grade product support.
- Advanced debugging skills at the kernel, system, protocol, and application levels (e.g., GDB, strace, tcpdump, and perf).
- Exceptional analytical and methodical problem-solving skills, particularly in diagnosing complex hardware-software interaction issues on high-end embedded platforms.
- Strong written, verbal, and reporting communication skills, with the ability to convey technical concepts clearly to diverse audiences.
- Deep understanding of high-speed data transmission technologies, including but not limited to SAS, SCSI, NVMe over Fabrics, IP over InfiniBand, Fibre Channel, and InfiniBand topologies.
- Strong knowledge of storage products, cloud storage architectures, cloud computing environments, and data center operations.
- Proven experience interfacing with customers to troubleshoot product problems during install and production (desired).
- Strong understanding of data storage concepts, including RAID, erasure coding, and block storage (desired).
- Hands-on experience with large-scale parallel file systems such as GPFS or Lustre (desired).
- Familiarity with AI-driven diagnostic tools including log pattern analysis, LLM-based summarization, and automated RCA solutions (desired).
- Participation in an on-call rotation to provide after-hours support as needed.
- Ability to be onsite in Columbia, MD at least twice a week (hybrid).