DDN

Senior Staff Engineer – AI In-Market Engineering

full-time

Origin: 🇺🇸 United States

Job Level

Senior

Tech Stack

Cloud, Distributed Systems, Go, Grafana, Kubernetes, Linux, NFS, Prometheus, Python, TCP/IP

About the role

  • Serve as final escalation point for complex and critical issues in enterprise and hyperscale environments
  • Lead technical direction of the In-Market Engineering team and drive technical decisions
  • Build and implement a tools platform strategy for Infinia, along with a reporting platform based on dial-home data
  • Define and implement training for the support organization and author runbooks, performance tuning guides, and RCA documentation
  • Deliver log analytics strategy and design framework for Infinia; utilize AI-powered debugging, log analysis, and system pattern recognition tools
  • Own critical customer case escalations end-to-end, including deep root cause analysis and mitigation strategies
  • Reproduce complex customer issues, propose product improvements or workarounds, and feed real-world support insights back into development
  • Partner with Field CTOs, Solutions Architects, and Sales Engineers to ensure customer success and translate technical issues into executive-ready summaries
  • Participate in post-mortems and executive briefings for strategic accounts; drive adoption of observability, automation, and self-healing support mechanisms

Requirements

  • 12+ years in enterprise storage, distributed systems, or cloud infrastructure
  • Deep understanding of file systems (POSIX, NFS, S3), storage performance, and Linux kernel internals
  • Proven debugging skills at system/protocol/app levels (e.g., strace, tcpdump, perf)
  • Hands-on experience with AI/ML data pipelines, container orchestration (Kubernetes), and GPU-based architectures
  • Expert-level knowledge of TCP/IP and networking
  • Exposure to RDMA, NVMe-oF, or high-performance networking stacks
  • Exceptional communication and executive reporting skills
  • Experience using AI tools (e.g., log pattern analysis, LLM-based summarization, automated RCA tooling) to accelerate diagnostics and reduce MTTR
  • Participation in an on-call rotation to provide after-hours support as needed
  • Preferred: Experience with DDN, VAST, Weka, or similar scale-out file systems
  • Preferred: Expert scripting/coding ability in Python, Bash, or Go
  • Preferred: Familiarity with observability platforms: Prometheus, Grafana, ELK, OpenTelemetry
  • Preferred: Knowledge of replication, consistency models, and data integrity mechanisms
  • Preferred: Exposure to Sovereign AI, LLM training environments, or autonomous system data architectures