DDN

Senior Staff Engineer – AI In-Market Engineering

full-time

Origin: 🇺🇸 United States

Job Level

Senior

Tech Stack

Cloud, Distributed Systems, Go, Grafana, Kubernetes, Linux, NFS, Prometheus, Python, TCP/IP

About the role

  • Be the final escalation point for the most complex and critical issues affecting enterprise and hyperscale environments.
  • Serve as technical leader of the In-Market Engineering team and drive its technical decisions.
  • Build and implement a tools platform strategy for Infinia.
  • Build a reporting platform for Infinia using dial-home data.
  • Define and implement training for the support organisation.
  • Deliver a log analytics strategy and design framework for Infinia.
  • Own critical customer case escalations end-to-end, including deep root cause analysis and mitigation strategies.
  • Utilize AI-powered debugging, log analysis, and system pattern recognition tools to accelerate resolution.
  • Be the subject-matter expert on Infinia internals: metadata handling, storage fabric interfaces, performance tuning, AI integration, etc.
  • Reproduce complex customer issues and propose product improvements or workarounds.
  • Author and maintain detailed runbooks, performance tuning guides, and RCA documentation.
  • Feed real-world support insights back into the development cycle to improve reliability and diagnostics.
  • Partner with Field CTOs, Solutions Architects, and Sales Engineers to ensure customer success.
  • Translate technical issues into executive-ready summaries and business impact statements.
  • Participate in post-mortems and executive briefings for strategic accounts.
  • Drive adoption of observability, automation, and self-healing support mechanisms using AI/ML tools.

Requirements

  • 12+ years in enterprise storage, distributed systems, or cloud infrastructure.
  • Deep understanding of file systems (POSIX, NFS, S3), storage performance, and Linux kernel internals.
  • Proven debugging skills at system/protocol/app levels (e.g., strace, tcpdump, perf).
  • Hands-on experience with AI/ML data pipelines, container orchestration (Kubernetes), and GPU-based architectures.
  • Expert-level knowledge of TCP/IP and networking.
  • Exposure to RDMA, NVMe-oF, or high-performance networking stacks.
  • Exceptional communication and executive reporting skills.
  • Experience using AI tools (e.g., log pattern analysis, LLM-based summarization, automated RCA tooling) to accelerate diagnostics and reduce MTTR.
  • Experience with DDN, VAST, Weka, or similar scale-out file systems.
  • Expert scripting/coding ability in Python, Bash, or Go.
  • Familiarity with observability platforms: Prometheus, Grafana, ELK, OpenTelemetry.
  • Knowledge of replication, consistency models, and data integrity mechanisms.
  • Exposure to Sovereign AI, LLM model training environments, or autonomous system data architectures.
  • Participation in an on-call rotation to provide after-hours support as needed.