d-Matrix

Staff System Software Engineer, AI/ML

full-time

Origin: 🇮🇳 India

Job Level

Lead

Tech Stack

Cloud, Distributed Systems, Kubernetes, Linux, Python, PyTorch, Ray, TensorFlow

About the role

  • Be part of the team that productizes the software stack for the AI compute engine
  • Develop, enhance, and maintain next-generation AI deployment software
  • Work across all aspects of the full-stack toolchain and optimize hardware-software co-design
  • Build and scale software deliverables under tight development windows
  • Build out deployment infrastructure and collaborate closely with ML, compiler, and hardware experts
  • Contribute system software expertise to design distributed, high-performance deployment systems

Requirements

  • BS in Computer Science, Engineering, Math, Physics, or a related field with 5+ years of industry software development experience
  • MS in Computer Science, Engineering, Math, Physics, or a related field preferred, with 4+ years of industry software development experience
  • Strong grasp of system software, data structures, computer architecture, and machine learning fundamentals
  • Proficient in C/C++/Python development in a Linux environment using standard development tools
  • Experience with distributed, high-performance software design and implementation
  • Self-motivated team player with a strong sense of ownership and leadership
  • Preferred: MS or PhD in Computer Science, Electrical Engineering, or related fields
  • Preferred: Experience with inference servers/model serving frameworks (TensorRT-LLM, vLLM, SGLang, etc.)
  • Preferred: Experience with deep learning frameworks (PyTorch, TensorFlow)
  • Preferred: Experience with deep learning runtimes (ONNX Runtime, TensorRT)
  • Preferred: Experience with collective communication libraries for distributed systems, such as NCCL and Open MPI
  • Preferred: Experience with software testing fundamentals
  • Preferred: Experience deploying ML workloads (LLMs, VLMs, NLP, etc.) on distributed systems
  • Preferred: Experience with Kubernetes, Ray, or other MLOps tools and techniques
  • Preferred: Prior startup, small-team, or incubation experience
  • Preferred: Work experience at a cloud provider or AI compute/subsystem company