NVIDIA

Principal Platform Software Engineer – RAS

full-time

Location Type: Remote

Location: California, United States

Salary

💰 $272,000 - $431,250 per year

About the role

  • Drive next-generation fleet management solutions for scaling AI infrastructure that uses NVIDIA GPUs and Grace solutions
  • Work with customers, product management, and other architects to narrow down requirements for implementation
  • Bring clarity to the architecture of fleet health monitoring and fault-remediation solutions at scale
  • Work with customers and other architects to understand their health-monitoring requirements
  • Detail the architecture and build POCs to validate it
  • Educate customers about the product architecture and gather their feedback
  • Write architecture specs and design documents, and own end-to-end delivery of the product
  • Review the code produced from the architecture specs
  • Ensure product is properly tested by working with the development team
  • Drive product life cycles with QA teams to productize the code, and take responsibility as product owner
  • Articulate requirements in Jira and bug-management tools and work out an end-to-end execution plan
  • Contribute to all phases of product development, from product definition, architecture, and design, through implementation, debugging, testing and early customer support.

Requirements

  • BS, MS, or PhD in EE/CS or related field of education (or equivalent experience)
  • 15+ years of hands-on coding experience
  • Strong knowledge of time series databases like InfluxDB & Prometheus
  • Strong knowledge of building and consuming REST APIs (Redfish is a big plus)
  • Strong knowledge of telemetry visualization solutions like Grafana & Influx
  • Strong knowledge of firmware architecture and of optimizing firmware for low-latency APIs
  • Strong knowledge of analyzing algorithms for time & space complexity and projecting system resource requirements
  • Proven record of delivering scalable solutions
  • Strong and demonstrable skill in C/C++ and Python
  • Programming and debugging experience on server platforms
  • Experience in SCM (e.g., Git, Perforce) and project management tools like Jira.

Benefits

  • Equity
  • Benefits