NVIDIA

Senior Staff Software Engineer – Agentic Automation

Full-time

Location Type: Remote

Location: Canada

Salary

💰 CA$200,000 - CA$250,000 per year

About the role

  • Design and implement agentic AI workflows using LLM-based agents, tool calling, RAG patterns, and orchestration frameworks.
  • Push the boundaries of what AI-assisted operations can achieve.
  • Build robust integrations and automation pipelines across ServiceNow, identity management, monitoring platforms, and enterprise SaaS.
  • Own the full stack from infrastructure to user-facing tools.
  • Triage and resolve enterprise issues with a focus on automation and improving mitigation and resolution times.
  • Manage and troubleshoot enterprise-scale collaboration, productivity, AI, and infrastructure systems.
  • Trace and root-cause complex, multi-system failures; identify patterns in recurring tickets; and build automation or self-service solutions.
  • Build and maintain runbooks, troubleshooting guides, and knowledge base articles that elevate team capabilities.
  • Mentor team members on troubleshooting methodology and systems thinking.

Requirements

  • Bachelor’s or Master’s degree in Computer Science, Engineering, IT, or related field (or equivalent experience)
  • 12+ years of overall experience in SRE, enterprise support, or DevOps
  • Experience with SaaS, hybrid cloud, and AI/ML environments
  • Experience building production-grade agentic workflows (e.g., multi-agent systems and MCP servers)
  • Software engineering fundamentals with deep experience in building products and operating large-scale systems
  • Expertise in two or more backend languages such as Go, Python, or Java with a track record of owning complex production systems.
  • Full-stack engineering experience, including building user-facing web applications and operational dashboards with modern frontend frameworks such as React.js, along with backend APIs and data pipelines
  • Systems thinker who naturally traces dependencies, considers second-order effects, and asks "why did this break?" not just "how do I fix it?"
  • Strong incident management skills: triage, root-cause analysis, blameless postmortems, pattern recognition
  • Expert troubleshooting across the enterprise hybrid stack, including Jira, Microsoft, operating systems (Apple, Linux, and Windows), and infrastructure systems such as compute, AI, and storage

Benefits

  • Equity
  • Benefits