
Senior Staff Software Engineer – Agentic Automation
NVIDIA
Full-time
Location Type: Remote
Location: Canada
Salary
CA$200,000 - CA$250,000 per year
About the role
- Design and implement agentic AI workflows using LLM-based agents, tool calling, RAG patterns, and orchestration frameworks.
- Push the boundaries of what AI-assisted operations can achieve.
- Build robust integrations and automation pipelines across ServiceNow, identity management, monitoring platforms, and enterprise SaaS.
- Own the full stack, from infrastructure to user-facing tools.
- Triage and resolve enterprise issues, with a focus on automation and on reducing mitigation and resolution times.
- Manage and troubleshoot enterprise-scale collaboration, productivity, AI, and infrastructure systems.
- Trace and root-cause complex, multi-system failures; identify patterns in recurring tickets; and build automation or self-service solutions.
- Build and maintain runbooks, troubleshooting guides, and knowledge base articles that elevate team capabilities.
- Mentor team members on troubleshooting methodology and systems thinking.
Requirements
- Bachelor’s or Master’s degree in Computer Science, Engineering, IT, or related field (or equivalent experience)
- 12+ years of overall experience in SRE, enterprise support, or DevOps
- Experience with SaaS, hybrid cloud, and AI/ML environments
- Experience building production-grade agentic workflows (e.g., multi-agent systems and MCP servers)
- Strong software engineering fundamentals, with deep experience building products and operating large-scale systems.
- Expertise in two or more backend languages, such as Go, Python, or Java, with a track record of owning complex production systems.
- Full stack engineering experience, including building user-facing web applications and operational dashboards using modern frontend frameworks such as React.js, along with backend APIs and data pipelines.
- Systems thinker who naturally traces dependencies, considers second-order effects, and asks "why did this break?" not just "how do I fix it?"
- Strong incident management skills: triage, root-cause analysis, blameless postmortems, pattern recognition
- Expert troubleshooting across an enterprise hybrid stack, including Jira, Microsoft, operating systems (Apple, Linux, and Windows), and infrastructure systems such as compute, AI, and storage.
Benefits
- Equity
- Benefits