Walmart

Distinguished Software Engineer – AI/ML Engineer

full-time

Location Type: Office

Location: Sunnyvale, California, United States

Salary

$169,000 - $338,000 per year

About the role

  • As a Distinguished AI/ML Engineer within Walmart Global Tech’s Reliability Engineering Organization, you will lead the technical development of next-generation agentic AI systems and intelligent automation solutions that ensure mission-critical reliability, scalability, and operational excellence across Walmart’s entire technology ecosystem.
  • Architect and implement cutting-edge machine learning platforms and autonomous agents that transform how we manage change and performance, and how we monitor, predict, and automatically resolve issues.
  • Design and implement multi-agent orchestration platforms that coordinate autonomous agents for change management, capacity planning, and performance optimization across e-commerce, supply chain, and in-store systems.
  • Develop self-healing infrastructure platforms that leverage AI to predict, prevent, and automatically remediate system issues.
  • Collaborate with engineering teams and leadership to reduce mean time to detect (MTTD) and mean time to restore (MTTR) through intelligent automation and predictive capabilities.

Requirements

  • Bachelor’s or Master’s degree in Engineering, Computer Science, or a related field, with 12+ years of hands-on experience in Reliability Engineering, AI/ML Engineering, or Platform Engineering.
  • Proven record as a senior individual contributor influencing architecture and driving technical excellence across large organizations.
  • Deep experience operating mission-critical systems, with expertise in MTTD, MTTR, availability, change management, model performance, and autonomous system reliability.
  • Expert-level AI/ML engineering experience, including deep learning frameworks such as TensorFlow and PyTorch and large-scale production ML deployments.
  • Advanced experience with agentic AI systems, including multi-agent frameworks, autonomous decision-making systems, LLM-based agents, and agent orchestration platforms.
  • Comprehensive Reliability Engineering expertise, including service management (Incident, Problem, and Change Management) and performance and capacity engineering for AI/ML systems.
  • Expert-level cloud engineering experience (Azure, GCP, AWS) with containerization (Kubernetes, Docker), serverless architectures, and cloud-native AI services.
  • Deep observability experience across distributed tracing, metrics, logs, APM, and AI-driven anomaly detection.
  • Strong platform engineering background including infrastructure as code, service mesh architectures, API gateways, and self-service developer platforms.

Benefits

  • Health benefits include medical, vision and dental coverage.
  • Financial benefits include 401(k), stock purchase and company-paid life insurance.
  • Paid time off benefits include PTO (including sick leave), parental leave, family care leave, bereavement, jury duty, and voting.
  • Other benefits include short-term and long-term disability, company discounts, Military Leave Pay, adoption and surrogacy expense reimbursement, and more.
  • You will also receive PTO and/or PPTO that can be used for vacation, sick leave, holidays, or other purposes.
  • Live Better U is a Walmart-paid education benefit program for full-time and part-time associates in Walmart and Sam's Club facilities. Programs range from high school completion to bachelor's degrees, including English Language Learning and short-form certificates. Tuition, books, and fees are completely paid for by Walmart.

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
AI engineering, ML engineering, deep learning, multi-agent orchestration, autonomous systems, Reliability Engineering, cloud engineering, infrastructure as code, predictive capabilities, performance optimization
Soft Skills
leadership, collaboration, influencing architecture, technical excellence, problem-solving, change management, communication, organizational skills, strategic thinking, teamwork