Software Engineer II – Site Reliability Operations Engineer

Walmart

full-time

Posted on: 10/21/2025

Location Type: Hybrid

Location: Sunnyvale • California • 🇺🇸 United States

Salary

💰 $104,000 - $202,000 per year

Job Level

Junior, Mid-Level

Tech Stack

Azure, Cloud, Distributed Systems, DNS, Firewalls, Go, Google Cloud Platform, Grafana, Graphite, Java, JavaScript, Linux, Node.js, OpenStack, Prometheus, Python, React, ServiceNow, Splunk, TCP/IP, Unix

About the role

Acquire in-depth technical knowledge of omnichannel cloud platforms, web traffic flows, microservices, and service dependencies for major incident resolution.
Provide support for Unix and Linux systems from kernel to shell and beyond, taking into consideration system libraries, file systems, and client-server protocols.
Leverage knowledge of network technologies such as protocols (TCP/IP, UDP, ICMP, etc.), MAC addresses, IP packets, DNS, CDN, OSI layers, firewalls, gateways, proxies, and load balancers.
Provide L1 and L2 production support for multiple cloud technologies such as OpenStack, Cloud Native platform, Microsoft Azure, and Google Cloud Platform, triaging critical issues using various internal and vendor-related tools.
Analyze monitoring graphs and alerts to identify systems causing production impact, using tools such as Grafana, Prometheus, MMS, Kibana, Graphite, ServiceNow, JIRA, Dynatrace, New Relic, Omniture, Splunk, and CDN logs.
Triage site-impacting production issues by quantifying impact, severity and urgency, analyzing systems for quick remediation, engaging the right teams for recovery, and focusing on immediate restoration of large-scale enterprise systems.
Develop enterprise monitoring and use tooling such as Grafana, Kibana, Splunk, Graphite, and New Relic to improve visibility, proactively detect issues, and restore system availability.
Design and implement JavaScript integrations between alerting tools and service API endpoints, using tools such as ServiceNow, Spotlight, and xMatters.
Design and develop solutions for widespread internal communications about cloud application support, and workflows for infrastructure availability issues, across various internal applications, using programming languages such as Java, JavaScript (React, Node.js), Python, and Shell, along with technologies such as Prometheus and database query languages.
Demonstrate knowledge of scripting and software development for automation and self-healing of multi-cloud environments.

Requirements

2+ years in an infrastructure, systems, engineering or development environment delivering operational excellence to highly complex distributed systems.
Bachelor's Degree in Computer Science or a related field, or relevant work experience.
Strong and demonstrable incident management skills with relevant experience in an enterprise organization.
Experience and exposure working in a 24/7 operations support environment.
Methodical and systematic problem-solving approach, combined with a solid awareness of ownership, initiative and drive.
Experience investigating, analyzing and troubleshooting large scale enterprise systems.
Networking knowledge and understanding of network concepts, such as protocols (TCP/IP, UDP, ICMP, etc.), MAC addresses, IP packets, DNS, OSI layers, and load balancing.
Experience administering Unix/Linux in a production environment.
Experience working with and developing enterprise monitoring, tooling, and logging solutions such as Grafana, Kibana, Splunk, OpenObserve, Graphite, Nagios, New Relic, Dynatrace, and Prometheus.
Working knowledge of one or more cloud technologies such as Azure, GCP, or OpenStack.
Experience with a distributed version control system such as Git.
Experience designing and implementing JavaScript integrations between alerting tools and service API endpoints, using tools such as ServiceNow, Spotlight, Splunk, and xMatters.
Programming experience in one or more of the following languages: Go, Java, Python, Shell, etc.
Experience in data science/machine learning would be advantageous.

Benefits

Health benefits including medical, vision and dental coverage
401(k)
Stock purchase and company-paid life insurance
PTO (including sick leave, parental leave, family care leave, bereavement, jury duty, and voting)
Short-term and long-term disability
Company discounts
Military Leave Pay
Adoption and surrogacy expense reimbursement
Live Better U education benefit program for associates

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard skills

Unix, Linux, Java, JavaScript, Python, Shell, TCP/IP, UDP, ICMP, Cloud technologies

Soft skills

incident management, problem-solving, ownership, initiative, drive

Certifications

Bachelor's Degree in Computer Science