Walmart

Principal Data Scientist

full-time

Location Type: Office

Location: Bangalore • 🇮🇳 India

Job Level

Lead

Tech Stack

Distributed Systems, Kubernetes, NumPy, Pandas, Prometheus, Python, Scikit-Learn, SQL

About the role

  • Build, train, and deploy time-series models for smart and predictive autoscaling of Kubernetes workloads.
  • Forecast traffic and resource demand.
  • Detect seasonality (daily, weekly, and annual patterns).
  • Detect anomalies in metrics, logs, and traces.
  • Perform deep exploratory data analysis (EDA) on large-scale telemetry data (CPU, memory, latency, errors, throughput).
  • Select, implement, and tune statistical and ML techniques (ARIMA, Prophet, tree-based models, deep learning as appropriate).
  • Continuously evaluate models using offline metrics and live production feedback.
  • Write production-grade Python code for model training, inference, and evaluation.
  • Integrate ML outputs directly into SRE workflows, including Kubernetes HPA/VPA and custom autoscaling controllers; alerting and incident detection pipelines; and capacity planning and cost optimization tools.
  • Define safeguards, fallback logic, and confidence thresholds to ensure safe autonomous actions.
  • Debug model and data issues using real production incidents and postmortems.
  • Build and maintain feature pipelines from observability data sources (Prometheus, OpenTelemetry, logs, traces).
  • Work with streaming and batch data pipelines to process high-cardinality, high-volume time-series data.
  • Ensure data quality, freshness, and correctness for real-time decision systems.
  • Design schemas and feature stores optimized for time-series ML workloads.
  • Own models end to end: development → deployment → monitoring → retraining.
  • Implement monitoring for model accuracy and drift; data drift and pipeline failures; and impact on system reliability and scaling behavior.
  • Automate retraining and validation pipelines where appropriate.
  • Act as the go-to expert for applied ML in SRE contexts.
  • Review and improve ML and data science code written by other team members.
  • Partner closely with SREs to translate reliability problems into concrete modeling tasks.
  • Drive adoption of ML solutions by proving value through metrics and outcomes.

Requirements

  • 12+ years of experience in data science or applied machine learning.
  • 5+ years deploying ML models in production, not just experimentation.
  • Strong experience working with time-series data at scale.
  • Proven track record of owning systems end to end in high-availability environments.
  • Expert-level Python (NumPy, Pandas, SciPy, Scikit-learn).
  • Strong experience with time-series forecasting and anomaly detection techniques.
  • Practical understanding of Kubernetes autoscaling (HPA/VPA, custom metrics).
  • Experience working with metrics, logs, and traces from distributed systems.
  • Comfortable querying and analyzing large datasets using SQL and time-series databases.
  • Strong understanding of distributed systems behavior (latency, load, failures, cascading effects).

Benefits

  • Beyond our great compensation package, you can receive incentive awards for your performance.
  • Other great perks include a host of best-in-class benefits, such as maternity and parental leave, PTO, health benefits, and much more.

Applicant Tracking System Keywords

Hard skills
time-series models, traffic forecasting, resource demand forecasting, anomaly detection, exploratory data analysis, statistical techniques, machine learning techniques, Python, Kubernetes, SQL
Soft skills
problem-solving, collaboration, communication, leadership, critical thinking, adaptability, mentoring, ownership, attention to detail, proactive