Principal Data Scientist

Walmart

full-time

Posted on: 12/19/2025

Location Type: Office

Location: Bangalore • 🇮🇳 India

Job Level

Lead

Tech Stack

Distributed Systems, Kubernetes, NumPy, Pandas, Prometheus, Python, Scikit-Learn, SQL

About the role

Build, train, and deploy time-series models for smart and predictive autoscaling of Kubernetes workloads, traffic and resource demand forecasting, seasonality detection (daily/weekly/annual patterns), and anomaly detection in metrics, logs, and traces.
Perform deep exploratory data analysis (EDA) on large-scale telemetry data (CPU, memory, latency, errors, throughput).
Select, implement, and tune statistical and ML techniques (ARIMA, Prophet, tree-based models, deep learning as appropriate).
Continuously evaluate models using offline metrics and live production feedback.
Write production-grade Python code for model training, inference, and evaluation.
Integrate ML outputs directly into SRE workflows, including Kubernetes HPA/VPA and custom autoscaling controllers, alerting and incident detection pipelines, and capacity planning and cost optimization tools.
Define safeguards, fallback logic, and confidence thresholds to ensure safe autonomous actions.
Debug model and data issues using real production incidents and postmortems.
Build and maintain feature pipelines from observability data sources (Prometheus, OpenTelemetry, logs, traces).
Work with streaming and batch data pipelines to process high-cardinality, high-volume time-series data.
Ensure data quality, freshness, and correctness for real-time decision systems.
Design schemas and feature stores optimized for time-series ML workloads.
Own models end to end: development → deployment → monitoring → retraining.
Implement monitoring for model accuracy and drift, data drift, pipeline failures, and the models' impact on system reliability and scaling behavior.
Automate retraining and validation pipelines where appropriate.
Act as the go-to expert for applied ML in SRE contexts.
Review and improve ML and data science code written by other team members.
Partner closely with SREs to translate reliability problems into concrete modeling tasks.
Drive adoption of ML solutions by proving value through metrics and outcomes.
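
To illustrate the kind of safeguard, fallback, and confidence-threshold logic described above, here is a minimal sketch. It assumes a seasonal-naive forecaster (the next value equals the value one season earlier) as a stand-in for whatever models the team actually uses; the function `recommend_replicas` and its thresholds (`max_rel_error`, `max_step`, replica bounds) are hypothetical and not Walmart's implementation.

```python
import math
import statistics


def recommend_replicas(history, season_length, per_replica_capacity,
                       current_replicas, min_replicas=2, max_replicas=50,
                       max_rel_error=0.25, max_step=2.0):
    """Hypothetical predictive-scaling policy with confidence-based fallback.

    history: per-interval demand (e.g. requests/s), oldest first.
    Returns (replica_count, decision_reason).
    """
    # Safeguard 1: without two full seasons we cannot score the forecaster,
    # so keep the current size and let reactive autoscaling handle it.
    if len(history) < 2 * season_length:
        return current_replicas, "fallback: insufficient history"

    # Backtest the seasonal-naive forecaster: predict each point from the
    # value one season earlier and measure the mean relative error.
    errors = [abs(history[i] - history[i - season_length]) / max(history[i], 1e-9)
              for i in range(season_length, len(history))]
    if statistics.mean(errors) > max_rel_error:
        # Safeguard 2: low confidence, so do not act on the forecast.
        return current_replicas, "fallback: low confidence"

    # Seasonal-naive forecast for the next interval.
    forecast = history[-season_length]
    target = math.ceil(forecast / per_replica_capacity)

    # Safeguard 3: limit how far one decision can move the replica count.
    upper = math.floor(current_replicas * max_step)
    lower = math.ceil(current_replicas / max_step)
    target = max(lower, min(upper, target))

    # Safeguard 4: respect hard min/max bounds.
    return max(min_replicas, min(max_replicas, target)), "predictive"
```

With a four-interval season and a steady repeating pattern, `recommend_replicas([100, 200, 300, 200] * 3 + [100, 200], 4, 50, 4)` forecasts the upcoming peak of 300 and returns `(6, "predictive")`. Falling back to the current replica count, rather than guessing, leaves a reactive autoscaler in control whenever the forecast cannot be trusted.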

Requirements

12+ years of experience in data science or applied machine learning.
5+ years deploying ML models in production, not just experimentation.
Strong experience working with time-series data at scale.
Proven track record of owning systems end to end in high-availability environments.
Expert-level Python (NumPy, Pandas, SciPy, Scikit-learn).
Strong experience with time-series forecasting and anomaly detection techniques.
Practical understanding of Kubernetes autoscaling (HPA/VPA, custom metrics).
Experience working with metrics, logs, and traces from distributed systems.
Comfortable querying and analyzing large datasets using SQL and time-series databases.
Strong understanding of distributed systems behavior (latency, load, failures, cascading effects).
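
For reference, the Kubernetes HPA behavior mentioned in the requirements follows a documented rule: desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), with no change when that ratio is within a tolerance of 1.0 (0.1 by default). The sketch below is a simplified Python rendering of that rule; the function name and parameters are illustrative, and it deliberately ignores not-ready pods, missing metrics, stabilization windows, and multiple metrics.

```python
import math


def hpa_desired_replicas(current_replicas, current_metric, target_metric,
                         min_replicas=1, max_replicas=10, tolerance=0.1):
    """Simplified sketch of the Kubernetes HPA scaling rule (single metric)."""
    ratio = current_metric / target_metric
    # Close enough to the target: the HPA skips scaling to avoid flapping.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    # Scale proportionally to how far the metric is from its target.
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured replica bounds.
    return max(min_replicas, min(max_replicas, desired))
```

For example, `hpa_desired_replicas(4, 90, 60)` returns 6, while `hpa_desired_replicas(4, 63, 60)` stays at 4 because the ratio of 1.05 falls inside the default tolerance.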

Benefits

Beyond our great compensation package, you can receive incentive awards for your performance.
Other great perks include a host of best-in-class benefits, including maternity and parental leave, PTO, health benefits, and much more.

Applicant Tracking System Keywords

Hard skills

time-series models, traffic forecasting, resource demand forecasting, anomaly detection, exploratory data analysis, statistical techniques, machine learning techniques, Python, Kubernetes, SQL

Soft skills

problem-solving, collaboration, communication, leadership, critical thinking, adaptability, mentoring, ownership, attention to detail, proactive