Software Engineer – Platform Engineering, Observability

Mercari, Inc.

Senior Platform Engineer at Mercari, building observability systems at scale. The role leads improvements in incident detection and builds self-service tools for engineers.

Posted 6/8/2026 • full-time • Bengaluru • 🇮🇳 India • Mid-Level, Senior

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills

observability, monitoring, Kubernetes, Go, Python, GCP, AWS, Terraform, OpenTelemetry, alerting systems

Soft Skills

documentation, communication, mentoring, technical decision-making, collaboration, leadership, engineering culture shaping, problem-solving, efficiency improvement, commitment to company values

Tools & Technologies

Datadog, Prometheus, Grafana, AI technologies, self-service tooling, incident response tools, data pipelines, alert correlation tools, anomaly detection tools, distributed tracing

Industry Keywords

Mean Time to Detect (MTTD), Mean Time to Mitigate (MTTM), SLIs, SLOs, error budgets, microservices, scalable production systems, operational workflows, system visibility, reliability frameworks

Tech Stack

Tools & technologies

AWS, Cloud, Distributed Systems, Go, Google Cloud Platform, Grafana, Kubernetes, Microservices, Prometheus, Python, Terraform

About the role

Key responsibilities & impact

Design, build, and operate Mercari's observability platform - covering metrics, logs, traces, and alerting at scale.
Drive measurable improvements in Mean Time to Detect (MTTD) and Mean Time to Mitigate (MTTM) across all services.
Build AI-powered solutions for automated anomaly detection, alert correlation, and incident response assistance.
Develop self-service observability tooling that enables product engineers to instrument, monitor, and alert on their services independently.
Define and champion observability standards, best practices, and SLO frameworks across the engineering organization.
Collaborate with other platform teams, SRE, Security, and product engineering teams to ensure comprehensive system visibility and reliability.
Automate operational workflows to reduce toil and improve the team's efficiency.
Lead technical decisions, mentor team members, and actively shape the engineering culture within the observability team.

Requirements

What you’ll need

5+ years of experience building, operating, and maintaining scalable production systems.
Strong expertise in observability and monitoring platforms (Datadog, Prometheus, Grafana, or similar) in production environments.
Hands-on experience with Kubernetes and container orchestration in production.
Proficiency in Go or Python for building infrastructure tooling and services.
Experience with cloud platforms (GCP and/or AWS) and Infrastructure as Code (Terraform).
Deep understanding of metrics, logging, and distributed tracing, including instrumentation patterns and data pipeline design.
Experience designing and tuning alerting systems to reduce noise and improve incident detection.
Strong understanding of SLIs, SLOs, and error budgets as reliability frameworks.
Proven ability to develop internal tools and platforms that improve developer productivity.
Strong documentation and communication skills; able to write design docs and drive technical discussions.
Shared commitment to our company's mission and values.
Experience leveraging AI technologies for observability use cases (anomaly detection, alert correlation, root cause analysis) (Preferred).
Track record of measurably improving MTTD and MTTM in a microservices environment (Preferred).
Experience with observability for large-scale distributed systems (500+ microservices) (Preferred).
Hands-on experience with OpenTelemetry for instrumentation and data collection (Preferred).
Cost optimization of observability data at scale (sampling strategies, data tiering, pipeline efficiency) (Preferred).

Benefits

Comp & perks

Employment status: Full-time
Office location: Bangalore
Hybrid workstyle
Full flextime (no core time)