Mercari, Inc.

Software Engineer – Platform Engineering, Observability

Senior platform engineering role at Mercari, building observability systems at scale. You will lead improvements in incident detection and build self-service tools for engineers.

Posted 6/8/2026 • Full-time • Bengaluru • 🇮🇳 India • Mid-Level • Senior

Tech Stack

Tools & technologies
AWS, Cloud, Distributed Systems, Go, Google Cloud Platform, Grafana, Kubernetes, Microservices, Prometheus, Python, Terraform

About the role

Key responsibilities & impact
  • Design, build, and operate Mercari's observability platform, covering metrics, logs, traces, and alerting at scale.
  • Drive measurable improvements in Mean Time to Detect (MTTD) and Mean Time to Mitigate (MTTM) across all services.
  • Build AI-powered solutions for automated anomaly detection, alert correlation, and incident response assistance.
  • Develop self-service observability tooling that enables product engineers to instrument, monitor, and alert on their services independently.
  • Define and champion observability standards, best practices, and SLO frameworks across the engineering organization.
  • Collaborate with other platform teams, SRE, Security, and product engineering teams to ensure comprehensive system visibility and reliability.
  • Automate operational workflows to reduce toil and improve the team's efficiency.
  • Lead technical decisions, mentor team members, and actively shape the engineering culture within the observability team.

Requirements

What you’ll need
  • 5+ years of experience building, operating, and maintaining scalable production systems.
  • Strong expertise in observability and monitoring platforms (Datadog, Prometheus, Grafana, or similar) in production environments.
  • Hands-on experience with Kubernetes and container orchestration in production.
  • Proficiency in Go or Python for building infrastructure tooling and services.
  • Experience with cloud platforms (GCP and/or AWS) and Infrastructure as Code (Terraform).
  • Deep understanding of metrics, logging, and distributed tracing, including instrumentation patterns and data pipeline design.
  • Experience designing and tuning alerting systems to reduce noise and improve incident detection.
  • Strong understanding of SLIs, SLOs, and error budgets as reliability frameworks.
  • Proven ability to develop internal tools and platforms that improve developer productivity.
  • Strong documentation and communication skills; able to write design docs and drive technical discussions.
  • Shared commitment to our company's mission and values.
  • Experience leveraging AI technologies for observability use cases (anomaly detection, alert correlation, root cause analysis) (Preferred).
  • Track record of measurably improving MTTD and MTTM in a microservices environment (Preferred).
  • Experience with observability for large-scale distributed systems (500+ microservices) (Preferred).
  • Hands-on experience with OpenTelemetry for instrumentation and data collection (Preferred).
  • Cost optimization of observability data at scale (sampling strategies, data tiering, pipeline efficiency) (Preferred).

Benefits

Comp & perks
  • Employment status: Full-time
  • Office location: Bangalore
  • Hybrid workstyle
  • Full flextime (no core time)

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
observability, monitoring, Kubernetes, Go, Python, GCP, AWS, Terraform, OpenTelemetry, alerting systems
Soft Skills
documentation, communication, mentoring, technical decision-making, collaboration, leadership, engineering culture shaping, problem-solving, efficiency improvement, commitment to company values