Software Engineer – Platform Engineering, Observability
Mercari, Inc.
Senior Platform Engineer for Mercari, building observability systems at scale. Lead improvements in incident detection and build self-service tools for engineers.
Tech Stack
Tools & technologies
AWS, Cloud, Distributed Systems, Go, Google Cloud Platform, Grafana, Kubernetes, Microservices, Prometheus, Python, Terraform
About the role
Key responsibilities & impact
- Design, build, and operate Mercari's observability platform, covering metrics, logs, traces, and alerting at scale.
- Drive measurable improvements in Mean Time to Detect (MTTD) and Mean Time to Mitigate (MTTM) across all services.
- Build AI-powered solutions for automated anomaly detection, alert correlation, and incident response assistance.
- Develop self-service observability tooling that enables product engineers to instrument, monitor, and alert on their services independently.
- Define and champion observability standards, best practices, and SLO frameworks across the engineering organization.
- Collaborate with other platform teams, SRE, Security, and product engineering teams to ensure comprehensive system visibility and reliability.
- Automate operational workflows to reduce toil and improve the team's efficiency.
- Lead technical decisions, mentor team members, and actively shape the engineering culture within the observability team.
Requirements
What you’ll need
- 5+ years of experience building, operating, and maintaining scalable production systems.
- Strong expertise in observability and monitoring platforms (Datadog, Prometheus, Grafana, or similar) in production environments.
- Hands-on experience with Kubernetes and container orchestration in production.
- Proficiency in Go or Python for building infrastructure tooling and services.
- Experience with cloud platforms (GCP and/or AWS) and Infrastructure as Code (Terraform).
- Deep understanding of metrics, logging, and distributed tracing, including instrumentation patterns and data pipeline design.
- Experience designing and tuning alerting systems to reduce noise and improve incident detection.
- Strong understanding of SLIs, SLOs, and error budgets as reliability frameworks.
- Proven ability to develop internal tools and platforms that improve developer productivity.
- Strong documentation and communication skills; able to write design docs and drive technical discussions.
- Shared commitment to our company's mission and values.
- Experience leveraging AI technologies for observability use cases (anomaly detection, alert correlation, root cause analysis) (Preferred).
- Track record of measurably improving MTTD and MTTM in a microservices environment (Preferred).
- Experience with observability for large-scale distributed systems (500+ microservices) (Preferred).
- Hands-on experience with OpenTelemetry for instrumentation and data collection (Preferred).
- Cost optimization of observability data at scale (sampling strategies, data tiering, pipeline efficiency) (Preferred).
Benefits
Comp & perks
- Employment status: Full-time
- Office location: Bangalore
- Hybrid workstyle
- Full flextime (no core time)
ATS Keywords
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
observability, monitoring, Kubernetes, Go, Python, GCP, AWS, Terraform, OpenTelemetry, alerting systems
Soft Skills
documentation, communication, mentoring, technical decision-making, collaboration, leadership, engineering culture shaping, problem-solving, efficiency improvement, commitment to company values