Observability Engineer

Transaction Network Services (TNS)

full-time

Posted on: 9/6/2025

Location: 🇮🇳 India

Mid-Level, Senior

AWS, Azure, Cloud, Go, Google Cloud Platform, Grafana, Kubernetes, Prometheus, Python, Splunk

About the role

Lead the design, implementation, and continuous improvement of the observability stack, including monitoring, logging, and tracing systems.
Define and enforce observability standards and best practices across engineering teams to ensure consistent instrumentation and visibility.
Build scalable monitoring solutions that provide real-time insights into system health, performance, and availability.
Develop and maintain dashboards, alerts, and automated responses to proactively detect and resolve issues before they impact users.
Collaborate with development, infrastructure, and SRE teams to integrate observability into CI/CD pipelines and production workflows.
Conduct root cause analysis and post-incident reviews to identify observability gaps and drive improvements.
Evaluate and implement tools such as Splunk, Splunk Observability Cloud, and Netreo to support monitoring and alerting needs.
Champion a culture of data-driven decision-making by enabling teams to access and interpret observability data effectively.
Automate observability pipelines and alerting mechanisms.

5+ years of experience in Site Reliability Engineering, DevOps, or Observability roles.
3+ years of experience in SRE/DevOps.
Demonstrated success in deploying and managing monitoring tools and observability solutions at scale.
Hands-on experience with monitoring and observability platforms such as Splunk, Splunk Observability Cloud (O11y), Grafana, Prometheus, and Datadog.
Proven ability to design and implement SLOs/SLIs, dashboards, and alerting strategies that align with business and operational goals.
Familiarity with incident response, alert tuning, and postmortem analysis.
Strong scripting or programming skills (e.g., Python, Go, Bash).
Excellent communication and collaboration skills, with a focus on knowledge sharing and mentorship.
Strong understanding of distributed tracing tools like OpenTelemetry, Jaeger, or Zipkin.
Experience integrating observability into CI/CD pipelines and Kubernetes environments.
Contributions to open-source observability tools or frameworks.
Strong knowledge of cloud platforms (AWS, Azure, or GCP).