Transaction Network Services (TNS)

Observability Engineer

full-time

Origin: 🇮🇳 India


Job Level

Mid-Level, Senior

Tech Stack

AWS, Azure, Cloud, Go, Google Cloud Platform, Grafana, Kubernetes, Prometheus, Python, Splunk

About the role

  • Lead the design, implementation, and continuous improvement of the observability stack, including monitoring, logging, and tracing systems.
  • Define and enforce observability standards and best practices across engineering teams to ensure consistent instrumentation and visibility.
  • Build scalable monitoring solutions that provide real-time insights into system health, performance, and availability.
  • Develop and maintain dashboards, alerts, and automated responses to proactively detect and resolve issues before they impact users.
  • Collaborate with development, infrastructure, and SRE teams to integrate observability into CI/CD pipelines and production workflows.
  • Conduct root cause analysis and post-incident reviews to identify observability gaps and drive improvements.
  • Evaluate and implement tools such as Splunk, Splunk Observability Cloud, and Netreo to support monitoring and alerting needs.
  • Champion a culture of data-driven decision-making by enabling teams to access and interpret observability data effectively.
  • Automate observability pipelines and alerting mechanisms.

Requirements

  • 5+ years of experience in Site Reliability Engineering, DevOps, or Observability roles.
  • 3+ years of experience in SRE/DevOps.
  • Demonstrated success in deploying and managing monitoring tools and observability solutions at scale.
  • Hands-on experience with monitoring and observability platforms such as Splunk, Splunk Observability Cloud (O11y), Grafana, Prometheus, or Datadog.
  • Proven ability to design and implement SLOs/SLIs, dashboards, and alerting strategies that align with business and operational goals.
  • Familiarity with incident response, alert tuning, and postmortem analysis.
  • Strong scripting or programming skills (e.g., Python, Go, Bash).
  • Excellent communication and collaboration skills, with a focus on knowledge sharing and mentorship.
  • Strong understanding of distributed tracing tools such as OpenTelemetry, Jaeger, or Zipkin.
  • Experience integrating observability into CI/CD pipelines and Kubernetes environments.
  • Contributions to open-source observability tools or frameworks.
  • Strong knowledge of cloud platforms (AWS, Azure, or GCP).