Cognite

Senior Observability Engineer – SRE

full-time

Location Type: Hybrid

Location: Bengaluru • 🇮🇳 India

Job Level

Senior

Tech Stack

AWS, Azure, Cloud, Go, Google Cloud Platform, Grafana, Java, Jenkins, Kotlin, Kubernetes, Prometheus, Python, Splunk, Terraform

About the role

  • Conduct assessments of existing observability architectures to identify gaps and improvement opportunities.
  • Design and implement scalable log aggregation pipelines for centralized and efficient data collection.
  • Apply noise-reduction techniques to filter irrelevant or false-positive alerts, enhancing focus on actionable issues.
  • Develop and maintain monitoring dashboards that deliver actionable insights across applications and infrastructure.
  • Lead the migration from Lightstep to Honeycomb, ensuring seamless data pipeline transitions, OpenTelemetry alignment, and stakeholder adoption.
  • Collaborate with infrastructure and product teams to integrate observability tooling into CI/CD workflows and cloud environments.
  • Analyze telemetry data (metrics, logs, traces) to troubleshoot complex system behaviors and recommend improvements.
  • Participate in production debugging and incident troubleshooting using telemetry data.
  • Mentor junior engineers on log management, event correlation, distributed tracing, and alert management.
  • Stay current on observability innovations and recommend adoption strategies aligned with organizational goals.
  • Support post-incident reviews and continuous improvement through data-driven root cause analysis.
  • Drive continuous improvement in reliability and operational excellence through proactive observability initiatives.

Requirements

  • 6+ years of experience in software or systems engineering, with at least 3 years focused on observability or SRE practices.
  • Hands-on experience with observability tools such as Honeycomb, VictoriaMetrics, Lightstep, Prometheus, Grafana, OpenTelemetry, Splunk, Datadog, or New Relic.
  • Strong knowledge of OpenTelemetry instrumentation (metrics, traces, logs) and SLIs/SLOs for reliability tracking.
  • Experience with distributed tracing, event correlation, and noise reduction frameworks.
  • Proficiency in one or more programming/scripting languages such as Python, Java, Kotlin, Go, or Shell.
  • Working knowledge of Infrastructure as Code (Terraform) and CI/CD pipelines (Jenkins, GitHub Actions, etc.).
  • Familiarity with cloud platforms (AWS, Azure, GCP) and container orchestration (Kubernetes).
  • Strong analytical, troubleshooting, and communication skills with the ability to work effectively across teams.
  • Experience conducting observability gap assessments and defining improvement plans.
  • Experience working in complex or multi-cloud environments is preferred.

Benefits

  • Join the Global Cognite Community
  • Diverse, global team of 70+ nationalities
  • Modern Bengaluru hub in a hybrid, high-trust environment with a flat structure and direct access to decision-makers.
  • Professional development opportunities

Applicant Tracking System Keywords

Hard skills
observability, SRE practices, OpenTelemetry instrumentation, distributed tracing, event correlation, noise reduction frameworks, Infrastructure as Code, CI/CD pipelines, programming languages, data analysis
Soft skills
analytical skills, troubleshooting skills, communication skills, mentoring, collaboration, continuous improvement, leadership, stakeholder engagement, problem-solving, adaptability