
Senior Observability Engineer – SRE
Cognite
full-time
Location Type: Hybrid
Location: Bengaluru • 🇮🇳 India
Job Level
Senior
Tech Stack
AWS, Azure, Cloud, Go, Google Cloud Platform, Grafana, Java, Jenkins, Kotlin, Kubernetes, Prometheus, Python, Splunk, Terraform
About the role
- Conduct assessments of existing observability architectures to identify gaps and improvement opportunities.
- Design and implement scalable log aggregation pipelines for centralized and efficient data collection.
- Apply noise-reduction techniques to filter irrelevant or false-positive alerts, enhancing focus on actionable issues.
- Develop and maintain monitoring dashboards that deliver actionable insights across applications and infrastructure.
- Lead the migration from Lightstep to Honeycomb, ensuring seamless data pipeline transitions, OpenTelemetry alignment, and stakeholder adoption.
- Collaborate with infrastructure and product teams to integrate observability tooling into CI/CD workflows and cloud environments.
- Analyze telemetry data (metrics, logs, traces) to troubleshoot complex system behaviors and recommend improvements.
- Participate in production debugging and incident troubleshooting using telemetry data.
- Mentor junior engineers on log management, event correlation, distributed tracing, and alert management.
- Stay current on observability innovations and recommend adoption strategies aligned with organizational goals.
- Support post-incident reviews and continuous improvement through data-driven root cause analysis.
- Drive continuous improvement in reliability and operational excellence through proactive observability initiatives.
Requirements
- 6+ years of experience in software or systems engineering, with at least 3 years focused on observability or SRE practices.
- Hands-on experience with observability tools such as Honeycomb, VictoriaMetrics, Lightstep, Prometheus, Grafana, OpenTelemetry, Splunk, Datadog, or New Relic.
- Strong knowledge of OpenTelemetry instrumentation (metrics, traces, logs) and SLIs/SLOs for reliability tracking.
- Experience with distributed tracing, event correlation, and noise-reduction frameworks.
- Proficiency in one or more programming/scripting languages such as Python, Java, Kotlin, Go, or Shell.
- Working knowledge of Infrastructure as Code (Terraform) and CI/CD pipelines (Jenkins, GitHub Actions, ...).
- Familiarity with cloud platforms (AWS, Azure, GCP) and container orchestration (Kubernetes).
- Strong analytical, troubleshooting, and communication skills with the ability to work effectively across teams.
- Experience conducting observability gap assessments and defining improvement plans.
- Experience working in complex or multi-cloud environments is preferred.
Benefits
- Join the Global Cognite Community
- Diverse, global team of 70+ nationalities
- Modern Bengaluru hub in a hybrid, high-trust environment with a flat structure and direct access to decision-makers.
- Professional development opportunities
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
observability, SRE practices, OpenTelemetry instrumentation, distributed tracing, event correlation, noise reduction frameworks, Infrastructure as Code, CI/CD pipelines, programming languages, data analysis
Soft skills
analytical skills, troubleshooting skills, communication skills, mentoring, collaboration, continuous improvement, leadership, stakeholder engagement, problem-solving, adaptability