OpenCV

Senior Observability Platform Engineer

full-time

Origin: 🇨🇭 Switzerland


Job Level

Senior

Tech Stack

Docker · Go · Kubernetes · Linux · Open Source

About the role

  • Configure, operate, and enhance observability platforms and frameworks (ClickHouse, Thanos, Loki, Tempo, and the OpenTelemetry Collector with custom processors)
  • Manage and evolve core observability infrastructure supporting engineering teams and a customer-facing portal
  • Handle telemetry at scale (more than 20 TB per day from 10,000+ nodes, including Linux hosts, Kubernetes clusters, and VMs)
  • Drive organization-wide adoption of observability best practices for monitoring, logging, and tracing
  • Develop and maintain automated solutions for monitoring, alerting, and incident response
  • Collaborate with engineering teams to understand their needs and provide scalable observability solutions
  • Optimize system performance, ensure high availability, and perform capacity planning and cost optimization
  • Experiment with and integrate new observability tools and the OpenTelemetry Collector to enhance telemetry collection and analysis

Requirements

  • Proven track record managing observability stacks (Thanos, Mimir, Cortex, Tempo, Loki, ClickHouse)
  • Deep understanding of Kubernetes architecture and hands-on cluster management
  • Experience writing and maintaining Helm charts
  • Experience with GitOps and CI/CD practices
  • Expertise in Docker containerization and orchestration
  • Proficiency in Linux system administration, scripting and automation
  • 5+ years of experience in platform engineering, site reliability engineering, or a related role
  • Bachelor’s degree in Computer Science, Information Technology, or related field (or equivalent experience)
  • Demonstrated experience managing large-scale infrastructures and observability platforms
  • Coding experience in Golang or similar language (desirable)
  • Open source contributions in Golang or similar language (desirable)
  • Knowledge of, or contributions to, the OpenTelemetry Collector (desirable)
  • Strong communication skills and ability to convey technical concepts to non-technical stakeholders
  • Quick learner with a collaborative mindset
  • Customer-focused approach
Red Hat

Senior Director, GTM Data Process and Transformation

Red Hat
Senior · full-time · $232k–$394k / year · 🇺🇸 United States
Posted: 2 days ago · Source: redhat.wd5.myworkdayjobs.com
Cloud · Go · Kubernetes · Linux · Open Source
Cognits

SRE / DevOps – Tooling, Bazel

Cognits
Mid · Senior · full-time · 🇺🇸 United States
Posted: 29 days ago · Source: cognits.bamboohr.com
Distributed Systems · Docker · Go · Gradle · Java · Kubernetes · Open Source · Python · Rust
OpenX

Test Automation Engineer III, Java

OpenX
Mid · Senior · full-time · $17k–$19k · 🇵🇱 Poland
Posted: 2 days ago · Source: jobs.lever.co
Cloud · Docker · Google Cloud Platform · Java · Jenkins · Kafka · Kubernetes · Spark · Spinnaker · SQL
The Walt Disney Company

Senior Machine Learning Engineer

The Walt Disney Company
Senior · full-time · $139k–$204k / year · California, New York, Washington · 🇺🇸 United States
Posted: 32 days ago · Source: disney.wd5.myworkdayjobs.com
Airflow · AWS · Cloud · Docker · ETL · Jenkins · Kafka · Python · Scala · Spark
Two Six Technologies

AI/ML Data Pipeline Engineer

Two Six Technologies
Mid · Senior · full-time · $130k–$175k / year · 🇺🇸 United States
Posted: 18 days ago · Source: boards.greenhouse.io
AWS · Cloud · Docker · ElasticSearch · Kubernetes · Linux · NoSQL · Postgres · Python · SQL