Chainlink Labs

Senior Site Reliability Engineer, Observability

full-time

Location: 🇺🇸 United States

Job Level

Senior

Tech Stack

AWS, Distributed Systems, Go, Grafana, Java, Kubernetes, Oracle, Packer, Perl, Prometheus, Python, Ruby, Splunk, Swift, Terraform, Web3

About the role

  • Build and orchestrate a modern OpenTelemetry (OTel)-based observability platform
  • Support multiple telemetry types, including metrics, logs and traces
  • Define and support modern observability governance and solve observability problems at scale
  • Ensure reliability, security, and performance exceed defined SLAs
  • Work with engineers across the company to troubleshoot issues, deploy new products and services, increase velocity and decrease cognitive load
  • Lead the design and deployment of monitoring/observability services that detect issues and alert the team when action is needed
  • Ingest, aggregate, transform, and utilize data from multiple sources in the real-time data pipeline
  • Oversee the availability, performance, and supportability of observability infrastructure
  • Create processes around alert response operations to ensure reliable delivery of oracle data
  • Make recommendations to ensure sufficient metrics are collected to create alerts with every new feature release
  • Champion reliability and security by doing work correctly the first time

Requirements

  • 7+ years of relevant professional experience (DevOps, infrastructure, SRE, and/or platform teams)
  • Ability to develop software outside of the scope of typical infrastructure requirements and configurations
  • Experience programming in C, C++, Java, Python, Go, Perl, or Ruby
  • Expert knowledge in all aspects of designing, developing, and managing large real-time systems
  • Experience with monitoring and logging: exporting metrics using Prometheus, building Grafana dashboards, centralized logging solutions (ELK Stack, Splunk, Grafana Stack)
  • Experience with distributed systems and container orchestration; maintained or built Kubernetes clusters and deployed new services on them
  • Strong communication skills; able to give and receive constructive feedback and participate in planning meetings and code reviews
  • Excitement for blockchain, Web 3.0, and decentralized technologies (desired)
  • Experience running infrastructure in the blockchain/web3 space (desired)
  • Ability to scale systems sustainably through automation and advocate for reliability and velocity improvements (desired)
  • Experience working remotely in a distributed team (desired)
  • Strong desire to grow, improve, and automate services to reduce toil (desired)
  • Familiarity/proficiency with tools: AWS; Terraform/Terragrunt; Kubernetes, Calico, ArgoCD; Prometheus and Grafana; GitHub Actions; Packer
Coates Group

Senior DevOps Engineer

Senior · full-time · $125k–$140k / year · Illinois · 🇺🇸 United States
Posted: 3 hours ago · Source: jobs.lever.co
AWS, Cloud, Docker, IoT, Linux, Microservices, Python
Eduphoria! Inc.

AWS DevOps Engineer

Mid · Senior · full-time · $110k–$125k / year · Florida, Illinois, Kansas, Maryland, North Carolina, Ohio, Tennessee, Texas, Virginia · 🇺🇸 United States
Posted: 17 hours ago · Source: eduphoria.applytojob.com
AWS, Azure, Cloud, EC2, Linux, MySQL, .NET, SQL, Terraform
GEICO

DevOps Engineer II – FinTech Commissions, Substantiation

Mid · Senior · full-time · $75k–$160k / year · District of Columbia, Maryland, Texas, Virginia · 🇺🇸 United States
Posted: 18 hours ago · Source: geico.wd1.myworkdayjobs.com
AWS, Azure, Cloud, Distributed Systems, Java, .NET, NoSQL, Python, SQL
ParentSquare

Site Reliability Engineer

Mid · Senior · full-time · $170k–$200k / year · 🇺🇸 United States
Posted: 18 hours ago · Source: ats.rippling.com
Ansible, AWS, Azure, Chef, Cloud, Distributed Systems, Docker, Google Cloud Platform, Grafana, Kubernetes, Linux, Prometheus, +4 more
Leidos

DevOps Technical Lead

Senior · full-time · $105k–$189k / year · 🇺🇸 United States
Posted: 19 hours ago · Source: leidos.wd5.myworkdayjobs.com
AWS, Cloud, Grafana, Jenkins, JMeter, Kafka, Linux, Maven, Selenium, Splunk, Zookeeper