NBCUniversal

Site Reliability Engineer 3

full-time

Origin: 🇺🇸 United States • Illinois, Virginia

Salary

💰 $99,601 - $149,401 per year

Job Level

Mid-Level, Senior

Tech Stack

Ansible, AWS, Azure, Cassandra, Cloud, Docker, Go, Google Cloud Platform, Grafana, Hadoop, HDFS, Java, Kafka, Kubernetes, MySQL, NoSQL, Postgres, Prometheus, Python, Scala, Spark, Terraform

About the role

  • Design and implement monitoring and alerting systems to ensure the stability, reliability, and performance of data platforms.
  • Join the on-call rotation to respond to and resolve issues quickly.
  • Develop and maintain automation tools and scripts for deployment, monitoring, backup and disaster recovery.
  • Analyze and optimize data storage, query performance, and data flows to process large-scale datasets efficiently, reduce latency, and improve processing speed.
  • Respond quickly to platform failures, perform troubleshooting, and coordinate cross-team efforts to resolve issues and ensure high availability and reliability.
  • Work with engineering teams to analyze and forecast capacity requirements and scale infrastructure accordingly.
  • Support FreeWheel-powered live events.
  • Document the architecture, configurations, and operational procedures for platforms and provide relevant training.
  • Ensure platforms meet security standards and compliance requirements.
  • Collaborate with engineering, product, and project management teams to support product design and implementation and solve reliability-related issues.

Requirements

  • At least 3 years of experience as an SRE, DevOps or Operations Engineer.
  • 5-7 years of relevant work experience.
  • Experience with cloud platforms (e.g. AWS, OCI, GCP, Azure).
  • Hands-on experience with Terraform and infrastructure-as-code principles.
  • Proficiency in automation tools and frameworks (e.g. Ansible, Terraform, Kubernetes, Docker).
  • Familiarity with modern data architectures and technologies, including big data platforms (e.g. Kafka, Hadoop, Spark) and distributed storage (e.g. Cassandra, HDFS, AWS S3).
  • Extensive experience in database management (e.g. NoSQL databases, MySQL, PostgreSQL).
  • Proficient in at least one programming language such as Python, Go, Java, or Scala.
  • Familiarity with monitoring and log management tools such as Prometheus, Grafana, and the ELK Stack.
  • Strong debugging and troubleshooting skills, with the ability to quickly identify and resolve production issues.
  • Excellent communication skills; ability to convey technical information clearly to technical and non-technical stakeholders.
  • Proactive learner eager to grow in operations and governance.
  • Bachelor’s degree or higher in Computer Science, Software Engineering, or a related field.
  • Willingness to join on-call shifts and support FreeWheel-powered live events.