Stellar Cyber

Senior/Staff Site Reliability Engineer

full-time

Origin: 🇺🇸 United States • Florida, Massachusetts, New York, North Carolina

Job Level

Senior

Tech Stack

AWS, Azure, Cloud, Cyber Security, Distributed Systems, ElasticSearch, Google Cloud Platform, Grafana, Kafka, Kubernetes, Linux, MongoDB, Prometheus, Python, Redis, Spark, Terraform

About the role

  • Administer and maintain container orchestration platforms and containerized workloads.
  • Monitor and troubleshoot production systems, participating in on-call rotations to ensure reliability.
  • Drive observability improvements by enhancing monitoring, logging, and alerting capabilities across systems and data platforms.
  • Administer and optimize cloud-based environments across multiple providers.
  • Manage and support distributed data platforms and real-time processing systems.
  • Develop and maintain continuous integration and delivery pipelines for efficient and reliable deployments.
  • Own and implement Infrastructure as Code (IaC) practices to ensure consistency and scalability.
  • Automate and orchestrate infrastructure using programming and scripting languages.
  • Perform system administration and networking tasks to support internal and external environments.
  • Collaborate effectively with engineers and stakeholders across different time zones.

Requirements

  • 5+ years of experience in Site Reliability Engineering, DevOps, or Platform Engineering roles.
  • Proven success leading large-scale production systems in cloud environments (AWS, GCP, Azure, or OCI).
  • Demonstrated leadership in driving incident response, on-call best practices, and a reliability-focused culture.
  • Strong experience with production on-call operations and incident management.
  • Advanced proficiency in Kubernetes administration and troubleshooting.
  • Hands-on experience with observability tools: Prometheus, Grafana, Loki, and Alertmanager.
  • Knowledge of chat-based operations interfaces and/or auto-remediation controllers built on AI agentic frameworks.
  • Understanding of AI agents that auto-triage alerts, correlate signals, and suggest root-cause hypotheses.
  • Expertise in operating data platforms (Elasticsearch, MongoDB, Spark, Kafka, Redis).
  • Proficiency with public cloud services (AWS, Azure, GCP, or OCI).
  • Strong programming and automation skills in Python and Bash.
  • Deep understanding of Infrastructure as Code (Terraform, Helm).
  • Experience with CI/CD pipelines (GitHub Actions, Bitbucket, ArgoCD).
  • Strong technical background in distributed systems, databases, networking, and Linux administration.
  • Excellent problem-solving, communication, and leadership abilities.
  • Bachelor's degree in Computer Science, Engineering, or a related technical field.
  • Certifications in AWS, GCP, Observability, Linux or Kubernetes are a plus.
Transaction Network Services (TNS)

Observability Engineer

Mid · Senior • full-time • 🇮🇳 India
Posted: 9 days ago • Source: tnsi.wd1.myworkdayjobs.com
AWS, Azure, Cloud, Go, Google Cloud Platform, Grafana, Kubernetes, Prometheus, Python, Splunk
People Culture Talent

Senior Platform Engineer

Senior • full-time • 🇨🇦 Canada
Posted: 13 days ago • Source: jobs.ashbyhq.com
AWS, Azure, BigQuery, Cloud, Distributed Systems, Docker, Go, Google Cloud Platform, Grafana, Java, Kubernetes, Prometheus, +3 more
Splunk

Manager, SRE, FedRAMP

Senior · Lead • full-time • $140k–$192k / year • Illinois · 🇺🇸 United States
Posted: 23 days ago • Source: jobs.jobvite.com
Apache, AWS, Cassandra, Cloud, Distributed Systems, Go, Google Cloud Platform, Grafana, Jenkins, Kafka, Kubernetes, Microservices, +9 more
Ada

Staff DevOps Engineer

Lead • full-time • 🇨🇦 Canada
Posted: 12 days ago • Source: boards.greenhouse.io
AWS, Azure, Cloud, Google Cloud Platform, Kubernetes, MongoDB, Postgres, Python, Redis, Spark, Terraform
AKASA

Senior Software Engineer, DevOps

Senior • full-time • $180k–$220k / year • California · 🇺🇸 United States
Posted: 12 days ago • Source: jobs.ashbyhq.com
AWS, Azure, Cloud, DNS, Google Cloud Platform, Grafana, Kubernetes, Linux, Prometheus, Python, TCP/IP, Terraform