Senior DevOps Engineer/Site Reliability Engineer

Stellar Cyber

Senior DevOps Engineer role focused on cloud infrastructure and reliability improvements at a cybersecurity leader. You will collaborate with global teams to keep production environments performant and scalable.

Posted 6/2/2026 • Full-time • Remote • New York • 🇺🇸 United States • Senior • 💰 $165,000 - $215,000 per year

ATS Keywords

Tailor your resume

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills

Kubernetes, Docker, Terraform, Helm, CI/CD, Python, Go, Bash, Linux, Kafka

Soft Skills

communication, collaboration, problem-solving, troubleshooting, incident management, reliability engineering

Tools & Technologies

OCI, AWS, GCP, Azure, Prometheus, Grafana, Loki, Alertmanager, Elastic Stack, GitHub Actions

Industry Keywords

DevOps, SRE, Platform Engineering, Infrastructure as Code, observability, distributed systems, high-availability, auto-remediation, operational intelligence, data platforms

Tech Stack

Tools & technologies

AWS, Azure, Cloud, Distributed Systems, Docker, Elasticsearch, Go, Google Cloud Platform, Grafana, Kafka, Kubernetes, Linux, MongoDB, Prometheus, Python, Redis, Spark, Terraform

About the role

Key responsibilities & impact

Administer and maintain Kubernetes clusters and containerized workloads.
Manage cloud infrastructure across OCI, AWS, GCP, or Azure environments.
Develop and maintain CI/CD pipelines for reliable application deployments.
Implement and manage Infrastructure as Code (IaC) using Terraform and Helm.
Build automation tooling and operational workflows using Python, Go, or Bash.
Drive observability initiatives including monitoring, logging, tracing, and alerting improvements.
Monitor, troubleshoot, and resolve production incidents while participating in on-call rotations.
Support and optimize distributed data platforms including Kafka, Elasticsearch, Spark, Redis, and MongoDB.
Improve platform reliability, scalability, and operational efficiency using SRE best practices.
Collaborate with cross-functional teams across multiple time zones.
Perform Linux system administration and networking troubleshooting.
Contribute to incident response processes, postmortems, and reliability improvements.
Support GitOps and deployment workflows using tools such as ArgoCD and GitHub Actions.
Evaluate and implement AI-assisted operational tooling for auto-remediation, alert correlation, and operational intelligence.

Requirements

What you’ll need

5+ years of experience in DevOps, SRE, or Platform Engineering roles.
Strong expertise with Kubernetes, Docker, and container orchestration.
Hands-on experience managing production cloud environments.
Strong Infrastructure as Code experience with Terraform and Helm.
Experience with CI/CD tools and deployment automation.
Advanced troubleshooting skills in Linux systems, networking, and distributed systems.
Experience with observability platforms including Prometheus, Grafana, Loki, Alertmanager, and Elastic Stack.
Strong programming and scripting skills in Python, Bash, or Go.
Experience supporting high-availability production systems and on-call operations.
Knowledge of incident management and reliability engineering practices.
Familiarity with data platform technologies such as Kafka, Spark, Elasticsearch, Redis, or MongoDB.
Understanding of AI-driven operational tooling and automated remediation concepts.
Excellent communication, collaboration, and problem-solving skills.
Must reside on the East Coast.

Benefits

Comp & perks

Pre-IPO Stock Options
Medical, Dental & Vision care
401(k)
Employee Assistance Program
Employee Discount Program
Life Insurance
Paid time off
Referral Program
Rewards and Recognition Program