Balto

Site Reliability Engineer II

full-time

Origin: 🇺🇸 United States

Job Level

Mid-Level, Senior

Tech Stack

AWS, Cloud, Cyber Security, DNS, Firewalls, Go, Grafana, Jenkins, Kubernetes, Prometheus, Python, TCP/IP, Terraform

About the role

  • Infrastructure Management: Architect, build, and scale AWS infrastructure using Infrastructure as Code (IaC) tools such as Terraform.
  • CI/CD & Deployment: Design, implement, and optimize CI/CD pipelines using tools like GitHub Actions, ArgoCD, or similar to streamline deployments and improve release velocity.
  • Kubernetes Operations: Manage and optimize Kubernetes-based infrastructure (Amazon EKS) to ensure scalability, reliability, and efficient resource utilization.
  • Observability & Incident Response: Build and maintain monitoring, alerting, and logging systems (Prometheus, Grafana, Datadog, Loki) to ensure high availability; participate in the on-call rotation to resolve incidents.
  • Security & Compliance: Implement and maintain security controls to meet PCI DSS, HIPAA, GDPR, and SOC 2 standards, and support audit readiness.
  • System Architecture: Contribute to designing fault-tolerant architectures with disaster recovery and high-availability strategies, both inside and outside the CDE environments.
  • Developer Enablement: Partner with developers to improve deployment workflows, reduce lead time for changes, and provide platform tooling support.
  • Documentation & Knowledge Sharing: Create clear runbooks, technical documentation, and knowledge base articles to support team-wide learning and operational excellence.

Requirements

  • 3-5 years of experience in SRE, DevOps, or Platform Engineering roles, with at least 2 years in a senior or mid-level capacity.
  • Strong hands-on experience with AWS services and IaC tools like Terraform.
  • Expertise in Kubernetes operations in production environments (Amazon EKS preferred).
  • Proficiency in CI/CD pipeline tools (e.g., GitHub Actions, Jenkins, ArgoCD).
  • Strong knowledge of monitoring and observability tooling (Prometheus, Grafana, Datadog, CloudWatch).
  • Familiarity with compliance frameworks (PCI DSS, HIPAA, GDPR, SOC 2) and cloud security best practices.
  • Excellent problem-solving, troubleshooting, and incident management skills.
  • Preferred: Experience supporting developers in platform engineering or internal tooling contexts.
  • Familiarity with NIST Cybersecurity Framework (CSF) implementation in SaaS/cloud environments.
  • Strong networking fundamentals (TCP/IP, DNS, HTTP, TLS, firewalls).
  • Experience with AWS networking services (VPC, Route 53, NAT Gateway, ALB/NLB).
  • Background in cost optimization and cloud governance.
  • Strong scripting/programming skills (Bash, Python, Go).