Site Reliability Engineer II

Balto

full-time

Posted on: 8/19/2025

Origin: 🇺🇸 United States

Mid-Level, Senior

AWS, Cloud, Cyber Security, DNS, Firewalls, Go, Grafana, Jenkins, Kubernetes, Prometheus, Python, TCP/IP, Terraform

About the role

Infrastructure Management: Architect, build, and scale AWS infrastructure using Infrastructure as Code (IaC) tools such as Terraform.
CI/CD & Deployment: Design, implement, and optimize CI/CD pipelines using tools such as GitHub Actions or ArgoCD to streamline deployments and improve release velocity.
Kubernetes Operations: Manage and optimize Kubernetes-based infrastructure (Amazon EKS) to ensure scalability, reliability, and efficient resource utilization.
Observability & Incident Response: Build and maintain monitoring, alerting, and logging systems (Prometheus, Grafana, Datadog, Loki) to ensure high availability; participate in the on-call rotation to resolve incidents.
Security & Compliance: Implement and maintain security controls to meet PCI DSS, HIPAA, GDPR, and SOC 2 standards, and support audit readiness.
System Architecture: Contribute to designing fault-tolerant architectures with disaster recovery and high-availability strategies, both inside and outside the cardholder data environment (CDE).
Developer Enablement: Partner with developers to improve deployment workflows, reduce lead time for changes, and provide platform tooling support.
Documentation & Knowledge Sharing: Create clear runbooks, technical documentation, and knowledge base articles to support team-wide learning and operational excellence.

Requirements

3-5 years of experience in SRE, DevOps, or Platform Engineering roles, with at least 2 years in a senior or mid-level capacity.
Strong hands-on experience with AWS services and IaC tools like Terraform.
Expertise in Kubernetes operations in production environments (Amazon EKS preferred).
Proficiency in CI/CD pipeline tools (e.g., GitHub Actions, Jenkins, ArgoCD).
Strong knowledge of monitoring and observability tooling (Prometheus, Grafana, Datadog, CloudWatch).
Familiarity with compliance frameworks (PCI DSS, HIPAA, GDPR, SOC 2) and cloud security best practices.
Excellent problem-solving, troubleshooting, and incident management skills.
Preferred: Experience supporting developers in platform engineering or internal tooling contexts.
Familiarity with NIST Cybersecurity Framework (CSF) implementation in SaaS/cloud environments.
Strong networking fundamentals (TCP/IP, DNS, HTTP, TLS, firewalls).
Experience with AWS networking services (VPC, Route 53, NAT Gateway, ALB/NLB).
Background in cost optimization and cloud governance.
Strong scripting/programming skills (Bash, Python, Go).