Teamworks

Site Reliability Engineer II

Full-time

Location: 🇺🇸 United States

Salary

💰 $140,000 - $160,000 per year

Job Level

Mid-Level, Senior

Tech Stack

AWS, Cloud, Kubernetes, Linux, Terraform

About the role

  • Ensure the reliability, availability, and performance of mission-critical applications and services
  • Combine software and systems engineering expertise to maintain and scale robust services across a multi-product ecosystem
  • Contribute to managing platform services that support multiple products and own specific components as you grow
  • Implement and maintain infrastructure-as-code solutions using Terraform
  • Work with containerized applications and Kubernetes, including managing deployments and troubleshooting issues
  • Build and maintain CI/CD pipelines using GitLab and take ownership of workflow improvements
  • Monitor platform performance using Datadog; create alerts and dashboards
  • Participate in on-call rotations and handle incident response and platform troubleshooting
  • Collaborate with product teams to understand platform needs and contribute to self-service tooling
  • Automate routine operational tasks and contribute to platform tooling improvements
  • Support integration of new services into the platform with proper monitoring
  • Learn and apply platform engineering best practices while helping teams adopt infrastructure standards

Requirements

  • Solid experience with Terraform for infrastructure-as-code
  • Working experience with Kubernetes for container management
  • Familiarity with GitLab for CI/CD pipelines and version control
  • Experience or strong interest in Datadog for monitoring and observability
  • Strong Linux operating system knowledge and administration skills in cloud environments
  • Understanding of platform engineering concepts or a strong interest in platform services
  • Experience with AWS services and cloud-native technologies
  • Comfort with containerization and deployment strategies
  • Understanding of incident response and on-call responsibilities
  • Experience automating infrastructure and supporting AWS environments in production
  • Demonstrable experience with infrastructure as code and CI/CD pipelines
  • Understanding of database administration and optimization techniques
  • Familiarity with networking concepts, load balancing, and CDN management
  • Experience with version control systems (Git) and collaborative development workflows
  • Knowledge of security best practices and compliance requirements
  • Experience working asynchronously with coworkers across different time zones