DistroKid

Senior Systems Operations Engineer

Full-time

Location Type: Remote

Location: United States

Salary

💰 $155,000 - $170,000 per year

About the role

  • Design, deploy, and manage scalable and highly available cloud infrastructure on AWS, with deep expertise in core services (EC2, EKS, S3, RDS, IAM, VPC, and beyond).
  • Develop and maintain disaster recovery plans leveraging AWS capabilities for backup and replication to ensure business continuity.
  • Collaborate with engineering and security teams to improve infrastructure health, security, and long-term scalability.
  • Design reusable Terraform/OpenTofu modules following DRY principles and organizational standards; implement module versioning and lifecycle strategies.
  • Direct the migration of manually provisioned infrastructure to code; establish patterns and best practices for IaC adoption across the team.
  • Implement IaC testing strategies, including validation, linting, and integration testing, using tools such as Terraform-Compliance or Checkov.
  • Architect and maintain complex Bitbucket Pipelines configurations for multi-environment IaC deployments; implement pipeline security best practices.
  • Implement AIOps practices, leveraging AI tools to enhance monitoring, incident response, and predictive alerting.
  • Use AI-assisted development and operations tools (e.g., Cursor, Claude) to accelerate troubleshooting, code review, and documentation generation.
  • Evaluate and implement AI-powered automation to reduce operational toil, improve repeatability, and scale platform capabilities.
  • Define and implement SLOs for services; guide and/or participate in incident response and conduct blameless postmortems.
  • Implement chaos engineering practices to proactively identify system weaknesses before they impact production.
  • Build and maintain comprehensive monitoring solutions using tools such as CloudWatch and Datadog to track performance and drive optimization.
  • Develop automation scripts and tools in Python, Bash, or similar languages to streamline operations and eliminate manual toil.
  • Build self-service capabilities for development teams to reduce cognitive load and enable developer autonomy across the organization.
  • Guide the solution architecture and end-to-end implementation of DistroKid’s first Internal Developer Portal (IDP).
  • Define the IDP roadmap and success criteria in partnership with engineering leadership; establish golden paths, service catalogs, and self-service workflows that reduce deployment friction and accelerate developer productivity.
  • Drive adoption of the IDP across engineering teams; gather feedback, iterate on the platform, and measure impact through developer experience metrics and reduced time-to-deploy.
  • Guide cost optimization initiatives; implement rightsizing recommendations, reserved-capacity strategies, and tagging standards for cost allocation.
  • Monitor and optimize AWS resource usage; select appropriate services and configurations to meet performance requirements cost-effectively.
  • Direct planning, decision-making, and execution for infrastructure projects; own workstreams end-to-end.
  • Partner cross-functionally with engineering, security, and product teams; communicate impact in terms of company strategy and OKRs.
  • Provide technical mentorship to junior and mid-level engineers; invest in team growth and foster a culture of continuous learning.
  • Maintain and contribute to infrastructure documentation, runbooks, and architectural decision records to ensure knowledge sharing and operational consistency.

Requirements

  • Bachelor’s degree in Computer Science, Information Technology, a related field, or equivalent practical experience.
  • 5+ years of experience in systems operations, platform engineering, or DevOps with a focus on cloud infrastructure and containerized environments.
  • Proven production experience with AWS services (EC2, EKS, S3, RDS, IAM, VPC, API Gateway, EventBridge, etc.) and Kubernetes.
  • 5+ years of hands-on experience with Infrastructure as Code tools, specifically Terraform and/or OpenTofu, including module design, state management, remote backends, and IaC testing.
  • Strong knowledge of Linux/Unix administration, systems, and shell scripting.
  • Proficiency in Python, Go, or similar programming languages.
  • Experience with CI/CD pipelines for infrastructure deployments (Bitbucket Pipelines, Jenkins, or similar).
  • Experience with monitoring and observability tools (Prometheus, Grafana, CloudWatch, or Datadog).
  • Demonstrated experience implementing or working with AIOps tools, practices, or AI-assisted operations in a professional context.
  • Experience using AI-assisted development tools (e.g., Cursor, Warp, Claude, or similar) to accelerate engineering work.

Benefits

  • Retirement plans (401(k), SIPP, etc.)
  • Health insurance
  • Generous paid time off
  • Parental leave
  • Home office allowance
  • Flexible work schedules
  • Paid and discounted subscriptions
  • Regular engagement activities