AKASA

Senior Software Engineer, DevOps

Full-time

Origin: 🇺🇸 United States • California

Salary

💰 $180,000 - $220,000 per year

Job Level

Senior

Tech Stack

AWS, Azure, Cloud, DNS, Google Cloud Platform, Grafana, Kubernetes, Linux, Prometheus, Python, TCP/IP, Terraform

About the role

  • Ensure infrastructure is reliable, observable, and easy to operate, with an emphasis on automation and operational excellence.
  • Build, manage, and optimize infrastructure using Terraform, GitHub CI/CD, and Kubernetes.
  • Create visualizations and alerts that provide actionable insights using tools like Grafana, Prometheus/Mimir, OpenSearch, and Sentry.
  • Identify manual or error-prone processes and replace them with automated, repeatable systems.
  • Diagnose and resolve production issues across application and infrastructure layers.
  • Capture knowledge in runbooks, setup guides, and architecture diagrams to support operational maturity.
  • Partner with engineers across teams to drive adoption of DevOps and infrastructure best practices.
  • Help scale infrastructure and monitoring systems to meet growing demands.
  • Participate in an on-call rotation and support incident response processes as needed.
  • Attend weekly co-working days in the South San Francisco office (expected on Wednesdays).

Requirements

  • Experience with metrics, logs, and traces using tools such as Grafana, Prometheus/Mimir, OpenSearch, Sentry, or similar.
  • Proficient with Terraform, Kubernetes, and containerization tools.
  • 5+ years of experience with Python.
  • Comfortable working with Linux-based environments and writing shell scripts.
  • Strong collaboration skills with a focus on asynchronous, written communication.
  • Commitment to clear, comprehensive documentation and process standardization.
  • Self-starter mindset with a proactive approach to solving operational challenges.
  • Skilled in Git/GitHub-based workflows.
  • Willingness to participate in an on-call rotation and support incident response processes.
  • Nice-to-have: AWS (preferred), GCP, or Azure cloud infrastructure management.
  • Nice-to-have: Familiarity with TCP/IP, DNS, routing, and load balancing concepts.
  • Nice-to-have: Understanding of cloud and infrastructure security best practices.
  • Nice-to-have: Experience tuning application or infrastructure performance in production environments.