ELSA, Corp

Principal DevOps, SRE Engineer

full-time

Location Type: Remote

Location: India

About the role

  • Own the SRE practice: define severity tiers (P1–P4), formalize on-call rotation, build SLA tracking dashboards, and establish incident management workflows across a team of 4 DevOps engineers.
  • Build runbooks for the top recurring operational issues (pod scaling, deploy rollbacks, access management, EKS upgrades, CI/CD pipeline failures) and automate L1/L2 responses using tools like Shoreline.io, Rundeck, or PagerDuty automation.
  • Introduce and operationalize AI-assisted DevOps tooling: AIOps for alert correlation, CastAI/Kubecost for cost optimization, GitHub Copilot for IaC acceleration. Train the existing team on these tools.
  • Drive infrastructure modernization: EKS upgrades, Karpenter migration, observability (SigNoz/Prometheus), secrets management (ArgoCD/SOPS), and Terraform-based IaC maturity.
  • Collaborate with AI Engineering, Mobile, and B2B teams to ensure infrastructure supports real-time speech processing, GPU workloads, and multi-region enterprise deployments.
  • Design and plan a round-the-clock SRE coverage model as B2B enterprise SLA commitments grow, including evaluating vendor partnerships or strategic hires for Americas timezone coverage.

Requirements

  • 2+ years in DevOps/SRE, with at least 2 years in a principal or staff-level role owning reliability practices for a production SaaS product.
  • Deep hands-on experience with AWS (EKS, EC2, DynamoDB, S3, IAM, Secrets Manager), Kubernetes (HPA, KEDA, Karpenter, pod scheduling, GPU workloads), and IaC (Terraform, Helm, ArgoCD).
  • Track record of building runbooks, on-call rotations, and incident management frameworks — not just participating in them.
  • Experience with observability stacks (Prometheus, Grafana, SigNoz or Datadog), CI/CD (GitLab CI, GitHub Actions), and alerting (PagerDuty, Opsgenie).
  • Comfort working across timezones with distributed teams (India, Vietnam, Portugal).
  • Strong written communication — you'll be writing runbooks, RCAs, and proposals as much as Terraform.

Benefits

  • Flexible work setup: Remote-first for Singapore, India, Indonesia, Malaysia; hybrid model for Vietnam.
  • Comprehensive employee well-being benefits.
  • Free ELSA Premium courses to polish your language skills.
  • Collaborative, international team culture.
  • Opportunity to contribute to a fast-growing, well-funded Silicon Valley startup with global impact.