Sunbytes

Principal DevOps Engineer

full-time

Location Type: Remote

Location: Remote • 🇻🇳 Vietnam

Job Level

Lead

Tech Stack

AWS, Azure, Cloud, Flux, Google Cloud Platform, Kubernetes, Node.js, Prometheus, Python, Terraform

About the role

  • Design and operate secure, scalable, and high-quality infrastructure that supports modern applications and advanced AI workloads.
  • Build and maintain robust automation across CI/CD pipelines, infrastructure provisioning, and operational processes to improve reliability and minimize manual effort.
  • Integrate AI-driven solutions into operational workflows to enhance efficiency, detect anomalies, and accelerate delivery.
  • Apply strong systems engineering practices, including monitoring, incident management, performance optimization, and capacity planning.
  • Establish and uphold DevOps best practices, ensuring reproducibility, testing, documentation, and operational excellence.
  • Communicate technical decisions clearly and collaborate cross-functionally to support predictable delivery and effective problem-solving.
  • Provide mentorship and technical leadership, raising the level of platform engineering, DevOps maturity, and overall engineering quality across the organization.

Requirements

  • 8+ years of progressive experience in DevOps, SRE, Platform Engineering, or Infrastructure Engineering.
  • Hands-on experience across multi-cloud environments (AWS, GCP, Azure), with strong knowledge of networking, compute, storage, security, and cost optimization.
  • Deep expertise in containerization and orchestration (e.g., Kubernetes, EKS, ECS) and Infrastructure as Code (e.g., Terraform, Pulumi, CloudFormation).
  • Experience supporting or deploying AI/ML workloads (e.g., model inference, vector databases, GPU workloads).
  • Proven ability to design and operate highly available production systems using zero-downtime deployment strategies.
  • Experience with GitOps practices (e.g., ArgoCD, Flux) and building self-service developer platforms.
  • Experience managing multi-cloud API gateways and edge routing solutions (e.g., Kong, Traefik, Cloudflare).
  • Strong background in platform security, including IAM, secrets management, and runtime hardening with tools like Falco/eBPF.
  • Practical experience with modern observability stacks (e.g., Prometheus, OpenTelemetry, OpenSearch, ELK).
  • Familiarity with modern languages and frameworks such as Node.js, NestJS, and Python is a strong plus.

Benefits

  • Competitive salary
  • 12 days of annual leave (increasing with seniority)
  • Year-end performance bonus
  • All insurance covered as required by Vietnamese law
  • Additional private health insurance for enhanced medical coverage
  • Remote work and flexibility
  • Opportunity to work on complex systems in an international environment
  • Opportunity to work on cutting-edge AI-first solutions, with professional growth in a leadership-focused role

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard skills
DevOps, SRE, Platform Engineering, Infrastructure Engineering, containerization, orchestration, Infrastructure as Code, AI/ML workloads, zero-downtime deployment, GitOps
Soft skills
technical leadership, mentorship, collaboration, problem-solving, communication