
Principal DevOps Engineer
Sunbytes
full-time
Location Type: Remote
Location: Remote • 🇻🇳 Vietnam
Job Level
Lead
Tech Stack
AWS, Azure, Cloud, Flux, Google Cloud Platform, Kubernetes, Node.js, Prometheus, Python, Terraform
About the role
- Design and operate secure, scalable, and high-quality infrastructure that supports modern applications and advanced AI workloads.
- Build and maintain robust automation across CI/CD pipelines, infrastructure provisioning, and operational processes to improve reliability and minimize manual effort.
- Integrate AI-driven solutions into operational workflows to enhance efficiency, detect anomalies, and accelerate delivery.
- Apply strong systems engineering practices, including monitoring, incident management, performance optimization, and capacity planning.
- Establish and uphold DevOps best practices, ensuring reproducibility, testing, documentation, and operational excellence.
- Communicate technical decisions clearly and collaborate cross-functionally to support predictable delivery and effective problem-solving.
- Provide mentorship and technical leadership, raising the level of platform engineering, DevOps maturity, and overall engineering quality across the organization.
Requirements
- 8+ years of progressive experience in DevOps, SRE, Platform Engineering, or Infrastructure Engineering.
- Hands-on experience across multi-cloud environments (AWS, GCP, Azure), with strong knowledge of networking, compute, storage, security, and cost optimization.
- Deep expertise in containerization and orchestration (e.g., Kubernetes, EKS, ECS) and Infrastructure as Code (e.g., Terraform, Pulumi, CloudFormation).
- Experience supporting or deploying AI/ML workloads (e.g., model inference, vector databases, GPU workloads).
- Proven ability to design and operate highly available production systems using zero-downtime deployment strategies.
- Experience with GitOps practices (e.g., ArgoCD, Flux) and building self-service developer platforms.
- Experience managing multi-cloud API gateways and edge routing solutions (e.g., Kong, Traefik, Cloudflare).
- Strong background in platform security, including IAM, secrets management, and runtime hardening with tools like Falco/eBPF.
- Practical experience with modern observability stacks (e.g., Prometheus, OpenTelemetry, OpenSearch, ELK).
- Familiarity with modern languages and frameworks such as Node.js, NestJS, and Python is a strong plus.
Benefits
- Competitive salary
- 12 days of annual leave (increasing with seniority)
- Year-end performance bonus
- All insurance covered as required by Vietnamese law
- Additional private health insurance for enhanced medical coverage
- Remote work and flexibility
- Opportunity to work on complex systems in an international environment
- Opportunity to work on cutting-edge AI-first solutions, with professional growth in a leadership-focused role.
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
DevOps, SRE, Platform Engineering, Infrastructure Engineering, containerization, orchestration, Infrastructure as Code, AI/ML workloads, zero-downtime deployment, GitOps
Soft skills
technical leadership, mentorship, collaboration, problem-solving, communication