DevOps Engineer – Platform Reliability

BJAK

This DevOps Engineer role covers platform reliability, infrastructure stability, and operational resilience for BJAK’s AI automation systems, working closely with teams across Southeast Asia.

Posted 6/27/2026 • Full-time • Remote • 🇨🇳 China • Mid-Level, Senior

Tech Stack

Tools & technologies

Ansible, AWS, Azure, Cloud, Docker, Google Cloud Platform, Grafana, Kubernetes, Prometheus, Terraform

About the role

Key responsibilities & impact

Own and improve platform reliability across production systems and environments.
Manage cloud infrastructure, deployment pipelines and runtime environments.
Design and improve CI/CD workflows to enable safe, fast and repeatable releases.
Build and enhance monitoring, alerting, logging and system observability.
Lead incident response efforts and perform structured root cause analysis.
Improve system resilience through redundancy, failover and recovery mechanisms.
Work with engineering teams to reduce production risk through better deployment and system design practices.
Strengthen infrastructure security, access control and secrets management.
Support reliability for business-critical workflows across multiple countries and services.
Continuously improve operational discipline, uptime and system stability.

Requirements

What you’ll need

Experience in DevOps, SRE, platform engineering or infrastructure-focused roles.
Strong understanding of cloud infrastructure, CI/CD pipelines and deployment systems.
Experience with production monitoring, alerting and incident management practices.
Ability to troubleshoot infrastructure and production issues in a structured and calm manner.
Strong understanding of reliability engineering principles (availability, fault tolerance, recovery).
Experience supporting business-critical or high-availability systems.
Strong ownership mindset during incidents and operational failures.
Practical judgment on reliability, performance, security and cost trade-offs.
Comfortable working closely with engineering teams in fast-paced environments.
Low ego, disciplined and focused on long-term system stability.

Bonus Points:
Experience with AWS, GCP, Azure or similar cloud platforms.
Experience with Kubernetes, Docker or container orchestration.
Experience with infrastructure-as-code tools (Terraform, Ansible, Pulumi, etc.).
Experience with observability stacks (Prometheus, Grafana, ELK, Datadog, etc.).
Experience with zero-downtime deployments, blue-green or canary release strategies.
Experience supporting distributed or high-traffic production systems.
Strong knowledge of security best practices in cloud infrastructure.
Experience in fintech, insurance or regulated industry environments.
Contributions to platform reliability or infrastructure scaling initiatives.

Benefits

Comp & perks

Build Reliable AI Platform Infrastructure – Support systems powering end-to-end insurance automation.
High-Impact Engineering – Solve real-world reliability and scaling challenges.
Global Engineering Team – Work with experienced engineers across multiple countries.
Fully Remote – Work remotely from China while collaborating with our Malaysia-based teams.
International Exposure – Build systems used across Southeast Asia markets.
Learning & Development Budget – Support continuous technical growth and certifications.
High Ownership Environment – Strong autonomy over infrastructure and reliability strategy.
Modern Engineering Culture – Focus on stability, observability and engineering excellence.
Competitive Compensation – Attractive salary package based on experience and impact.

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

DevOps, Site Reliability Engineering, Platform Engineering, Cloud Infrastructure, CI/CD Pipelines, Production Monitoring, Incident Management, Reliability Engineering Principles, Infrastructure-as-Code, Observability Stacks

Soft Skills

Troubleshooting, Ownership Mindset, Practical Judgment, Collaboration, Discipline, Focus on Long-Term Stability, Calm Under Pressure, Low Ego, Operational Discipline, Structured Problem Solving