DevOps Engineer – Platform Reliability

BJAK

DevOps Engineer responsible for platform reliability, infrastructure stability, and operational resilience for BJAK’s AI automation systems. Collaborate closely with teams across Southeast Asia.

Posted 6/27/2026 • Full-time • Remote • 🇨🇳 China • Mid-Level, Senior

Tech Stack

Tools & technologies
Ansible, AWS, Azure, Cloud, Docker, Google Cloud Platform, Grafana, Kubernetes, Prometheus, Terraform

About the role

Key responsibilities & impact
  • Own and improve platform reliability across production systems and environments.
  • Manage cloud infrastructure, deployment pipelines and runtime environments.
  • Design and improve CI/CD workflows to enable safe, fast and repeatable releases.
  • Build and enhance monitoring, alerting, logging and system observability.
  • Lead incident response efforts and perform structured root cause analysis.
  • Improve system resilience through redundancy, failover and recovery mechanisms.
  • Work with engineering teams to reduce production risk through better deployment and system design practices.
  • Strengthen infrastructure security, access control and secrets management.
  • Support reliability for business-critical workflows across multiple countries and services.
  • Continuously improve operational discipline, uptime and system stability.

Requirements

What you’ll need
  • Experience in DevOps, SRE, platform engineering or infrastructure-focused roles.
  • Strong understanding of cloud infrastructure, CI/CD pipelines and deployment systems.
  • Experience with production monitoring, alerting and incident management practices.
  • Ability to troubleshoot infrastructure and production issues in a structured and calm manner.
  • Strong understanding of reliability engineering principles (availability, fault tolerance, recovery).
  • Experience supporting business-critical or high-availability systems.
  • Strong ownership mindset during incidents and operational failures.
  • Practical judgment on reliability, performance, security and cost trade-offs.
  • Comfortable working closely with engineering teams in fast-paced environments.
  • Low ego, disciplined and focused on long-term system stability.
Bonus points
  • Experience with AWS, GCP, Azure or similar cloud platforms.
  • Experience with Kubernetes, Docker or container orchestration.
  • Experience with infrastructure-as-code tools (Terraform, Ansible, Pulumi, etc.).
  • Experience with observability stacks (Prometheus, Grafana, ELK, Datadog, etc.).
  • Experience with zero-downtime deployments, blue-green or canary release strategies.
  • Experience supporting distributed or high-traffic production systems.
  • Strong knowledge of security best practices in cloud infrastructure.
  • Experience in fintech, insurance or regulated industry environments.
  • Contributions to platform reliability or infrastructure scaling initiatives.

Benefits

Comp & perks
  • Build Reliable AI Platform Infrastructure – Support systems powering end-to-end insurance automation.
  • High-Impact Engineering – Solve real-world reliability and scaling challenges.
  • Global Engineering Team – Work with experienced engineers across multiple countries.
  • Fully Remote – Work remotely from China while collaborating with our Malaysia-based teams.
  • International Exposure – Build systems used across Southeast Asian markets.
  • Learning & Development Budget – Support continuous technical growth and certifications.
  • High Ownership Environment – Strong autonomy over infrastructure and reliability strategy.
  • Modern Engineering Culture – Focus on stability, observability and engineering excellence.
  • Competitive Compensation – Attractive salary package based on experience and impact.

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
DevOps, Site Reliability Engineering, Platform Engineering, Cloud Infrastructure, CI/CD Pipelines, Production Monitoring, Incident Management, Reliability Engineering Principles, Infrastructure-as-Code, Observability Stacks
Soft Skills
Troubleshooting, Ownership Mindset, Practical Judgment, Collaboration, Discipline, Focus on Long-Term Stability, Calm Under Pressure, Low Ego, Operational Discipline, Structured Problem Solving