Site Reliability Engineer – Insurance Platform

BJAK

The Site Reliability Engineer ensures the operational stability and reliability of BJAK’s insurance automation platform, collaborating with engineering teams on system improvements and incident management.

Posted 6/27/2026 • Full-time • Remote • 🇨🇳 China • Mid-Level, Senior

Tech Stack

Tools & technologies

Ansible, AWS, Azure, Cloud, Distributed Systems, Docker, Google Cloud Platform, Grafana, Kubernetes, Prometheus, Terraform

About the role

Key responsibilities & impact

Own reliability and operational stability of BJAK’s production systems.
Design and improve monitoring, alerting, logging and observability across services.
Lead incident response, troubleshooting and structured root cause analysis.
Improve system resilience through redundancy, failover and recovery strategies.
Work with engineers to design systems that are reliable, scalable and operable in production.
Improve deployment safety through CI/CD pipelines, release strategies and automation.
Reduce recurring incidents by identifying root causes and driving long-term fixes.
Manage and optimize cloud infrastructure supporting business-critical workflows.
Strengthen operational practices including on-call processes, incident playbooks and SLAs.
Continuously improve system uptime, performance and operational maturity.

Requirements

What you’ll need

Experience in Site Reliability Engineering, DevOps, platform engineering or infrastructure roles.
Strong understanding of distributed systems, cloud infrastructure and production operations.
Experience with monitoring, alerting and observability tools.
Strong troubleshooting skills for production incidents and system failures.
Ability to design for reliability, scalability and fault tolerance.
Experience working with CI/CD pipelines and deployment automation.
Strong understanding of system performance, capacity planning and risk management.
Hands-on ownership mindset during incidents and operational issues.
Calm, structured and disciplined approach to production environments.
Strong collaboration with engineering teams in fast-paced environments.

Bonus Points

Experience with AWS, GCP, Azure or similar cloud platforms.
Experience with Kubernetes, Docker or container orchestration systems.
Experience with infrastructure-as-code tools (Terraform, Ansible, etc.).
Experience with observability stacks (Prometheus, Grafana, ELK, Datadog, etc.).
Experience with incident management tools and on-call systems.
Experience with zero-downtime deployments and progressive delivery strategies.
Experience working in fintech, insurance or regulated industries.
Experience building reliability frameworks or SRE best practices in scaling systems.
Contributions to platform reliability or infrastructure resilience initiatives.

Benefits

Comp & perks

Build Reliable Insurance Systems – Support mission-critical automation at scale.
High-Impact Engineering – Solve real-world reliability and distributed systems challenges.
Global Engineering Team – Work with experienced engineers across multiple countries.
Fully Remote – Work remotely from China while collaborating with our Malaysia-based teams.
International Exposure – Build systems used across Southeast Asian markets.
Learning & Development Budget – Support continuous technical growth and certifications.
High Ownership Environment – Strong autonomy over reliability and operational design.
Modern Engineering Culture – Focus on stability, observability and engineering excellence.
Competitive Compensation – Attractive salary package based on experience and impact.

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

Site Reliability Engineering, DevOps, Cloud Infrastructure, Monitoring Tools, CI/CD Pipelines, Deployment Automation, Capacity Planning, Infrastructure-as-Code, Observability Stacks, Reliability Frameworks

Soft Skills

Troubleshooting, Collaboration, Structured Approach, Calm Under Pressure, Ownership Mindset, Disciplined Work Ethic, Problem-Solving, Communication, Adaptability, Leadership