Site Reliability Engineer – Insurance Platform

BJAK

As a Site Reliability Engineer, you will ensure the operational stability and reliability of BJAK’s insurance automation platform, collaborating with engineering teams on system improvements and incident management.

Posted 6/27/2026 • Full-time • Remote • 🇨🇳 China • Mid-Level, Senior

Tech Stack

Tools & technologies
Ansible, AWS, Azure, Cloud, Distributed Systems, Docker, Google Cloud Platform, Grafana, Kubernetes, Prometheus, Terraform

About the role

Key responsibilities & impact
  • Own reliability and operational stability of BJAK’s production systems.
  • Design and improve monitoring, alerting, logging and observability across services.
  • Lead incident response, troubleshooting and structured root cause analysis.
  • Improve system resilience through redundancy, failover and recovery strategies.
  • Work with engineers to design systems that are reliable, scalable and operable in production.
  • Improve deployment safety through CI/CD pipelines, release strategies and automation.
  • Reduce recurring incidents by identifying root causes and driving long-term fixes.
  • Manage and optimize cloud infrastructure supporting business-critical workflows.
  • Strengthen operational practices including on-call processes, incident playbooks and SLAs.
  • Continuously improve system uptime, performance and operational maturity.

Requirements

What you’ll need
  • Experience in Site Reliability Engineering, DevOps, platform engineering or infrastructure roles.
  • Strong understanding of distributed systems, cloud infrastructure and production operations.
  • Experience with monitoring, alerting and observability tools.
  • Strong troubleshooting skills for production incidents and system failures.
  • Ability to design for reliability, scalability and fault tolerance.
  • Experience working with CI/CD pipelines and deployment automation.
  • Strong understanding of system performance, capacity planning and risk management.
  • Hands-on ownership mindset during incidents and operational issues.
  • Calm, structured and disciplined approach to production environments.
  • Strong collaboration with engineering teams in fast-paced environments.

Bonus Points
  • Experience with AWS, GCP, Azure or similar cloud platforms.
  • Experience with Kubernetes, Docker or container orchestration systems.
  • Experience with infrastructure-as-code tools (Terraform, Ansible, etc.).
  • Experience with observability stacks (Prometheus, Grafana, ELK, Datadog, etc.).
  • Experience with incident management tools and on-call systems.
  • Experience with zero-downtime deployments and progressive delivery strategies.
  • Experience working in fintech, insurance or regulated industries.
  • Experience building reliability frameworks or SRE best practices in scaling systems.
  • Contributions to platform reliability or infrastructure resilience initiatives.

Benefits

Comp & perks
  • Build Reliable Insurance Systems – Support mission-critical automation at scale.
  • High-Impact Engineering – Solve real-world reliability and distributed systems challenges.
  • Global Engineering Team – Work with experienced engineers across multiple countries.
  • Fully Remote – Work remotely from China while collaborating with our Malaysia-based teams.
  • International Exposure – Build systems used across Southeast Asian markets.
  • Learning & Development Budget – Support continuous technical growth and certifications.
  • High Ownership Environment – Strong autonomy over reliability and operational design.
  • Modern Engineering Culture – Focus on stability, observability and engineering excellence.
  • Competitive Compensation – Attractive salary package based on experience and impact.

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
Site Reliability Engineering, DevOps, Cloud Infrastructure, Monitoring Tools, CI/CD Pipelines, Deployment Automation, Capacity Planning, Infrastructure-as-Code, Observability Stacks, Reliability Frameworks
Soft Skills
Troubleshooting, Collaboration, Structured Approach, Calm Under Pressure, Ownership Mindset, Disciplined Work Ethic, Problem-Solving, Communication, Adaptability, Leadership