
DevOps – Infrastructure Engineer
BTSE
full-time
Location Type: Remote
Location: Hong Kong
About the role
- Set up a multi-tenant Kubernetes cluster: shared services namespace, per-tenant namespaces for isolated workloads, GPU node pools for model inference.
- Build a CI/CD pipeline: source control → container build → automated deployment with zero-downtime rolling updates.
- Configure GPU management: scheduling, resource quotas per tenant, device plugins.
- Set up comprehensive monitoring: per-tenant metrics, SLA tracking, data pipeline health, GPU utilisation, API latency percentiles, WebSocket connection stability.
- Implement backup and disaster recovery: cross-region replication, automated database backups.
- Build tenant provisioning automation: scripted creation of new tenant namespaces, storage, network policies, and service accounts.
- Harden security: network policies between namespaces, vulnerability scanning, audit logging.
- Provide 24/7 on-call coverage during the initial pilot, rotating with the Tech Lead.
Requirements
- 4+ years of DevOps/SRE experience, including Kubernetes cluster operations and multi-tenant patterns.
- Experience running GPU workloads on Kubernetes (GPU Operator, device plugins, resource scheduling).
- CI/CD pipelines: GitHub Actions, ArgoCD or FluxCD.
- Infrastructure as code (IaC) with Terraform.
- On-call experience and incident management.
Applicant Tracking System Keywords
Hard Skills & Tools
Kubernetes, CI/CD, GPU management, Terraform, incident management, backup and disaster recovery, network policies, vulnerability scanning, API latency, data pipeline
Soft Skills
on-call experience, collaboration, problem-solving, communication, organizational skills