RCH Solutions

Principal Cloud Platform Engineer

RCH Solutions is seeking a Principal Cloud Platform Engineer with expertise in Kubernetes-based infrastructure. Join the Cloud Engineering team to design and operate scalable AI platforms for the life sciences.

Posted 5/12/2026 • Full-time • Remote • New York • 🇺🇸 United States • Lead

Tech Stack

Tools & technologies
BigQuery, Cloud, Distributed Systems, Flux, Google Cloud Platform, Grafana, Kubernetes, Node.js, Prometheus, Terraform

About the role

Key responsibilities & impact
  • Design, operate, and continuously improve production-grade K8s clusters at the platform level.
  • Lead complex cluster lifecycle management, including:
    • Version upgrades and dependency coordination
    • Failure recovery and incident resolution
    • Non-trivial maintenance and system evolution
  • Build and maintain highly reliable, scalable, multi-tenant infrastructure.
  • Build and maintain end-to-end observability for LLM-based systems using Grafana, Langfuse, and LangSmith, covering performance, latency, token usage, and alerting.
  • Architect and operate shared infrastructure across multiple teams and use cases.
  • Implement and enforce RBAC and access control models, tenant isolation and security boundaries, and resource management and fairness at scale.
  • Ensure platform stability under diverse and competing workloads.
  • Operate and optimize vector database systems (Weaviate preferred) in production environments.
  • Support and scale Retrieval-Augmented Generation (RAG) systems.
  • Drive improvements in query performance and latency, cluster tuning and resource efficiency, and the operational stability of retrieval pipelines.
  • Take technical ownership of production systems over time.
  • Build and maintain strong practices in observability (metrics, logs, tracing), incident response and root cause analysis, and long-term system health and resilience.
  • Proactively identify and resolve reliability risks.
  • Work closely with backend and GenAI engineers to ensure seamless integration with the platform.

Requirements

What you’ll need
  • 5+ years of hands-on experience in high-scale platform engineering (internal platforms, PaaS, or shared infrastructure)
  • Deep Kubernetes platform expertise, including:
    • Hands-on experience with GKE: cluster upgrades, node pool management, autoscaling
    • Managing failures, disruptions, and complex maintenance scenarios
    • RBAC, namespaces, network policies
  • GCP IAM, Workload Identity, Secret Manager
  • GCP Storage: BigQuery, GCS, Firestore
  • Terraform and Infrastructure as Code (IaaC) experience with GitOps workflows (Argo CD, Flux, or equivalent)
  • Strong observability practices using Google Cloud Operations Suite (formerly Stackdriver) and Prometheus/Grafana
  • Hands-on experience operating vector databases in production, ideally Weaviate, including query performance tuning and cluster stability and scaling behavior
  • Solid understanding of distributed systems design and failure modes
  • Multi-zone / regional architectures
  • Google Cloud Load Balancing

Benefits

Comp & perks
  • A competitive salary and bonus package based on experience
  • Comprehensive health and wellness benefits, including Medical, Dental, and Vision Insurance
  • Company-provided Life and Long-Term Disability Insurance
  • Company-sponsored 401(k) Plan
  • Company-provided continuing education benefit
  • Team-focused culture and unlimited opportunity for advancement

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
Kubernetes, GKE, RBAC, Terraform, IaaC, Google Cloud Operations Suite, Prometheus, Grafana, Weaviate, distributed systems design
Soft Skills
leadership, incident resolution, problem-solving, technical ownership, collaboration