CARBON3

Lead Systems Engineer – HPC, AI

full-time

Location Type: Hybrid

Location: 🇬🇧 United Kingdom

Job Level

Senior

Tech Stack

Ansible, Cloud, Distributed Systems, Grafana, Kubernetes, Linux, Splunk

About the role

  • Escalate complex (L3) issues to product/engineering teams, including vendor teams, coordinate their resolution, and manage vendor responses
  • Monitor system health, alerts, and customer usage patterns
  • Document solutions, workarounds, and support procedures, and create and maintain the knowledge base
  • Automate common tasks and fixes
  • Configure and integrate tooling to support optimal operation of the platform, and support tool selection
  • Assist customers with platform configuration, onboarding, and usage best practices
  • Collaborate with platform and infrastructure support/engineering teams to resolve platform integration issues
  • Ensure SLAs and customer satisfaction targets are met
  • Provide L1 support for customer-reported issues and requests
  • Provide L2 support by diagnosing, replicating, and troubleshooting issues across platform and infrastructure
  • Work with customers and multiple stakeholders to understand requirements and challenges, and provide reporting on usage, workflow, and billing

Requirements

  • Extensive experience in technical support, system engineering, or platform operations
  • Solid understanding of L1 and L2 support processes (ticketing, escalation, troubleshooting)
  • Familiarity with cloud-based platforms, APIs, and distributed systems
  • Understanding of AI/ML concepts and tooling (basics of model training, inference, and data pipelines)
  • Experience with monitoring/logging tools (e.g., Grafana, Kibana, Splunk)
  • Excellent communication skills to interface with customers as well as internal and vendor teams
  • Good understanding of the tooling requirements of ML engineers and data scientists, and of how to optimize their experience
  • System administration experience with operating systems such as RHEL/CentOS and Ubuntu, including Linux kernel tuning
  • Proficiency with Ansible, NVIDIA and CUDA toolkits, Kubernetes, and container orchestration
  • Understanding of automation, monitoring, and security for GPU-as-a-Service

Benefits

  • Health insurance
  • Professional development opportunities
