WVU Online

Intermediate / Mid-Level System Administrator

full-time

Origin: 🇺🇸 United States • Virginia, West Virginia

Job Level

Mid-Level, Senior

Tech Stack

Linux, Python

About the role

  • Administer and maintain Linux-based HPC clusters, including compute nodes, head nodes, and storage systems.
  • Monitor system health, performance, and resource utilization to ensure high availability and efficiency.
  • Manage job schedulers and resource managers (e.g., Slurm, PBS, or Torque).
  • Configure and maintain high-speed storage and parallel file systems (e.g., Lustre, BeeGFS, or GPFS).
  • Ensure cluster security and compliance, including user access management, software patching, and vulnerability monitoring.
  • Install, update, and optimize scientific software modules and libraries (e.g., via Spack, EasyBuild, or environment modules).
  • Develop automation scripts (Bash, Python) to streamline administrative tasks.
  • Perform backup and disaster recovery planning for HPC systems and research data.
  • Collaborate with researchers to troubleshoot complex computing workflows and improve job throughput.
  • Document HPC procedures, best practices, and system changes for team and user reference.
  • Support enrollment growth, student retention, and campus safety by maintaining reliable computational resources.

Requirements

  • Bachelor’s degree in Computer Science, Engineering, or a related field.
  • A minimum of three (3) years of Linux system administration experience in a multi-user environment.
  • A minimum of three (3) years of hands-on experience in a combination of the following: HPC clusters, job schedulers, and parallel computing.
  • Familiarity with parallel file systems, storage management, and networked environments (Infiniband, Ethernet).
  • Experience with system monitoring and performance tuning in Linux environments.
  • Proficiency in scripting with Bash or Python for automation and system management.
  • Any equivalent combination of related education and/or experience will be considered.
  • All qualifications must be met by the time of employment.
  • Strong troubleshooting and documentation skills, with the ability to collaborate effectively with researchers.
  • Experience with GPU-enabled nodes and CUDA or ROCm environments.
  • Familiarity with HPC software stacks, scientific libraries, and containerized workflows (e.g., Singularity/Apptainer).
  • Knowledge of data security requirements for research, such as HIPAA, FISMA, or CUI.
  • Prior experience supporting HPC environments in academia or research settings (preferred).