Administer and maintain Linux-based HPC clusters, including compute nodes, head nodes, and storage systems
Monitor system health, performance, and resource utilization to ensure high availability and efficiency
Manage job schedulers and resource managers (e.g., Slurm, PBS, or Torque)
Configure and maintain high-speed storage and parallel file systems (e.g., Lustre, BeeGFS, or GPFS)
Ensure cluster security and compliance, including user access management, software patching, and vulnerability monitoring
Install, update, and optimize scientific software modules and libraries (e.g., via Spack, EasyBuild, or environment modules)
Develop automation scripts (Bash, Python) to streamline administrative tasks
Perform backup and disaster recovery planning for HPC systems and research data
Collaborate with researchers to troubleshoot complex computing workflows and improve job throughput
Document HPC procedures, best practices, and system changes for team and user reference
Requirements
Bachelor’s degree in Computer Science, Engineering, or a related field
A minimum of three (3) years of Linux system administration experience in a multi-user environment
A minimum of three (3) years of experience in a combination of the following:
Hands-on experience with HPC clusters, job schedulers, and parallel computing
Familiarity with parallel file systems, storage management, and networked environments (InfiniBand, Ethernet)
Experience with system monitoring and performance tuning in Linux environments
Proficiency in scripting with Bash or Python for automation and system management
Experience with GPU-enabled nodes and CUDA or ROCm environments
Familiarity with HPC software stacks, scientific libraries, and containerized workflows (e.g., Singularity/Apptainer)
Knowledge of data security requirements for research, such as HIPAA, FISMA, or CUI
Strong troubleshooting and documentation skills, with the ability to collaborate effectively with researchers
Any equivalent combination of related education and/or experience will be considered
All qualifications must be met by the time of employment