
HPC Engineer
RCH Solutions
full-time
Location Type: Remote
Location: Remote • 🇺🇸 United States
Job Level
Mid-Level, Senior
Tech Stack
Ansible, AWS, Cloud, DNS, Google Cloud Platform, Linux, NFS, Terraform
About the role
- Work closely with customer stakeholders, scientists, and IT professionals to deliver Compute at Scale and support our customers' scientific initiatives
- Develop, evolve, and administer HPC platforms along with support for Scientific applications, workflows, and other related infrastructure both on-prem and Cloud hosted
- Drive architecture, roadmaps, and execution of projects to establish and operate IT infrastructure best practices for customers
- Provide full-stack support: design and evolution of platforms, application administration, support for customer workflows, profiling and performance tuning, monitoring and maintenance of scoped systems, platform and systems administration, troubleshooting of hardware, software, and networking issues, solution architecture and hands-on engineering (on-prem and Cloud), and documentation
- Collaborate with cross-discipline team members and customers to deliver HPC and peripheral Compute at Scale services
- Thorough understanding of related industry best practices
- Support internal and customer Architecture and Design efforts
- Support customers with their workflow pipelines (advisory and hands-on)
- Comprehensively document new and existing computational assets
- Maintain the flexibility to pivot as engagement scopes may evolve
- Support for AWS & GCP Cloud applications, migrations, and modernization
- CloudOps / IaC for ongoing platform management
- Setup and configuration of AWS & GCP Cloud infrastructure for new platform builds
- Ensure system compliance with company security standards and applicable regulatory requirements
- Transition support for modernized services to operational teams
- Provide engineering-level troubleshooting and service restoration for operational issues as they arise on supported platforms
- Provide training and mentorship for junior-level team members
Requirements
- A bachelor’s degree or master’s degree in Computer Science or related field
- 5+ years of experience administering HPC clusters and systems
- Experience with SLURM and Grid Engine scheduling software preferred
- 5+ years of professional experience in Solution Architecture or Cloud Infrastructure Deployment and support
- 5+ years of professional experience developing or administering compute solutions for Scientific / Research IT domains, Life Sciences being preferred
- Experience with Posit products (Package Manager, Connect, Workbench) in either an end-user or administrator capacity
- Experience developing scientific workflows on HPC systems using Nextflow
- Extensive command-line system administration experience
- User and group management
- Advanced knowledge of Active Directory, DNS, DHCP, LDAP, NFS, SMB
- Building applications from source code, installing, maintaining, and troubleshooting application-level Linux and scientific software in line with industry best practices
- Installation of Linux operating system and fine tuning
- Familiarity with leveraging and maintaining Linux package management systems
- Intermediate OS level networking knowledge
- Experience with scripting tools, automation tools, and configuration management tools
- Ansible, Terraform, and CloudFormation experience preferred
- Experience administering and integrating Scientific / Research applications
- Strong time-management skills; able to complete projects in a timely manner, plan and prioritize tasks while keeping leadership and stakeholders updated regularly on status
- Excellent communication skills, including preparation of written documentation for IT colleagues and end users
- Proactive thinking skills to identify potential issues and solution options prior to incidents occurring
- Extreme attention to detail, needed to work with multiple clients simultaneously
- Ability to understand and analyze complex technical problems and situations
Benefits
- A competitive salary and bonus package based on experience
- Comprehensive health and wellness benefits, including Medical, Dental, and Vision Insurance
- Company-provided Life and Long-Term Disability Insurance
- Company-sponsored 401(k) Plan
- Company-provided continuing education benefit
- Team-focused culture and unlimited opportunity for advancement
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
HPC administration, Solution Architecture, Cloud Infrastructure Deployment, SLURM, Grid Engine, Nextflow, Linux system administration, Active Directory, Ansible, Terraform
Soft skills
time-management, communication, proactive thinking, attention to detail, problem analysis, project planning, task prioritization, mentorship, collaboration, documentation