Staff Engineer – Compute Infrastructure and Grid Operations

Marvell Technology

Staff Engineer at Marvell designing and improving the engineering compute infrastructure used for chip design and verification. The role focuses on grid job management, shared storage systems, and reliability in compute environments.

Posted 4/29/2026 • Full-time • Westborough • California, Massachusetts, Texas • 🇺🇸 United States • Lead • 💰 $128,000 - $189,370 per year

Tech Stack

Tools & technologies

Distributed Systems • Linux • NFS • Python

About the role

Key responsibilities & impact

Design, operate, and continuously improve the engineering compute infrastructure for large-scale chip design and verification
Own and evolve grid job management infrastructure used for large regressions and high-volume batch workloads
Debug and resolve grid job failures, including scheduling issues, hung jobs, resource starvation, and intermittent infrastructure faults
Improve job reliability through watchdogs, retries, heartbeats, timeouts, and failure detection mechanisms
Develop deep operational understanding of shared engineering storage systems used by compute jobs
Collaborate with IT teams on filesystem migrations, maintenance windows, and outage prevention
Proactively identify systemic issues that lead to grid instability or job loss
Act as a technical bridge between engineering users, tools teams, and central IT

Requirements

What you’ll need

Bachelor’s degree in Computer Science, Computer Engineering, Electrical Engineering, or equivalent experience
8+ years of experience in compute infrastructure, grid operations, or large-scale engineering environments
Strong experience with grid or batch schedulers (e.g., LSF, UGE, Slurm, PBS)
Hands-on experience debugging distributed systems and batch job failures
Strong Linux systems knowledge, including process management and resource monitoring
Experience with shared storage systems (NFS, enterprise filers, high-performance filesystems)
Strong scripting skills in Python, shell, or similar languages

Benefits

Comp & perks

Employee stock purchase plan with a 2-year lookback
Family support programs to help balance work and home life
Robust mental health resources to prioritize emotional well-being
Recognition and service awards to celebrate contributions and milestones

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

grid job management • debugging distributed systems • batch job failures • Linux systems • process management • resource monitoring • scripting in Python • scripting in shell • shared storage systems • high-performance filesystems

Soft Skills

collaboration • problem-solving • technical communication • proactive identification of issues • operational understanding

Certifications

Bachelor’s degree in Computer Science • Bachelor’s degree in Computer Engineering • Bachelor’s degree in Electrical Engineering