Salary
💰 $140,000 - $215,000 per year
Tech Stack
Ansible, AWS, Chef, Cloud, Cyber Security, Distributed Systems, Go, Kubernetes, Linux, Python
About the role
- Lead a group of software development engineers responsible for tools to deploy and manage private cloud-based infrastructure
- Build, monitor and maintain complex multi-cloud (on-premises and public) distributed systems infrastructure focused on a third-party object store
- Spearhead design, development and implementation of a custom S3-compatible object store across 2,000+ Linux servers
- Collaborate with cross-functional teams to ensure successful integration of solutions and mentor junior engineers
- Perform Linux engineering and administration for thousands of bare-metal and virtual machines
- Engineer large-scale cloud environments for clustered object storage solutions
- Troubleshoot server hardware issues while monitoring, maintaining, and operating a production environment
- Automate routine tasks and deployments under an IaaS model using tools like Chef or Ansible
- Share on-call rotation with other team members
- Write scripts and programs for automation, tools, frameworks, dashboards, and alarms
- Work with internal peers and business partners to analyze requirements and craft robust solutions
- Engage with other engineers to disseminate best practices and raise the technical IQ of the team
- Lead technical teams, manage project timelines, track the status of project activities and ensure schedules and priorities are met
- Ensure critical issues are identified, tracked through resolution, and escalated if necessary
- Collaborate with leadership to develop tools for analyzing and forecasting growth within the operating environment
- Represent the development team as a technical leader across workgroups
Requirements
- Must be a United States citizen or permanent resident (clearance is not required)
- 7+ years of professional experience working on large-scale distributed systems
- BS/MS degree in Computer Science or related field (or equivalent work experience)
- Extensive expertise with on-premises storage (block and object) technologies, including hands-on experience with Kubernetes-native, AWS S3-compatible object storage solutions
- Automation experience using Python/Go scripting and Chef configuration management
- Proficiency with public cloud and cloud administration concepts
- Linux engineering and administration experience in large-scale environments
- Experience troubleshooting server hardware issues in production
- Experience with IaaS deployment automation using tools like Chef or Ansible
- Proficiency in the use of project and program tooling (e.g., Jira, GitLab)
- Excellent written and verbal communication skills
- Comfort collaborating with professionally distributed, cross-functional teams
- Proactive attitude and ability to work independently and in teams
- A passion for documentation and knowledge transfer
- Deep technical fluency in systems and organizations
- Willingness to travel for 2-3 business-related trips per year
- Must periodically undergo and pass additional background and fingerprint check(s) consistent with government customer requirements