Salary
💰 $200,000 - $322,000 per year
Tech Stack
Cloud, Docker, Kubernetes, OpenStack
About the role
- Develop automation for running reliable AI infrastructure services at scale, both close to bare metal and over virtual machines as a service (VMaaS).
- Build and develop one or more teams to ensure that our internal- and external-facing cloud services, running on top of our accelerated-computing hardware, are as reliable as needed.
- Recruit and retain talent, and manage career development for your organization.
- Be accountable for the deliverables of the team(s) in scope.
- Be accountable for cross-team and cross-company communications.
- Participate in KPI-driven strategic planning.
- Foster a collaborative environment.
Requirements
- 7+ years of overall experience.
- BS degree in Computer Science or a related technical field involving coding (e.g., physics or mathematics) or equivalent experience.
- 3+ years of management experience with prior hands-on experience as an individual contributor.
- A proven track record of delivering impactful projects while managing software engineers focused on cloud infrastructure or cloud application services.
- Experience with DevOps and/or SRE practices and/or Platform Engineering.
- A systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive.
Ways to stand out
- Experience developing ML/AI infrastructure.
- Experience developing bare metal as a service (BMaaS) and associated systems.
- Experience developing multi-cloud infrastructure services.
- Experience teaching reliability practices (e.g., SRE) or broader cloud-systems best practices to peers or to other companies (e.g., CRE).
- Experience running private or public cloud systems based on one or more of Kubernetes, OpenStack, NVIDIA BCM, Docker, or Slurm.
- Prior experience on a team with any particular name, or on an ML/AI-focused team, is not required, though it is a nice to have.