Salary
💰 $133,650 - $220,680 per year
Tech Stack
Ansible, AWS, Azure, Cloud, Cyber Security, Google Cloud Platform, Jenkins, Kubernetes, Linux, OpenShift, Open Source, Python, Terraform
About the role
- Collaborate with research and product development teams to scale machine learning products for internal and external applications
- Create and manage model training and deployment pipelines
- Actively contribute to managing and releasing upstream and midstream product builds
- Test software to ensure correctness, responsiveness, and efficiency
- Troubleshoot, debug, and upgrade Dev & Test pipelines
- Identify and deploy cybersecurity measures through continuous vulnerability assessment and risk management
- Collaborate with a cross-functional team on market requirements and best practices
- Keep abreast of the latest technologies and standards in the field
Requirements
- 2+ years of experience in MLOps, DevOps, Automation and modern Software Deployment practices
- Experience evaluating LLMs for performance on accelerators and for accuracy (e.g., HellaSwag, MMLU, Chatbot Arena, TruthfulQA)
- Being super comfortable with Python and pytest is a must
- Strong experience with Git, GitHub Actions (including self-hosted runners), Terraform, Jenkins, Ansible, and other common automation and monitoring technologies
- Extensive experience administering Kubernetes/OpenShift
- Familiarity with Agile development methodology
- Experience with cloud computing on at least one of the following platforms: AWS, GCP, Azure, or IBM Cloud
- Solid programming skills, especially in Python
- Solid troubleshooting skills
- Ability to interact comfortably with the other members of a large, geographically dispersed team
- Experience maintaining infrastructure and ensuring its stability
- Experience contributing to the vLLM CI community is a big plus
- While a Bachelor’s degree or higher in computer science, mathematics, or a related discipline is valued, we prioritize technical prowess, initiative, problem solving, and practical experience