Tech Stack
Airflow, Apache, Cloud, Cyber Security, Distributed Systems, Go, gRPC, Java, Jenkins, Kubernetes, Python, Ray, Scala, Spark, Terraform
About the role
- Build out ML Platform & GenAI Studio from the ground up to support CrowdStrike's cybersecurity mission
- Collaborate closely with Data Platform Software Engineers, Data Scientists & Threat Analysts to design, implement, and maintain scalable ML pipelines
- Design and implement pipelines for Data Preparation, Cataloguing, Feature Engineering, Model Training, and Model Serving
- Bridge the gap between model development and operational success in a production-focused culture
- Design, build and facilitate adoption of a modern ML platform including support for use cases like GenAI
- Understand current ML workflows, anticipate future needs and templatize repeatable components for model development, deployment, and monitoring
- Build a platform that scales to thousands of users and offers self-service capability for experimentation, training and inference pipelines
- Leverage workflow orchestration tools to deploy efficient and scalable execution of complex data and ML pipelines
- Champion software development best practices around building distributed systems
- Leverage cloud services like Kubernetes, blob storage, and queues in a cloud-first environment
- Contribute to future generative AI investments such as modelling attack paths for IT assets
Requirements
- B.S. in Computer Science or a related field and 10+ years of related experience, or M.S. with 8+ years of experience
- 3+ years of experience developing and deploying machine learning solutions to production
- Familiarity with typical machine learning workflows from an engineering perspective (how they are built and used, not necessarily the theory); familiarity with supervised/unsupervised approaches: how, why, and when labelled data is created and used
- 3+ years of experience with ML platform tools such as Jupyter Notebooks, NVIDIA Workbench, MLflow, Ray, etc.
- Experience building data platform products or features with Apache Spark, Flink, or comparable tools
- Proficiency in distributed computing and orchestration technologies (Kubernetes, Airflow, etc.)
- Production experience with infrastructure-as-code tools such as Terraform and FluxCD
- Expert-level experience with Python; Java/Scala exposure is a plus
- Ability to write Python interfaces that provide standardized, simplified access to internal CrowdStrike tools for data scientists
- Expert-level experience with containerization frameworks
- Strong analytical and problem-solving skills; capable of working in a dynamic environment
- Exceptional interpersonal and communication skills, with the ability to work with stakeholders across multiple teams and synthesize their needs into software interfaces and processes
- Critical Skills Needed for Role: Distributed Systems Knowledge; Data/ML Platform Experience
- Desirable: Go; Iceberg (highly desirable); Pinot or another time-series/OLAP-style database; Jenkins; Parquet; Protocol Buffers/gRPC
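To illustrate the "standardized Python interfaces" requirement above, here is a minimal, hypothetical sketch: all class and method names below are invented for illustration and do not correspond to any actual CrowdStrike tool. The idea is a thin facade that hides an internal client's details behind a simple call data scientists can use directly.

```python
from dataclasses import dataclass


@dataclass
class JobResult:
    """Result handle returned to the data scientist."""
    job_id: str
    status: str


class _InternalPipelineClient:
    """Stand-in for a lower-level internal tool (hypothetical)."""

    def submit(self, spec: dict) -> str:
        # A real client would submit the spec to a scheduler and
        # return its job identifier; here we fabricate a stable one.
        return "job-" + "-".join(str(v) for v in spec.values())

    def poll(self, job_id: str) -> str:
        # A real client would query job state; we return a fixed value.
        return "SUCCEEDED"


class TrainingJobs:
    """Simplified facade: one call to run a training job end to end."""

    def __init__(self, client=None):
        self._client = client or _InternalPipelineClient()

    def run(self, model: str, dataset: str) -> JobResult:
        job_id = self._client.submit({"model": model, "dataset": dataset})
        return JobResult(job_id=job_id, status=self._client.poll(job_id))
```

A data scientist would then write `TrainingJobs().run("xgboost", "features_v1")` instead of learning the internal client's submission spec, which is the kind of standardization the role calls for.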