Tech Stack
Azure, Cloud, PySpark, Python, PyTorch, Scikit-Learn, TensorFlow
About the role
- Design, develop, and deploy data science and ML solutions on Databricks (Azure environment).
- Work across the end-to-end ML lifecycle, from data preparation and feature engineering to model training, evaluation, and deployment.
- Apply LLM fine-tuning and optimization techniques within Databricks for domain-specific use cases.
- Utilize PySpark for distributed data processing, cleaning, and transformation.
- Collaborate with data engineers, cloud architects, and business stakeholders to ensure seamless integration of ML models into production workflows.
- Conduct exploratory data analysis (EDA), statistical modeling, and hypothesis testing to extract insights from structured and unstructured data.
- Stay current with the latest advancements in AI/ML, LLMs, and Databricks capabilities to deliver innovative solutions.
- Document methodologies, experiments, and best practices for knowledge sharing.
Requirements
- Bachelor’s/Master’s degree in Computer Science, Data Science, Statistics, AI/ML, or a related field.
- Proven experience as a Data Scientist with exposure to ML and NLP projects.
- Strong hands-on experience with Databricks on Azure (MLflow, Delta Lake, Databricks ML).
- Proficiency in PySpark for large-scale data processing.
- Experience in training, fine-tuning, and deploying LLMs within a Databricks environment.
- Strong programming skills in Python and familiarity with ML frameworks (TensorFlow, PyTorch, Scikit-learn, Hugging Face).
- Solid understanding of data science workflows: data wrangling, feature engineering, model development, and evaluation.
- Working knowledge of Azure cloud services (Azure Data Lake, Azure Synapse, Azure ML).
- Strong problem-solving, analytical thinking, and communication skills.
- Good-to-have: Experience with MLOps practices and tools (CI/CD for ML, MLflow).
- Good-to-have: Knowledge of vector databases and LLM deployment pipelines.
- Good-to-have: Familiarity with prompt engineering and RAG techniques.
- Good-to-have: Exposure to generative AI projects on cloud platforms.