Tech Stack
Airflow, AWS, Azure, Cloud, Docker, Google Cloud Platform, Hadoop, Jenkins, Kubernetes, NumPy, Open Source, pandas, Python, PyTorch, scikit-learn, Spark
About the role
- Design, train, and validate supervised and unsupervised models (e.g., anomaly detection, classification, forecasting); see the anomaly-detection sketch after this list.
- Architect and implement deep learning solutions (CNNs, Transformers) with PyTorch.
- Develop and fine-tune Large Language Models (LLMs) and build LLM-driven applications.
- Implement Retrieval-Augmented Generation (RAG) pipelines and integrate with vector databases; see the retrieval sketch after this list.
- Build robust pipelines to deploy models at scale (Docker, Kubernetes, CI/CD).
- Ingest, clean, and transform large datasets using pandas, NumPy, and Spark.
- Automate training and serving workflows with Airflow or similar orchestration tools; see the DAG sketch after this list.
- Monitor model performance in production and iterate on drift detection and retraining strategies; see the drift-check sketch after this list.
- Implement LLMOps practices for automated testing, evaluation, and monitoring of LLMs.
- Write production-grade Python code following SOLID principles, backed by unit tests and code reviews.
- Collaborate in Agile (Scrum) ceremonies; track work in JIRA.
- Document architecture and workflows using PlantUML or comparable tools.
- Communicate analysis, design and results clearly in English.
- Partner with DevOps, data engineering and product teams to align on requirements and SLAs.
- Design and prototype novel ML/DL models and productionize them end-to-end, integrating solutions into data pipelines and services.
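To make the modelling bullets concrete, here is a minimal anomaly-detection sketch using scikit-learn's IsolationForest on synthetic data; the data shape, contamination rate, and other parameters are illustrative assumptions, not project specifics.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic data: mostly "normal" points plus a handful of injected outliers.
rng = np.random.default_rng(seed=42)
normal = rng.normal(loc=0.0, scale=1.0, size=(1000, 4))
outliers = rng.uniform(low=6.0, high=8.0, size=(20, 4))
X = np.vstack([normal, outliers])

# Unsupervised anomaly detector; contamination is the assumed outlier share.
model = IsolationForest(n_estimators=200, contamination=0.02, random_state=42)
model.fit(X)

# predict() returns +1 for inliers and -1 for anomalies.
labels = model.predict(X)
print(f"flagged {np.sum(labels == -1)} of {len(X)} points as anomalies")
```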
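For the RAG bullet, a minimal retrieval sketch against a FAISS index. The `embed` and `generate_answer` functions are placeholders standing in for whatever embedding model and LLM endpoint the team actually uses, and the document chunks are invented for illustration.

```python
import numpy as np
import faiss  # pip install faiss-cpu

DIM = 384  # assumed embedding dimensionality

def embed(texts: list[str]) -> np.ndarray:
    """Placeholder: returns random vectors so the sketch runs without a model download.
    In practice this would call a real embedding model (e.g. sentence-transformers)."""
    rng = np.random.default_rng(0)
    return rng.normal(size=(len(texts), DIM)).astype("float32")

def generate_answer(question: str, context: list[str]) -> str:
    """Placeholder: stands in for a call to the chosen LLM."""
    return f"[answer to {question!r} grounded in {len(context)} retrieved chunks]"

# Index the document chunks.
chunks = [
    "Refunds are processed within five business days.",
    "Support is available 24/7 via chat.",
    "Enterprise plans include single sign-on.",
]
vectors = embed(chunks)
faiss.normalize_L2(vectors)      # normalise so inner product equals cosine similarity
index = faiss.IndexFlatIP(DIM)
index.add(vectors)

# Retrieve the top-k chunks for a query and hand them to the generator.
query = "How long do refunds take?"
q_vec = embed([query])
faiss.normalize_L2(q_vec)
_, ids = index.search(q_vec, 2)
retrieved = [chunks[i] for i in ids[0]]
print(generate_answer(query, retrieved))
```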
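For workflow automation, a minimal Airflow DAG sketch in Airflow 2.x style; the DAG id, schedule, and task callables are illustrative assumptions for a daily retraining flow.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract_features(**context):
    # Placeholder: pull training data from the feature store or warehouse.
    print("extracting features")

def train_model(**context):
    # Placeholder: fit and register a new model version.
    print("training model")

def evaluate_and_deploy(**context):
    # Placeholder: compare against the current champion and deploy if better.
    print("evaluating and deploying")

with DAG(
    dag_id="daily_model_retraining",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",   # schedule_interval on Airflow releases before 2.4
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract_features", python_callable=extract_features)
    train = PythonOperator(task_id="train_model", python_callable=train_model)
    deploy = PythonOperator(task_id="evaluate_and_deploy", python_callable=evaluate_and_deploy)

    extract >> train >> deploy
```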
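For production monitoring, a minimal feature-drift check that compares a reference window against live data with a two-sample Kolmogorov-Smirnov test; the significance threshold and the synthetic distributions are illustrative assumptions, and in practice the result would feed the alerting and retraining trigger.

```python
import numpy as np
from scipy.stats import ks_2samp

def detect_drift(reference: np.ndarray, live: np.ndarray, alpha: float = 0.01) -> bool:
    """Flag drift when the two samples are unlikely to share a distribution."""
    statistic, p_value = ks_2samp(reference, live)
    print(f"KS statistic={statistic:.3f}, p-value={p_value:.4f}")
    return p_value < alpha

rng = np.random.default_rng(7)
reference = rng.normal(loc=0.0, scale=1.0, size=5_000)   # training-time distribution
live = rng.normal(loc=0.4, scale=1.2, size=5_000)        # shifted production distribution

if detect_drift(reference, live):
    print("drift detected -> schedule retraining / alert on-call")
else:
    print("no significant drift")
```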
Requirements
- Bachelor’s or Master’s degree in Computer Science, Data Science, or a related field.
- 5+ years of professional experience with Python in production environments.
- Solid background in machine learning & deep learning (CNNs, Transformers, LLMs).
- Hands-on experience with PyTorch or similar frameworks (training, custom modules, optimization).
- Proven track record of deploying ML solutions to production.
- Expert-level proficiency with pandas, NumPy, and scikit-learn.
- Familiarity with Agile/Scrum practices and tooling (JIRA, Confluence).
- Strong foundation in statistics and experimental design.
- Excellent written and spoken English.
- Preferred: Experience with cloud platforms (AWS, GCP, or Azure) and their AI-specific services like Amazon SageMaker, Google Vertex AI, or Azure Machine Learning.
- Preferred: Familiarity with big-data ecosystems (Spark, Hadoop).
- Preferred: Practice in CI/CD & container orchestration (Jenkins/GitLab CI, Docker, Kubernetes).
- Preferred: Exposure to MLOps/LLMOps tools (MLflow, Kubeflow, TFX).
- Preferred: Experience with Large Language Models, Generative AI, prompt engineering, and RAG pipelines.
- Preferred: Hands-on experience with vector databases (e.g., Pinecone, FAISS).
- Preferred: Experience building AI agents and working with frameworks such as Hugging Face Transformers, LangChain, or LangGraph.
- Preferred: Documentation skills using PlantUML or similar.