Lead the architecture, development, and deployment of scalable machine learning and AI systems centered on real-time LLM inference for concurrent users.
Design, implement, and manage agentic AI frameworks leveraging Google ADK, LangGraph, or custom-built agents.
Integrate foundation models (GPT, LLaMA, Claude, Gemini) and fine-tune them for domain-specific intelligent applications.
Build robust MLOps pipelines for end-to-end lifecycle management of models: training, testing, deployment, and monitoring.
Collaborate with DevOps teams to deploy scalable serving infrastructures using containerization (Docker), orchestration (Kubernetes), and cloud platforms.
Drive innovation by adopting new AI capabilities and tools, such as Google Gemini, to enhance AI model performance and interaction quality.
Partner cross-functionally to understand traffic patterns and design AI systems that handle real-world scale and complexity.
Requirements
Bachelor’s or Master’s degree in Computer Science, AI, Machine Learning, or a related field.
5+ years in ML engineering, applied AI, or data science roles.
Strong programming expertise in Python and frameworks including PyTorch, TensorFlow, and Hugging Face Transformers.
Deep experience with NLP, Transformer models, and generative AI techniques.
Hands-on experience deploying AI models to concurrent users with high throughput and low latency.
Skilled in cloud environments (AWS, GCP, Azure) and in containerization and orchestration (Docker, Kubernetes).
Familiarity with vector databases (FAISS, Pinecone, Weaviate) and retrieval-augmented generation (RAG).
Experience with agentic AI using ADK, LangChain, LangGraph, and Agent Engine.
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
machine learning, AI systems, real-time inference, MLOps, Python, PyTorch, TensorFlow, NLP, Transformer models, generative AI