Tech Stack
AWS, Azure, Docker, Google Cloud Platform, Kubernetes, Prometheus, Python, Terraform
About the role
- Select, fine-tune, or deploy LLMs for product-specific needs
- Design and optimize RAG architectures with observability and vector search integration
- Implement and monitor prompt versioning using version control and CI/CD pipelines
- Track and analyze key metrics: hallucination rate, grounding score, latency, token cost
- Detect and mitigate bias in prompts (cultural, gender, linguistic)
- Support red teaming activities to assess model robustness and safety
- Collaborate with MLOps and platform teams to deploy models in production environments
- Promote internal standards for LLMOps, prompt governance, and model observability
- Ensure model reliability, safety, and auditability through prompt monitoring and secure deployment workflows
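The prompt-versioning, rollback, and metric-tracking responsibilities above could be sketched as a minimal in-memory registry. This is an illustrative sketch only: the class and method names (`PromptRegistry`, `publish`, `record`, `rollback`) are hypothetical and do not refer to any specific tool's API.

```python
from dataclasses import dataclass, field
from statistics import mean

@dataclass
class PromptVersion:
    version: int
    template: str
    metrics: list = field(default_factory=list)  # one dict per recorded call

class PromptRegistry:
    """Hypothetical registry tracking prompt versions, per-version metrics,
    and rollback, mirroring the responsibilities listed above."""

    def __init__(self):
        self._versions: list[PromptVersion] = []
        self._active: int = 0  # currently deployed version number

    def publish(self, template: str) -> int:
        # Register a new prompt version and make it active.
        v = PromptVersion(version=len(self._versions) + 1, template=template)
        self._versions.append(v)
        self._active = v.version
        return v.version

    def record(self, latency_ms: float, tokens: int, grounded: bool) -> None:
        # Log one call's metrics against the active version.
        self._versions[self._active - 1].metrics.append(
            {"latency_ms": latency_ms, "tokens": tokens, "grounded": grounded}
        )

    def grounding_score(self, version: int) -> float:
        # Fraction of recorded calls judged grounded (0.0 if none recorded).
        m = self._versions[version - 1].metrics
        return mean(1.0 if r["grounded"] else 0.0 for r in m) if m else 0.0

    def rollback(self, version: int) -> str:
        # Re-activate an earlier version and return its template.
        self._active = version
        return self._versions[version - 1].template

registry = PromptRegistry()
v1 = registry.publish("Answer only from the context:\n{context}\n\nQ: {question}")
registry.record(latency_ms=820.0, tokens=640, grounded=True)
registry.record(latency_ms=910.0, tokens=702, grounded=False)
print(registry.grounding_score(v1))  # 0.5
```

In practice the same pattern sits behind a version-controlled prompt store wired into CI/CD, with the metric records exported to a monitoring backend rather than held in memory.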
Requirements
- 2–4 years of hands-on experience with LLMs and Retrieval-Augmented Generation (RAG)
- 5–8 years in machine learning or software engineering roles
- Proven experience building RAG pipelines with LangChain or equivalent frameworks
- Familiarity with vector databases such as Pinecone, Weaviate, or Milvus
- Strong understanding of prompt lifecycle management: versioning, evaluation, rollback
- Experience implementing metrics and monitoring pipelines: hallucination rate, grounding, token usage
- Demonstrated ability to detect and reduce prompt-related bias (cultural, gender, linguistic)
- Comfortable working with LLMs in production environments using CI/CD and containerization
- Exposure to red teaming techniques for foundation model testing
- Recommended stack: GPT‑4/5, Claude, or Gemini; LangChain, LlamaIndex, or Hugging Face; Python
- DevOps familiarity: GitHub Actions, GitLab CI, Docker, Kubernetes (basic)
- Monitoring & evaluation tools: OpenAI Evals, TruLens, Prometheus, OpenTelemetry