Brillio

LLM Ops Engineer

full-time

Origin: 🇺🇸 United States • Florida

Salary

💰 $60 - $65 per hour

Job Level

Mid-Level, Senior

Tech Stack

AWS, Cloud, Distributed Systems, Docker, Google Cloud Platform, Kubernetes, Microservices, Python, PyTorch, SQL, TensorFlow

About the role

  • Design, implement, and maintain end-to-end pipelines for LLM training, fine-tuning, validation, and deployment
  • Build and optimize scalable infrastructure for large language model operations
  • Deploy LLMs to production environments with prompt management, observability, serverless deployment, monitoring, scaling, and performance optimization
  • Design, develop, and maintain RESTful API endpoints for LLM inference and model interactions
  • Ensure API reliability, performance optimization, rate limiting, authentication, and comprehensive documentation
  • Implement comprehensive monitoring solutions for model performance, drift detection, and system health metrics
  • Research and evaluate emerging LLMOps techniques, tools, and methodologies, and recommend technology and architecture choices
  • Establish and document best practices for LLM operations, deployment patterns, and governance frameworks
  • Develop prototypes and POCs to validate new approaches and technologies
  • Collaborate closely with data scientists, ML engineers, DevOps teams, and product managers
  • Create comprehensive documentation for systems, processes, and architectural decisions
  • Mentor team members and share expertise through technical presentations and training sessions
  • Optimize data preprocessing and feature engineering pipelines for LLM training and inference
  • Implement data validation, quality checks, and lineage tracking for model training datasets
  • Design efficient data storage and retrieval systems for large-scale model artifacts and training data
  • Implement model governance frameworks including audit trails, compliance monitoring, and approval workflows
  • Ensure secure model deployment practices, access controls, and data privacy measures
  • Identify and mitigate risks associated with LLM deployment and operations
  • Maintain development, staging, and production environments for LLM workflows

Requirements

  • Bachelor’s degree in Computer Science, Statistics, Engineering, or a related field (B.E./B.Tech/M.Tech), or equivalent
  • LLMOps Engineer with software engineering experience
  • 6-12 years of experience building production-quality software
  • At least 5 years of experience in Python
  • 6+ years of software development experience with strong programming skills in Python and SQL
  • 2+ years of hands-on experience in LLMOps
  • 1+ years of experience with machine learning operations, model deployment, and lifecycle management
  • Proficiency with at least one major cloud provider (AWS or GCP) and their ML services
  • Experience with Docker, Kubernetes, and container orchestration for ML workloads
  • Strong experience in designing, building, and maintaining production-grade APIs for ML services
  • Proficiency with Git, CI/CD pipelines, and DevOps practices
  • Understanding of LLM architectures, training methodologies, and fine-tuning techniques
  • Knowledge of ML pipeline design, model monitoring, and deployment strategies
  • Understanding of distributed systems, scalability patterns, and microservices architecture
  • Good to have: Experience with Hugging Face Transformers, PyTorch, TensorFlow, or similar frameworks
  • Good to have: Knowledge of prompt optimization and RAG (Retrieval-Augmented Generation) architectures
  • Good to have: Experience with vector search
  • Note: Exceptional candidates without advanced degrees will be considered