LLM Ops Engineer

Brillio

Full-time

Posted on: 9/8/2025

Location: Florida • 🇺🇸 United States

Salary

💰 $60 - $65 per hour

Job Level

Mid-Level, Senior

Tech Stack

AWS, Cloud, Distributed Systems, Docker, Google Cloud Platform, Kubernetes, Microservices, Python, PyTorch, SQL, TensorFlow

About the role

Design, implement, and maintain end-to-end pipelines for LLM training, fine-tuning, validation, and deployment
Build and optimize scalable infrastructure for large language model operations
Deploy LLMs to production environments with prompt management, observability, serverless deployment, monitoring, scaling, and performance optimization
Design, develop, and maintain RESTful API endpoints for LLM inference and model interactions
Ensure API reliability, performance optimization, rate limiting, authentication, and comprehensive documentation
Implement comprehensive monitoring solutions for model performance, drift detection, and system health metrics
Research and evaluate emerging LLMOps techniques, tools, and methodologies and provide recommendations on technology and architecture
Establish and document best practices for LLM operations, deployment patterns, and governance frameworks
Develop prototypes and POCs to validate new approaches and technologies
Collaborate closely with data scientists, ML engineers, DevOps teams, and product managers
Create comprehensive documentation for systems, processes, and architectural decisions
Mentor team members and share expertise through technical presentations and training sessions
Optimize data preprocessing and feature engineering pipelines for LLM training and inference
Implement data validation, quality checks, and lineage tracking for model training datasets
Design efficient data storage and retrieval systems for large-scale model artifacts and training data
Implement model governance frameworks including audit trails, compliance monitoring, and approval workflows
Ensure secure model deployment practices, access controls, and data privacy measures
Identify and mitigate risks associated with LLM deployment and operations
Maintain development, staging, and production environments for LLM workflows

Requirements

Bachelor’s degree in Computer Science, Statistics, Engineering, or a related field (B.E/B.Tech/M.Tech), or equivalent
LLMOps Engineer with software engineering experience
6-12 years of experience building production-quality software
At least 5 years of experience in Python
6+ years of software development experience with strong programming skills in Python and SQL
2+ years of hands-on experience in LLMOps
1+ years of experience with machine learning operations, model deployment, and lifecycle management
Proficiency with at least one major cloud provider (AWS or GCP) and their ML services
Experience with Docker, Kubernetes, and container orchestration for ML workloads
Strong experience in designing, building, and maintaining production-grade APIs for ML services
Proficiency with Git, CI/CD pipelines, and DevOps practices
Understanding of LLM architectures, training methodologies, and fine-tuning techniques
Knowledge of ML pipeline design, model monitoring, and deployment strategies
Understanding of distributed systems, scalability patterns, and microservices architecture
Good to have: Experience with HuggingFace Transformers, PyTorch, TensorFlow, or similar frameworks
Good to have: Knowledge of prompt optimization and RAG (Retrieval-Augmented Generation) architectures
Good to have: Experience with vector search
Note: Exceptional candidates without advanced degrees will be considered
