
LLM Ops Engineer
Litera
full-time
Location Type: Hybrid
Location: Denver • Colorado • North Carolina • United States
Salary
💰 $140,000 - $180,000 per year
About the role
- Fine-tune pre-trained models for specific use cases
- Curate and prepare datasets for training
- Manage training infrastructure, resources, and computational environments
- Implement optimization techniques to improve model performance
- Develop and manage APIs for model serving
- Scale infrastructure to handle varying demand loads
- Build and maintain the GenAI middleware/sidecar layer
- Integrate LLMs with existing systems and data sources
- Track performance metrics including latency and throughput
- Monitor quality metrics such as hallucination rates and accuracy
- Optimize costs associated with model inference and training
- Create and maintain dashboards for real-time performance insights
- Create and maintain golden datasets for benchmark testing
- Implement statistical validation methods for model outputs
- Set up similarity matching criteria for response evaluation
- Develop confidence score thresholds for production systems
- Design and implement user feedback collection systems
- Establish continuous improvement processes
- Create A/B testing frameworks for model and feature evaluation
- Conduct trace analysis to identify areas for performance optimization
- Implement content moderation systems
- Detect and mitigate bias in model outputs
- Ensure regulatory compliance in AI systems
- Develop output validation frameworks
- Version and store prompts systematically
- Create and maintain prompt templates
- Set up playground environments for prompt testing
- Abstract prompts from application code for better maintainability
Requirements
- Experience with LLM development, fine-tuning, and deployment
- Strong programming skills, particularly in Python
- Experience with Kubeflow, Apache Airflow, MLflow, or other LLM pipeline technologies
- Experience with Azure OpenAI, AWS SageMaker, and/or Vertex AI
- Understanding of machine learning operations (MLOps) principles
- Knowledge of infrastructure scaling and optimization
- Experience with AI monitoring tools and dashboard creation
- Familiarity with AI safety, bias detection, and compliance requirements
- Strong problem-solving abilities and analytical thinking
- Familiarity with ISO 27001 and SOC 2 certification
Benefits
- Health insurance
- Retirement savings plans
- Generous paid time off
- Supportive work-life balance
- Comprehensive benefits package
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
fine-tuning, dataset preparation, model optimization, API development, infrastructure scaling, A/B testing, statistical validation, content moderation, bias detection, prompt engineering
Soft Skills
problem-solving, analytical thinking, continuous improvement, communication, collaboration
Certifications
ISO 27001, SOC 2 Certification