
ML/AI Ops Engineer
Xcel Energy
full-time
Location Type: Office
Location: Denver • Colorado • Minnesota • United States
Salary
💰 $112,200 - $159,400 per year
About the role
- Lead and support solution lifecycle technical activities
- Ensure solutions are designed for great user experience and operational performance
- Lead design, ensuring Enterprise Architecture, Security, Operations and Compliance aspects are continuously integrated into solutions
- Provide input to cost and schedule estimation
- Own the overall integrity of system design and operation
- Oversee vendor activities
- Conduct peer reviews and approve system changes and technical solution design
- Coach and mentor less experienced team members
- Partner across the organization to deliver optimal solutions at minimal cost
- Provide in-depth technical information to stakeholders as needed
- Innovate by applying emerging industry capabilities in response to evolving customer needs
- Provide input to strategic roadmap and technical dependencies
- Continuously stay current on, and apply, technical industry knowledge pertaining to the respective domain
- Review solution performance and continually assess health of systems
- Track and raise awareness of operational and technical debt risks
- Provide escalation support for incident and problem management
- Utilize analytics to improve availability, reliability, efficiency and capacity
- Productionize machine learning and AI models, including classical ML and GenAI, using standardized MLOps pipelines
- Manage end-to-end model lifecycle activities: versioning, promotion, rollback, retraining, and retirement
- Implement CI/CD practices for models, features, and inference services
- Design, build, and maintain reusable MLOps pipelines for training, validation, deployment, and monitoring
- Develop common components (feature pipelines, quality checks, evaluation harnesses) to reduce friction across AI projects
- Implement monitoring for model performance, data drift, bias, and system health
- Own AI/ML operational SLAs, SLOs, and incident response, including root-cause analysis and post-mortems
- Ensure high availability, resilience, and recoverability of AI services
- Support regulated or high-risk AI use cases by embedding governance, validation, and documentation into MLOps workflows
- Produce and maintain required artifacts such as model cards, system cards, validation evidence, and audit support materials
- Partner closely with AI Governance and Risk teams to ensure alignment with enterprise standards
Requirements
- Ten years of related functional experience
- Bachelor's degree in Technology, Science, Business, or a related field, or 4 years of experience equivalent to the position
- Excellent communication skills
- Excellent relationship management and collaboration skills
- Expertise managing the lifecycle of technical solutions
- Deep subject matter expertise in the respective system domain's products, platforms, processes, and architecture
- Broad and deep knowledge of technology architecture, infrastructure, network, security and software principles and models
- Experience working in partnership with internal and external vendors
- Excellent analytical, problem-solving and troubleshooting skills
- Extensive knowledge of future technology trends within area of expertise
- Demonstrated leadership on technical aspects of large-scale projects
- Experience coaching other developers in system deployment or operational troubleshooting
- Experience with delivery methodologies (Waterfall, Agile, Scrum) and operational models (ITIL)
- Experience and understanding of core IT Service Management functions, such as Change Management and Incident Management
Benefits
- Annual Incentive Program
- Medical/Pharmacy Plan
- Dental
- Vision
- Life Insurance
- Dependent Care Reimbursement Account
- Health Care Reimbursement Account
- Health Savings Account (HSA) (if enrolled in eligible health plan)
- Limited-Purpose FSA (if enrolled in eligible health plan and HSA)
- Transportation Reimbursement Account
- Short-term disability (STD)
- Long-term disability (LTD)
- Employee Assistance Program (EAP)
- Fitness Center Reimbursement (if enrolled in eligible health plan)
- Tuition reimbursement
- Transit programs
- Employee recognition program
- Pension
- 401(k) plan
- Paid time off (PTO)
- Holidays
- Volunteer Paid Time Off (VPTO)
- Parental Leave
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
MLOps, machine learning, AI models, CI/CD practices, system design, analytics, operational performance, technical solution design, cost estimation, incident management
Soft Skills
communication skills, relationship management, collaboration skills, analytical skills, problem-solving skills, troubleshooting skills, leadership, coaching, mentoring, cross-organizational partnership