
Software Engineer – DevOps AI Rater/Evaluator
LILT AI
Contract
Location Type: Remote
Location: Germany
About the role
- Evaluate AI outputs related to software engineering, DevOps, and infrastructure topics
- Perform structured scoring, comparison, classification, and judgment tasks
- Assess technical correctness, completeness, security implications, and best-practice alignment
- Identify hallucinations, incorrect code, unsafe recommendations, or misleading system guidance
- Apply domain-specific engineering and DevOps guidelines consistently across tasks
- Validate and refine evaluation rubrics and edge-case handling
- Perform adjudication where raters disagree
- Conduct error analysis and qualitative reviews of model behavior
- Partner with LILT research, product, and customer teams on evaluation design
- Support red-teaming, security review, and model readiness assessments
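The structured scoring and comparison tasks above can be sketched as a minimal weighted rubric. This is an illustrative sketch only: the criterion names, the 0–5 scale, and the weights are hypothetical assumptions, not LILT's actual evaluation rubric.

```python
from dataclasses import dataclass

# Hypothetical rubric for rating an AI-generated DevOps answer.
# Criterion names and weights are illustrative, not LILT's actual rubric.
@dataclass
class Criterion:
    name: str
    weight: float  # relative importance; weights sum to 1.0

RUBRIC = [
    Criterion("technical_correctness", 0.40),
    Criterion("completeness", 0.20),
    Criterion("security", 0.25),
    Criterion("best_practices", 0.15),
]

def score(ratings: dict[str, int]) -> float:
    """Combine per-criterion ratings (0-5 scale) into one weighted score."""
    return sum(c.weight * ratings[c.name] for c in RUBRIC)

# Example: an answer that is correct but weak on security hardening.
ratings = {
    "technical_correctness": 5,
    "completeness": 4,
    "security": 2,
    "best_practices": 4,
}
print(round(score(ratings), 2))  # → 3.9
```

Keeping criteria and weights explicit like this makes adjudication between disagreeing raters traceable: a score difference can be decomposed into per-criterion differences.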
Requirements
- Background as a software engineer, site reliability engineer, DevOps engineer, or platform engineer
- Experience with production systems, CI/CD pipelines, cloud infrastructure, or distributed systems
- Strong attention to detail and comfort working with structured evaluation criteria
- Native or professional fluency in one or more supported languages is required
- English fluency is required for guidelines, feedback, and collaboration
Benefits
- Contract-based, flexible participation
- Project-based work with clear expectations and timelines
- Opportunities for recurring work based on performance and demand
- Compensation communicated upfront per project or task type
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
AI evaluation, software engineering, DevOps, infrastructure, error analysis, CI/CD pipelines, cloud infrastructure, distributed systems, technical correctness, best-practice alignment
Soft Skills
attention to detail, structured evaluation, collaboration, judgment, qualitative reviews