LEO Technologies, LLC

Software Engineer, AI/LLM Evaluation and Alignment

full-time

Origin: 🇺🇸 United States

Salary

💰 $135,000 - $160,000 per year

Job Level

Mid-Level, Senior

Tech Stack

Airflow, AWS, Azure, Cloud, Elasticsearch, Kafka, Kubernetes, Python, PyTorch, Terraform

About the role

  • Build and maintain evaluation frameworks for LLMs and generative AI systems tailored to public safety and intelligence use cases.
  • Design guardrails and alignment strategies to minimize bias, toxicity, hallucinations, and other ethical risks in production workflows.
  • Partner with AI engineers and data scientists to define online and offline evaluation metrics (e.g., model drift, data drift, factual accuracy, consistency, safety, interpretability).
  • Implement continuous evaluation pipelines for AI models, integrated into CI/CD and production monitoring systems.
  • Collaborate with stakeholders to stress test models against edge cases, adversarial prompts, and sensitive data scenarios.
  • Research and integrate third-party evaluation frameworks and solutions; adapt them to our regulated, high-stakes environment.
  • Work with product and customer-facing teams to ensure explainability, transparency, and auditability of AI outputs.
  • Provide technical leadership in responsible AI practices, influencing standards across the organization.
  • Contribute to DevOps/MLOps workflows for deployment, monitoring, and scaling of AI evaluation and guardrail systems (experience with Kubernetes is a plus).
  • Document best practices and findings, and share knowledge across teams to foster a culture of responsible AI innovation.

Requirements

  • Bachelor's or Master's in Computer Science, Artificial Intelligence, Data Science, or related field.
  • 3–5+ years of hands-on experience in ML/AI engineering, with at least 2 years working directly on LLM evaluation, QA, or safety.
  • Strong familiarity with evaluation techniques for generative AI: human-in-the-loop evaluation, automated metrics, adversarial testing, red-teaming.
  • Experience with bias detection, fairness approaches, and responsible AI design.
  • Knowledge of LLM observability, monitoring, and guardrail frameworks (e.g., Langfuse, LangSmith).
  • Proficiency with Python and modern AI/ML/LLM/Agentic AI libraries (LangGraph, Strands Agents, Pydantic AI, LangChain, Hugging Face, PyTorch, LlamaIndex).
  • Experience integrating evaluations into DevOps/MLOps pipelines, preferably with Kubernetes, Terraform, ArgoCD, or GitHub Actions.
  • Understanding of cloud AI platforms (AWS, Azure) and deployment best practices.
  • Strong problem-solving skills, with the ability to design practical evaluation systems for real-world, high-stakes scenarios.
  • Excellent communication skills to translate technical risks and evaluation results into insights for both technical and non-technical stakeholders.