Senior Software Engineer

Horizon3.ai

Senior Software Engineer developing an autonomous web application penetration tester for Horizon3.ai. Collaborating with cybersecurity experts to enhance autonomous security operations.

Posted 6/23/2026 • Full-time • Remote • 🇺🇸 United States • Senior • 💰 $169,000 - $208,000 per year

Tech Stack

Tools & technologies

AWS, Python

About the role

Key responsibilities & impact

Build and evolve the agent harness and orchestration that turn an LLM into a reliable autonomous pentester: the loop that reasons over an application, forms attack hypotheses, acts, and verifies results.
Design the tools and tool-shaped feedback the agent uses to probe and exploit, and the structured-output and validation layers that keep it reliable (e.g., hook-enforced mandatory validation, schema-constrained outputs).
Translate the team's offensive expertise into repeatable agent capabilities — partnering directly with our attackers to encode how they think into something the agent can do consistently.
Own and grow our evaluation infrastructure: benchmark suites, a failure-mode taxonomy across the pipeline (discovery → hypothesis → exploitation → verification), and regression detection, so we actually know whether the agent is getting better.
Manage LLM inference in production: model selection, prompt and context engineering, and keeping cost and latency under control (we run on AWS Bedrock with centralized cost tracking).
Hold the line on production safety and zero false positives: every finding the agent reports has to be real and reproducible.

Requirements

What you’ll need

5+ years building production software, with strong Python.
Hands-on experience building LLM-powered applications or agents: tool use / function calling, structured outputs, multi-step orchestration, and the glue that makes it all hold together.
A track record of making LLMs reliable in production: you've wrestled with nondeterminism, designed around model limitations, and shipped something that worked when it mattered.
Real experience with evaluation: you've built or owned the harness that tells you whether a model or agent change is an improvement, not just a vibe.
Strong instincts for prompt and context engineering, and the judgment to keep the model's job small and well-scoped.
Solid software fundamentals — testing, observability, and the discipline to keep a complex agent debuggable.
Ownership mentality: you are comfortable owning a critical, fast-moving subsystem end to end.

Benefits

Comp & perks

Health, vision & dental insurance for you and your family
Flexible vacation policy
Generous parental leave
Growth Opportunities: Be part of a dynamic and growing team with numerous career development opportunities.
Inclusive Team: We value diversity and promote an inclusive culture where everyone can thrive.
Innovative Culture: Work in a collaborative environment that encourages creativity and out-of-the-box thinking.
Hybrid & Remote Work: We embrace a mix of remote and hybrid work models depending on role and location, including our Chicago office, where some roles require regular in-office presence.

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

Python, LLM-powered applications, tool use, function calling, structured outputs, multi-step orchestration, prompt engineering, context engineering, testing, observability

Soft Skills

ownership mentality, strong instincts, judgment, discipline