
AI Production Engineer
Distyl AI
full-time
Location Type: Hybrid
Location: San Francisco • California • New York • United States
Salary
💰 $130,000 - $250,000 per year
About the role
- Own the performance and reliability characteristics of AI systems deployed in customer environments
- Design, build, and operate low-latency AI services—including real-time voice and interaction pipelines—as well as large-scale batch processing workflows that execute complex AI workloads reliably
- Serve as the escalation point for performance and reliability risk, with veto power over launches that violate production constraints
- Stay deeply involved in system design, implementation, and operation, investigating performance bottlenecks, failure modes, and scaling limits across AI pipelines, APIs, orchestration layers, and infrastructure
- Design and evolve observability systems—metrics, logs, tracing, alerts—that make AI behavior understandable and actionable in production
- Work directly with Forward Deployed AI Engineers, Product Engineers, and Architects to ensure that production constraints meaningfully shape system design
- Step in on high-risk or high-impact issues, debug live systems, and harden AI services so they can operate continuously under real-world load
- Help turn one-off production solutions into reusable patterns and platform capabilities, raising the overall production bar for Distyl’s AI systems over time
Requirements
- 3+ years of software engineering experience
- Deep Production Engineering Experience: You have built and operated high-scale systems (low-latency APIs, streaming pipelines, real-time services, or large batch processing systems) and can reason deeply about performance, throughput, and reliability. Experience with real-time voice systems is a strong plus
- Strong Systems and Backend Fundamentals: Write high-quality production code and understand distributed systems concepts such as concurrency, fault tolerance, backpressure, and graceful degradation. You are comfortable optimizing systems under tight latency and throughput constraints
- Operational Excellence Mindset: Treat observability, instrumentation, and incident response as first-class concerns. Logging, metrics, tracing, alerting, and on-call readiness are integral to how you design and operate systems
- Ownership of AI Systems in Production: Take responsibility for AI systems end-to-end—design, deployment, monitoring, and ongoing health. When something breaks, you care about understanding why, fixing it properly, and preventing recurrence
- AI-Native Working Style: Actively use AI tools to debug systems, analyze performance data, explore designs, and automate operational workflows
Benefits
- 100% covered medical, dental, and vision for employees and dependents
- 401(k) with additional perks (e.g., commuter benefits, in‑office lunch)
- Access to state‑of‑the‑art models, generous usage of modern AI tools, and real‑world business problems
- Ownership of high‑impact projects across top enterprises
- A mission‑driven, fast‑moving culture that prizes curiosity, pragmatism, and excellence
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
AI systems, low-latency APIs, streaming pipelines, real-time services, large batch processing systems, distributed systems, performance optimization, observability, instrumentation, incident response
Soft Skills
ownership, problem-solving, debugging, collaboration, attention to detail, operational excellence, responsibility, analytical thinking, adaptability, communication