Design and iterate on prompts (system, tool/function-calling, and task prompts) to improve voice AI agent success rates, reliability, and tone.
Build co-pilots for customers to author their own prompts: meta-prompted assistants that suggest structures, lint for risks, autocomplete tool schemas, critique drafts, and generate eval cases.
Work directly with customer feedback and conversation logs to identify failure modes; translate them into prompt changes, guardrails, and data improvements.
Build eval datasets (success labels, rubrics, edge cases, regressions) and run offline/online evaluations (A/B tests, canaries) to quantify impact.
Create Python utilities/services for prompt versioning, config-as-code, rollout/rollback, and guardrails (policies, refusals, redaction).
Partner with PM/Success to define success metrics (task completion, first-pass accuracy, cost, latency) and instrument dashboards/alerts.
Own LLM integration details: function/tool schemas, output parsing/validation (Pydantic), retrieval-aware prompting, and fallback strategies (a minimal sketch follows this list).
Ensure privacy & compliance (PII handling, anonymization, regional data boundaries) in datasets and logs.
Share learnings via concise docs, playbooks, and internal demos.
Run a tight feedback loop with customers, turn real conversations into better prompts and eval datasets, and ship changes that measurably improve agent outcomes.
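To make the validation and fallback work above concrete, here is a minimal sketch assuming Pydantic v2; the schema, field names, and fallback policy are illustrative assumptions, not an existing implementation.

```python
# Illustrative sketch only: the schema, field names, and fallback policy are assumptions.
from pydantic import BaseModel, Field, ValidationError


class TransferCall(BaseModel):
    """Hypothetical tool-call payload a voice agent emits before transferring a caller."""
    department: str = Field(min_length=1)
    reason: str = Field(min_length=1)
    caller_verified: bool


def parse_tool_args(raw_json: str) -> TransferCall | None:
    """Validate the model's tool arguments; return None so the caller can fall back
    (e.g. re-prompt the model or escalate to a human) instead of acting on bad output."""
    try:
        return TransferCall.model_validate_json(raw_json)
    except ValidationError:
        return None
```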
Requirements
Python: 3+ years writing clean, tested, production code (typing, pytest, profiling); experience building small services/APIs (FastAPI preferred).
Prompt Engineering: Hands-on experience designing system/tool prompts, meta-prompting, rubric graders, and iterative prompt tuning based on real user data.
LLM Integration: Comfortable with major APIs (OpenAI/Anthropic/Google/Mistral), function/tool calling, streaming, and robust output handling (see the tool-calling sketch after this list).
Evaluation Mindset: Ability to define measurable success, create labeled datasets, and run methodical experiments/A/B tests.
Product Sense: Comfortable talking with customers and turning qualitative feedback into shipped improvements.
Data Hygiene: Practical experience cleaning, labeling, and balancing datasets; awareness of privacy/PII constraints.
Nice-to-haves: Experience building prompt-authoring UIs/SDKs or internal tooling for prompt versioning and governance.
Nice-to-haves: Agentic frameworks & tooling: DSPy, MCP, LangGraph, LlamaIndex, Rasa; experience with agent/tool schemas and orchestration.
Nice-to-haves: Observability & eval tooling: Langfuse, LangSmith, Braintrust; building eval harnesses and experiment dashboards.
Nice-to-haves: RAG & vector stores: Qdrant/Weaviate/Pinecone and retrieval-aware prompting.
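For context on the tool-calling experience referenced above, here is a minimal sketch assuming the OpenAI Python SDK (v1); the tool name, schema, model, and prompts are illustrative assumptions, not a spec for this role.

```python
# Illustrative sketch: the tool name, schema, model, and prompts are assumptions.
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

tools = [{
    "type": "function",
    "function": {
        "name": "lookup_order",  # hypothetical tool exposed to the voice agent
        "description": "Fetch the status of a customer's order by order ID.",
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model choice
    messages=[
        {"role": "system", "content": "You are a concise voice support agent."},
        {"role": "user", "content": "Where is my order 12345?"},
    ],
    tools=tools,
)

# If the model chose to call the tool, parse its arguments robustly before acting.
tool_calls = response.choices[0].message.tool_calls
if tool_calls:
    args = json.loads(tool_calls[0].function.arguments)
    print(args.get("order_id"))
```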