
Lead Observability Engineer
Vivun
full-time
Location Type: Remote
Location: Remote • California • 🇺🇸 United States
Salary
💰 $185,000 - $205,000 per year
Job Level
Senior
Tech Stack
Grafana, Prometheus
About the role
- Own the end-to-end observability strategy for Ava, defining the standards, tools, and patterns that ensure reliable visibility across infrastructure and agentic components.
- Design and implement correlation models that link agent behavior, LLM interactions, and SaaS telemetry into cohesive, actionable insights.
- Unify observability tooling across teams, ensuring metrics, logs, and traces flow into a central platform (e.g., Observe, Datadog, or equivalent).
- Collaborate with engineering and QA to embed observability best practices into development workflows, CI/CD, and quality gates.
- Establish enablement frameworks—documentation, dashboards, and templates—that make observability self-serve for all engineering teams.
- Partner with teammates to ensure observability aligns with infrastructure reliability, alerting, and incident response patterns.
- Contribute to performance and reliability strategy, helping define how we measure agent quality, responsiveness, and system scalability.
Requirements
- 6+ years of experience in SRE, DevOps, or Observability Engineering roles, with at least 2 years leading or designing observability initiatives.
- Deep knowledge of observability tooling (e.g., OpenTelemetry, Prometheus, Grafana, Datadog, Honeycomb, Observe) and distributed tracing practices.
- Experience with agentic and LLM-based systems, including tools like LangChain, Celery, OpenAI APIs, or similar orchestration frameworks.
- Strong understanding of how to instrument, trace, and correlate AI/LLM workflows with infrastructure-level telemetry.
- Proven ability to define cross-team standards, influence engineering culture, and establish scalable monitoring patterns.
- Strong collaboration and communication skills—you enable, not dictate.
Benefits
- Competitive salary and full health benefits
- Stock options at a well-funded, pre-IPO company on a fast growth track
- Flexible work schedules and work from anywhere at a fully remote company
- Unlimited PTO, with two weeks designated as a “quiet period” each year
- An experienced team who will fight beside you in the trenches to accomplish your goals
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
observability strategy, correlation models, distributed tracing, instrumentation, monitoring patterns, SaaS telemetry, agent behavior analysis, system scalability, performance measurement, reliability engineering
Soft skills
collaboration, communication, influence, leadership, documentation, enablement, cross-team standards, engineering culture, best practices, self-service frameworks