
Senior Observability Engineer
Tealium
full-time
Location Type: Remote
Location: United States
Salary
💰 $140,000 - $160,000 per year
About the role
- Participate in a rotating on-call schedule for approximately 20% of working time.
- Lead end-to-end observability design for all features, both in production and in internal use.
- Instrument features in Tealium products.
- Implement monitoring and cost tracking.
- Build OpenTelemetry pipelines to track LLM request/response metrics, prompt engineering observability, token usage, hallucination detection, and failover.
Requirements
- 4+ years in Site Reliability Engineering and Observability Engineering with a focus on production-grade 24x7x365 systems.
- Deep experience instrumenting services and applications for observability.
- Familiarity with prompt engineering, embeddings, vector DBs (Neptune), and RAG-style architectures.
- Hands-on experience with OpenTelemetry, Datadog, Sumo Logic, Prometheus, or similar.
- Experience integrating observability into AI platforms such as Bedrock, Neptune, LangChain, LlamaIndex, Hugging Face, and SageMaker.
- Proficiency with Java, Python, Go, or similar languages.
- Experience with multiple AWS services.
- Strong background in Infrastructure-as-Code (Terraform, ArgoCD) and CI/CD tooling (Jenkins, GitHub Actions).
- Understanding of Kubernetes and container orchestration.
- Excellent collaboration skills and comfort leading across SRE, Data Engineering, and Product/ML teams.
- Experience mentoring or leading technical initiatives.
- Communication skills for explaining complex concepts to non-technical stakeholders.
Benefits
- Employees are eligible to receive an annual bonus and stock options.
- Employees and their families are eligible for medical, dental, vision, life, and disability insurance.
- Employees have the option to enroll in our 401(k) plan and are eligible to receive company matching contributions.
- Employees are eligible for flexible paid time-off and extended paid parental leave.
- We offer 11 paid holidays annually.
- We offer 15 hours of paid work time for volunteer activities and programs.
- Sick leave accrues as follows:
  - Exempt CA employees (not including San Francisco), including NY: accrue 40 hours each year. Unused sick leave carries over into the next year. Employees cannot exceed 80 hours in a given year.
  - Exempt non-CA employees (not including NY), including SF: accrue 1 hour for every 30 hours worked. Cannot exceed 180 hours in the calendar year.
  - Non-exempt employees: accrue 1 hour for every 30 hours worked. Unused sick leave carries over to the next year. Not to exceed 108 hours in a calendar year.
Applicant Tracking System Keywords
Hard Skills & Tools
Site Reliability Engineering, Observability Engineering, OpenTelemetry, Java, Python, Go, Infrastructure-as-Code, Terraform, CI/CD, Kubernetes
Soft Skills
collaboration, leadership, mentoring, communication