Salary
💰 $160,000 - $210,000 per year
Tech Stack
Distributed Systems, Grafana, JavaScript, Jenkins, Node.js, Prometheus, React, SQL, TypeScript
About the role
- Design and build sophisticated alerting systems that enable proactive monitoring and incident detection across distributed systems
- Develop query-based alert rules and expressions using PromQL, SQL, and other query languages to surface meaningful insights
- Create intelligent alert routing, deduplication, and correlation mechanisms to reduce noise and improve signal quality
- Build scalable backend services for alert evaluation, notification delivery, and alert management workflows
- Optimize time-series data storage and query performance for high-volume metrics and telemetry data
- Develop intuitive interfaces for alert configuration, visualization, and management using React and modern frontend technologies
- Collaborate with cross-functional teams to understand monitoring requirements and deliver comprehensive alerting solutions
- Mentor and guide engineers on best practices for observability and alerting architecture
Requirements
- Strong proficiency in TypeScript/Node.js with a proven track record of building production-grade services
- Experience with query languages for metrics and monitoring (PromQL, SQL, or similar) and the ability to write complex queries for data analysis
- Hands-on experience building or maintaining alerting systems, including rule evaluation engines and notification pipelines
- Experience with time-series databases and columnar storage systems (ClickHouse experience is a plus)
- Frontend development skills with React and modern JavaScript frameworks for building data visualization and management interfaces
- Strong understanding of distributed systems, data structures, and algorithms
- Experience with observability concepts including metrics, logs, traces, and their correlation
- Ability to work independently with minimal supervision and a track record of learning quickly
- Dedication to writing clean, maintainable, and well-tested code
- Experience with the Prometheus ecosystem, including Alertmanager
- Background in building rule engines or expression evaluation systems
- Experience with notification systems and integrations (PagerDuty, Slack, webhooks, etc.)
- Familiarity with observability tools like Grafana, ELK stack, or similar solutions
- Experience with CI/CD tools such as Bitbucket Pipelines, Jenkins, or CircleCI
- Understanding of alert fatigue mitigation strategies and intelligent alerting patterns
- Experience with high cardinality data and performance optimization
- Willingness to speak your mind and share ideas
- Appreciation for humor and a love for goats
- Comfort working remotely
Benefits
- Health, dental, vision, short-term disability, and life insurance
- Paid holidays and paid time off
- Fertility treatment benefit
- 401(k)
- Equity
- Eligibility for a discretionary company-wide bonus
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
TypeScript, Node.js, PromQL, SQL, React, time-series databases, ClickHouse, observability concepts, rule engines, notification systems
Soft skills
independent work, quick learning, clean code, mentoring, communication, collaboration, problem-solving, dedication, adaptability, creativity