Tech Stack
Tools & technologies: Ansible, AWS, Azure, Cloud, Grafana, JavaScript, Linux, Node.js, Prometheus, Splunk, Terraform, TypeScript
About the role
Key responsibilities & impact
- Engage with teams to improve service delivery and reliability across the entire service lifecycle
- Measure and monitor all production systems with an eye towards availability, latency and overall system health
- Find the causes of errors and instability in our production cloud services and drive teams toward operational excellence
- Work with product and platform teams to improve and evolve systems by advocating for changes that improve reliability, resilience, and observability
- Help identify and reduce toil through creative innovation and automation
- This position requires standby, on-call, or off-hours duties
Requirements
What you’ll need
- Proven experience designing, implementing, and operating observability systems for complex cloud-based platforms
- Experience with configuration management and Infrastructure as Code tools such as Terraform (preferred) or Ansible
- Knowledge of cloud platforms (AWS and Azure preferred)
- Experience with APM and observability tools such as New Relic, Splunk, CloudWatch, Prometheus, Grafana/Kibana, and Sentry
- Extensive experience with enterprise-scale continuous delivery environments
- Development experience with JavaScript, Node.js, and TypeScript in a Linux or macOS environment
- Experience with sustainable incident response in a blameless environment
- Background in Linux Systems Engineering
- Experience with incident response tools such as PagerDuty, FireHydrant, and Blameless
- Comfortable with a high level of autonomy and working with a distributed team
- Knowledge of cloud and application security best practices
- Strong knowledge of cloud design patterns for scale, data management, resiliency, etc.
- A love for high quality and a knack for testing
- Informed opinions about business metrics and SLOs
Benefits
Comp & perks
- Diversity drives innovation and better decisions
- Remote-first culture
- Welcoming and valuing differences
