Tech Stack
Tools & technologies
- Ansible
- AWS
- Cloud
- EC2
- Elasticsearch
- Grafana
- Kubernetes
- Logstash
- Prometheus
- Python
- Terraform
About the role
Key responsibilities & impact
- Proactively enhance system reliability, scalability, and performance through automation, monitoring, and capacity planning.
- Develop and maintain observability systems, including distributed tracing, logging, and metrics platforms.
- Establish and maintain organizational standards for monitoring, leveraging tools like Prometheus, Grafana, and OpenTelemetry.
- Use observability tools to analyze runtime behavior and make data-driven decisions that improve system performance and reliability.
- Partner with development teams to integrate reliability best practices into the software development lifecycle.
- Manage infrastructure at scale on cloud services (AWS experience is an advantage) and platforms like Kubernetes.
- Optimize resource utilization to reduce costs while maintaining service quality.
- Contribute to the development and adoption of AI-driven tools and practices for engineering and observability.
Requirements
What you’ll need
- At least 6 years of experience as an SRE or DevOps engineer.
- Strong experience with observability tools such as OpenTelemetry, Grafana, Prometheus, and the ELK stack (Elasticsearch, Logstash, Kibana).
- In-depth experience with cloud platforms: AWS services including EC2, S3, and RDS, plus CloudFormation/Terraform for infrastructure as code.
- Strong experience working in Kubernetes environments, with a focus on Helm for deployment and configuration management.
- Experience working with AI and LLM tools such as Cursor, Claude Code, or similar.
- Proficiency in scripting and/or development languages such as Bash or Python.
- Thorough understanding of CI/CD pipelines and automation tools.
- Strong experience with automation tools like Terraform and/or Ansible, and understanding of Infrastructure as Code.
- Solid troubleshooting and debugging skills.
- A team player with a strong can-do mentality.
Benefits
Comp & perks
- medical
- dental
- vision
- 401(k)
- generous vacation
- performance-based bonuses
- meals at the office
ATS Keywords
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
- observability
- automation
- monitoring
- capacity planning
- scripting
- CI/CD
- infrastructure as code
- troubleshooting
- debugging
- resource optimization
Soft Skills
- team player
- can-do mentality
