Tech Stack
Tools & technologies: AWS, Cloud, Distributed Systems, Google Cloud Platform, Grafana, ITSM, Python, ServiceNow, Splunk, Terraform
About the role
Key responsibilities & impact
- Design and implement end-to-end observability solutions across applications, infrastructure, and cloud environments.
- Develop dashboards, alerts, and telemetry frameworks to provide real-time visibility into system health and performance.
- Build automation solutions to eliminate repetitive operational tasks and improve efficiency.
- Enable runbook automation, self-healing capabilities, and automated incident triage workflows.
- Define and implement SLIs, SLOs, and alerting strategies to improve service reliability.
- Drive improvements in MTTD and MTTR through actionable alerts and telemetry-driven insights.
- Implement proactive monitoring, anomaly detection, and predictive alerting to identify issues before customer impact.
- Leverage AIOps capabilities for alert correlation and intelligent incident response.
- Integrate observability platforms with CI/CD pipelines, cloud services, and ITSM tools such as ServiceNow.
- Collaborate with engineering, product, and operations teams to establish observability standards and operational readiness practices.
Requirements
What you’ll need
- 3+ years of experience in Observability Engineering, Site Reliability Engineering, or related domains.
- Hands-on experience with observability platforms such as Splunk, Dynatrace, Grafana, and OpenTelemetry.
- Strong knowledge of AWS and GCP, with familiarity with cloud-native architectures.
- Proficiency in Python for automation and operational tooling.
- Experience implementing metrics, logs, events, and distributed tracing (MELT) across distributed systems.
- Hands-on experience with Terraform and Infrastructure as Code practices.
- Strong understanding of SLIs, SLOs, alerting strategies, and incident response frameworks.
- Excellent troubleshooting, communication, and collaboration skills.
- Bachelor's degree in Computer Science, Information Technology, or a related field (or equivalent experience).
Benefits
Comp & perks
- Culture of Relentless Performance: join an unstoppable technology development team with a 99% project success rate and more than 30% year-over-year revenue growth.
- Competitive Pay and Benefits: enjoy a comprehensive compensation and benefits package, including health insurance, language courses, and a relocation program.
- Work From Anywhere Culture: make the most of the flexibility that comes with remote work.
- Growth Mindset: reap the benefits of a range of professional development opportunities, including certification programs, mentorship and talent investment programs, internal mobility and internship opportunities.
- Global Impact: collaborate on impactful projects for top global clients and shape the future of industries.
- Welcoming Multicultural Environment: be a part of a dynamic, global team and thrive in an inclusive and supportive work environment with open communication and regular team-building company social events.
- Social Sustainability Values: join our sustainable business practices focused on five pillars, including IT education, community empowerment, fair operating practices, environmental sustainability, and gender equality.
ATS Keywords
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
observability engineering, site reliability engineering, Python, Terraform, Infrastructure as Code, metrics, logs, events, distributed tracing, AIOps
Soft Skills
troubleshooting, communication, collaboration
