Tech Stack
Tools & technologies: AWS, Azure, Cloud, Distributed Systems, Go, Google Cloud Platform, Grafana, Kubernetes, Prometheus, Python
About the role
Key responsibilities & impact
- Implement and maintain robust automation for deploying and operating Kong's Managed Gateways across various cloud environments.
- Monitor system health, performance, and uptime, striving for 99.99% availability for our core infrastructure.
- Resolve complex production incidents efficiently, participating actively in on-call rotations to maintain service continuity.
- Build resilient tools and systems that enhance the overall reliability and operational efficiency of our platform.
- Contribute proactively to the prevention of technical debt, ensuring sustainable and scalable operations as Kong grows.
- Collaborate closely with engineering teams to design, review, and implement resilient and highly scalable services.
Requirements
What you’ll need
- 2+ years of experience applying Site Reliability Engineering (SRE) principles and practices in a production environment.
- Proficiency in Golang or Python (at least one) for automation, tooling, and infrastructure as code.
- Hands-on experience with Kubernetes and major cloud platforms such as AWS, GCP, or Azure.
- Familiarity with monitoring, logging, and alerting tools (e.g., Prometheus, Grafana, Datadog).
- Solid understanding of networking concepts, distributed systems, and API gateways.
Benefits
Comp & perks
- Flexible work arrangements
- Professional development opportunities
ATS Keywords
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
Site Reliability Engineering, Golang, Python, Kubernetes, AWS, GCP, Azure, monitoring tools, logging tools, alerting tools
Soft Skills
collaboration, problem-solving, proactive prevention, incident resolution, communication
