Senior Site Reliability Engineer
Omilia - Conversational Intelligence
Senior Site Reliability Engineer maintaining production clusters and developing observability solutions. Collaborate with teams to ensure platform reliability and performance using automation and monitoring tools.
Tech Stack
Tools & technologies: Ansible, AWS, Cloud, Docker, Go, Grafana, Kubernetes, Linux, MySQL, NoSQL, Postgres, Prometheus, Python, RDBMS, Redis, TCP/IP, Terraform, VoIP
About the role
Key responsibilities & impact
- Ensure platform reliability and availability across production and pre-production environments through proactive monitoring, alerting, and automation.
- Act as first responder for incidents and contribute to problem management and root cause analysis.
- Support the development team's efforts towards reliability, building a solid reliability culture within the development lifecycle.
- Develop troubleshooting documentation for production support resources.
- Collaborate with Engineering teams to develop optimised and productive runbooks, operational documentation and automation of operational tasks.
- Collaborate with development and cloud engineering teams to embed reliability and performance into the software delivery lifecycle.
- Design, implement, and evolve observability solutions (metrics, logs, traces, dashboards) using tools such as Prometheus, Grafana, and ELK.
- Participate in on-call rotations and continuously improve alert quality and response processes.
- Champion a culture of reliability, performance, and continuous improvement across teams.
Requirements
What you’ll need
- Bachelor's Degree or MS in Engineering or equivalent.
- Experience operating at least one container orchestration cluster (Kubernetes, Docker Swarm).
- Experience developing or maintaining software for production services at scale.
- Experience with ELK.
- Experience with AWS.
- Experience with the Grafana/Prometheus stack.
- Strong scripting skills (Bash, Python or Go).
- Excellent communication skills.
- Thinking outside the box and anticipating challenges. It is imperative that we are not simply reactive; we must expect challenges and question the technologies, procedures and thinking already in place. You will be expected to constantly review and challenge at all levels.
- Versatility. We work with agile/lean methods. We'd much rather iterate and learn than assume we know all the answers.
- Being a team player. You don't (always) work in isolation and are excited by the thought of using your team whilst involving product, experience design, engineering, and more in the process.
**Will be considered a plus:**
- Telephony knowledge (SIP, VoIP);
- Experience in Linux administration (RedHat, CentOS, AL);
- Working knowledge of configuration management tools (Terraform, Ansible);
- Experience with TCP/IP and general networking concepts;
- RDBMS knowledge (MySQL, Postgres);
- NoSQL knowledge (Redis).
Benefits
Comp & perks
- Fixed compensation;
- Long-term employment with vacation counted in working days;
- Professional growth and development (courses, training, etc.);
- Being part of successful cutting-edge technology products that are making a global impact in the service industry;
- Proficient and fun-to-work-with colleagues;
- Apple gear.