DevOps Engineer

• Maintain and improve the reliability, scalability, and performance of our Java-based application.
• Manage and monitor the applications and infrastructure.
• Use the Grafana stack (Grafana, Loki, Prometheus) to ensure a high level of observability.
• Implement robust monitoring, alerting, and logging solutions.
• Ensure the availability, reliability, and performance of a high-traffic Java-based application in a distributed environment.
• Troubleshoot and resolve complex issues in production and non-production environments.
• Participate in pre- and post-deployment performance testing and monitoring.
• Optimize Java application performance, ensuring efficient resource utilization and scaling.
• Deploy and manage the Grafana stack to provide real-time monitoring, logging, and alerting.
• Create and maintain dashboards, alerts, and logs for comprehensive monitoring of system health and performance.
• Support the operations team’s incident response efforts and participate in post-mortems.
• Document and share lessons learned from incidents.
