
Lead, Site Reliability Engineer
Royal Caribbean Group
full-time
Location Type: Office
Location: United States
About the role
- The Lead Site Reliability Engineer (Lead SRE) reports to the SRE Manager and supports the Royal Caribbean website, using application and user performance data to guide decision-making.
- The Lead SRE uses application and user performance metrics collected from various sources and tools to support tasks such as initial triage of critical production incidents, bug analysis, implementation of site reliability engineering best practices, infrastructure optimization, and collaboration between internal teams and external service providers, among other operational initiatives.
- Provides leadership over a large team of Level 1 and Level 2 support resources.
- Is responsible for the Incident Management, Application Performance, Configuration Management, and Operational Readiness of the products within her/his ownership.
- Partners and collaborates closely with stakeholders from the various teams within IT to ensure that performance, configuration, and monitoring tools meet the needs of her/his products.
- Leads a team prepared to react quickly to production incidents, with the goal of restoring systems and applications to normal service operation as quickly as possible and minimizing the impact on guest/crew experience or business operations.
- Ensures proactive monitoring and management of the performance and availability of the software applications within her/his products.
- Leads the team(s) in implementing and maintaining the technology standards and practices across product definition and product configuration.
Requirements
- 10+ years of experience in Site Reliability Engineering (SRE), DevOps, or a related IT operations role.
- Bachelor’s degree in Computer Science, Information Technology, Computer Engineering, or other relevant advanced degree preferred.
- At least 3 years of experience managing teams and collaborating with external service providers.
- Proficiency in cloud platforms such as AWS, including AWS Elastic Beanstalk.
- Understanding of API design principles: REST, SOAP, Graph.
- Advanced knowledge of monitoring and logging tools (AppDynamics, Datadog, Splunk, New Relic, etc.).
- Strong proficiency in Adobe Experience Manager (AEM), which is crucial for guiding technical initiatives and mentoring teams.
Benefits
- NA
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
Site Reliability Engineering, DevOps, cloud platforms, AWS, API design principles, REST, SOAP, Graph, monitoring tools, logging tools
Soft Skills
leadership, collaboration, incident management, team management, communication