
Lead Site Reliability Engineer
RBC
full-time
Location Type: Office
Location: Toronto • 🇨🇦 Canada
Job Level
Senior
Tech Stack
AWS, Azure, Cloud, DNS, Grafana, Kubernetes, NGINX, OpenShift, Splunk
About the role
- Serve as the primary operational support for the Apigee API Gateway platform, ensuring its reliability, availability, and performance
- Assist application teams in troubleshooting and resolving Apigee-related issues, including API design, security, and performance optimization
- Manage API lifecycle, including OpenAPI/Swagger specifications, rate limiting, throttling, quota management, and OAuth2.0/JWT authentication
- Build and maintain tools to automate operational processes, including monitoring, logging, and alerting
- Develop and implement SRE solutions to improve system reliability, scalability, and performance
- Continuously evaluate and optimize system performance using observability tools like Dynatrace, Splunk, Elastic, and Grafana
- Partner with development teams to improve services through rigorous testing, release procedures, and capacity planning
- Provide technical leadership by conducting code reviews, publishing technical designs, and mentoring team members
- Drive SRE adoption and transformation by organizing engineering mindset meetups and sharing best practices
- Monitor system health holistically and proactively identify areas for improvement
- Lead incident management and root cause analysis for production issues, ensuring lessons learned are applied to prevent recurrence
- Maintain compliance and technology currency, including server patching, certificate renewals, and segregation of duties
Requirements
- Production support experience with infrastructure technologies, including API Gateway platforms like Apigee, Kong, Nginx, or AWS/Azure API Management
- Strong expertise in API security (OAuth2.0, JWT), API design (OpenAPI/Swagger), and developer portal management
- Experience as an SRE supporting cloud and legacy applications
- Hands-on experience with cloud technologies such as OpenShift, Kubernetes, and Azure Kubernetes Service (AKS)
- Proficiency in observability tools (Dynatrace, Splunk, Elastic, Grafana) and end-to-end application monitoring
- Solid understanding of networking concepts, including certificates, load balancers, and DNS
- A proactive approach to identifying and solving problems, with a strong focus on automation and innovation
Benefits
- A comprehensive Total Rewards Program including bonuses and flexible benefits
- Competitive compensation
- Commissions and stock where applicable
- Leaders who support your development through coaching and managing opportunities
- Ability to make a difference and lasting impact
- Work in a dynamic, collaborative, progressive, and high-performing team
- Flexible work/life balance options
- Opportunities to do challenging work
- Opportunities to take on progressively greater accountabilities
- Opportunities to build close relationships with clients
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
API design, API security, OAuth2.0, JWT, OpenAPI, SRE solutions, automation, networking concepts, cloud technologies, observability
Soft skills
problem solving, technical leadership, mentoring, collaboration, proactive approach