RBC

Lead Site Reliability Engineer

full-time

Location Type: Office

Location: Toronto • 🇨🇦 Canada


Job Level

Senior

Tech Stack

AWS, Azure, Cloud, DNS, Grafana, Kubernetes, NGINX, OpenShift, Splunk

About the role

  • Serve as the primary operational support for the Apigee API Gateway platform, ensuring its reliability, availability, and performance
  • Assist application teams in troubleshooting and resolving Apigee-related issues, including API design, security, and performance optimization
  • Manage API lifecycle, including OpenAPI/Swagger specifications, rate limiting, throttling, quota management, and OAuth2.0/JWT authentication
  • Build and maintain tools to automate operational processes, including monitoring, logging, and alerting
  • Develop and implement SRE solutions to improve system reliability, scalability, and performance
  • Continuously evaluate and optimize system performance using observability tools like Dynatrace, Splunk, Elastic, and Grafana
  • Partner with development teams to improve services through rigorous testing, release procedures, and capacity planning
  • Provide technical leadership by conducting code reviews, publishing technical designs, and mentoring team members
  • Drive SRE adoption and transformation by organizing engineering mindset meetups and sharing best practices
  • Monitor system health holistically and proactively identify areas for improvement
  • Lead incident management and root cause analysis for production issues, ensuring lessons learned are applied to prevent recurrence
  • Maintain compliance and technology currency, including server patching, certificate renewals, and segregation of duties

Requirements

  • Production support experience with infrastructure technologies, including API Gateway platforms like Apigee, Kong, Nginx, or AWS/Azure API Management
  • Strong expertise in API security (OAuth2.0, JWT), API design (OpenAPI/Swagger), and developer portal management
  • Experience as an SRE supporting cloud and legacy applications
  • Hands-on experience with cloud technologies such as OpenShift, Kubernetes, and Azure Kubernetes Service (AKS)
  • Proficiency in observability tools (Dynatrace, Splunk, Elastic, Grafana) and end-to-end application monitoring
  • Solid understanding of networking concepts, including certificates, load balancers, and DNS
  • A proactive approach to identifying and solving problems, with a strong focus on automation and innovation

Benefits

  • A comprehensive Total Rewards Program including bonuses and flexible benefits
  • Competitive compensation
  • Commissions and stock where applicable
  • Leaders who support your development through coaching and managing opportunities
  • Ability to make a difference and lasting impact
  • Work in a dynamic, collaborative, progressive, and high-performing team
  • Flexible work/life balance options
  • Opportunities to do challenging work
  • Opportunities to take on progressively greater accountabilities
  • Opportunities to build close relationships with clients

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard skills
API design, API security, OAuth2.0, JWT, OpenAPI, SRE solutions, automation, networking concepts, cloud technologies, observability
Soft skills
problem solving, technical leadership, mentoring, collaboration, proactive approach