Staff Site Reliability Engineer
ScalePad
Staff Site Reliability Engineer managing production infrastructure across AWS and Azure for ScalePad. Fostering engineering culture and leading initiatives in reliability and developer experience.
Tech Stack
Tools & technologies
AWS, Azure, Cloud, Distributed Systems, Kubernetes, Terraform
About the role
Key responsibilities & impact
- Own production infrastructure across AWS and Azure, including networking, IAM, and cost.
- Build and operate Terraform modules and state at scale, keeping our infrastructure as code clean and reviewable.
- Run Kubernetes in production: upgrades, scaling, troubleshooting, and platform improvements.
- Operate and improve CI/CD pipelines that the entire engineering org depends on.
- Operationalize SLO/SLI frameworks and observability practices alongside the SRE team.
- Own incident response practice, on-call tooling, and incident review follow-through.
- Reduce operational toil through automation across secret rotation, access management, and environment provisioning.
- Execute on capacity planning, disaster recovery, and resilience work across critical systems.
- Build and maintain internal developer tooling that removes friction across engineering.
- Lead rollouts of AI-native tooling for code review, testing, and engineering productivity, e.g., CodeRabbit, Copilot-class assistants, and internal AI workflows.
- Own migrations and consolidation of internal platforms such as Jira, Confluence, ticketing, and documentation systems.
- Partner with engineering and product leadership to identify and remove the biggest DX bottlenecks, and align infrastructure and reliability investments with business goals.
- Mentor engineers and technical leads, fostering growth and knowledge-sharing within the organization.
- Lead post-mortems and continuous improvement initiatives to strengthen reliability practices.
- Evaluate and introduce new technologies, tools, and approaches to improve scalability and efficiency.
- Drive standardization and modernization efforts across infrastructure and operational practices.
- Lead proof-of-concept and experimentation initiatives to validate new reliability solutions.
Requirements
What you’ll need
- 8+ years of experience in software engineering, infrastructure, or related technical disciplines, with at least 5 years focused on Site Reliability Engineering (SRE), DevOps, Platform Engineering, or similar roles.
- Strong expertise in cloud infrastructure, distributed systems, networking, and observability practices.
- Experience designing and operating highly available, scalable production systems.
- Deep understanding of scripting, automation, infrastructure as code, CI/CD, and operational best practices.
- Experience implementing SLO/SLI frameworks and reliability engineering methodologies.
- Incident management, troubleshooting, and on-call experience in complex production environments.
- Proven ability to lead large-scale technical initiatives across multiple teams.
- Track record of cross-team technical influence without formal authority, plus excellent communication and collaboration skills with both technical and non-technical stakeholders.
- Passion for mentoring engineers and improving engineering culture.
- Demonstrated ability to thoughtfully integrate AI-assisted tooling into engineering and operational workflows to improve efficiency, reliability, and developer experience.
Benefits
Comp & perks
- Share in our success through our Employee Stock Ownership Plan (ESOP) and RRSP matching.
- Parental leave programs are in place to support you and your family when it matters most.
- Join opt-in mentorship programs and learn directly from founders and senior leaders who’ve scaled multiple SaaS ventures and spent decades in the MSP industry.
- Access an annual professional development budget to level up your skills, your career, and your impact.
- Work with brand new, top-of-the-line hardware and equipment so you can do your best work, whether you’re at home or in one of our hubs.
- Receive a monthly stipend to help you create an effective hybrid or remote work environment.
- Take care of yourself with 100% employer-paid benefits.
ATS Keywords
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
AWS, Azure, Terraform, Kubernetes, CI/CD, SLO, SLI, scripting, automation, infrastructure as code
Soft Skills
leadership, mentoring, communication, collaboration, troubleshooting, incident management, cross-team influence, continuous improvement, knowledge-sharing, passion for engineering culture