Site Reliability Engineer, Infrastructure

Instructure

Site Reliability & Infrastructure Engineer building SRE practices for custom solutions at Instructure. The role focuses on operational excellence and performance in cloud infrastructure management.

Posted 6/30/2026 • Full-time • Budapest • 🇭🇺 Hungary • Mid-Level, Senior • 💰 HUF 1,000,000 - HUF 1,450,000 per month

Tech Stack

Tools & technologies

AWS, Azure, Cloud, Docker, Grafana, Jenkins, Kubernetes, Prometheus, Ruby, Terraform

About the role

Key responsibilities & impact

Design & Implement Observability: Build and manage a centralized monitoring, logging, and alerting strategy for all custom integrations.
Define Service Level Objectives (SLOs): Work with stakeholders to define SLOs and Service Level Indicators (SLIs) that align with customer expectations and business impact.
Architect for Reliability: Partner with Solution Architects and Developers to establish and enforce best practices for integration architecture, ensuring solutions are built for scalability, resiliency, and performance from day one.
Troubleshoot & Remediate: Serve as the primary escalation point for critical incidents related to custom integrations. Lead troubleshooting efforts and perform hands-on development work to resolve complex, high-stakes issues.
Manage Integration Infrastructure: Own the cloud infrastructure (AWS, Azure, etc.) that hosts our custom solutions, focusing on security, cost optimization, and scalability.
Champion Infrastructure as Code (IaC): Partner with our core engineering team to align on best practices and systems to manage deployments and infrastructure.
Automate Everything: Develop and manage CI/CD pipelines for the safe and efficient deployment of integration code and infrastructure changes.
Bridge Team Gaps: Create and document clear operational handoffs and processes between teams to ensure a seamless flow from development to production support.
Lead Post-Mortems: Foster a blameless post-mortem culture to analyze incidents, identify root causes, and drive actionable improvements to prevent recurrence.
Create a Knowledge Hub: Develop runbooks, architectural diagrams, and best-practice guides to empower all of Professional Services with the knowledge to better support our solutions.

Requirements

What you’ll need

Experience in a Site Reliability Engineering (SRE), DevOps, or Cloud Infrastructure role.
Expertise with AWS.
Hands-on with Infrastructure as Code (Terraform, CloudFormation).
Experience with observability platforms (Datadog, New Relic, Prometheus, Grafana).
Proficient in at least one scripting or programming language, preferably Ruby.
Experience building and managing CI/CD pipelines (e.g., GitLab CI, GitHub Actions, Jenkins).
A systems thinker with a passion for troubleshooting complex problems and improving processes.
Experience working within a client-facing Professional Services or technical consulting organization.
Background in managing the reliability of APIs, middleware, and complex data integrations.
Experience with containerization and orchestration (Docker, Kubernetes).
Bachelor's degree in Computer Science or a related technical field.

Benefits

Comp & perks

Competitive compensation, plus all full-time employees participate in our ownership program - because everyone should have a stake in our success.
Flexible work culture. Our remote, hybrid and in-office collaboration spaces vary by role, team and location.
Generous time off, including local holidays and our annual “Dim the Lights” period in late December, when teams are encouraged to step back and recharge based on departmental needs.
Comprehensive wellness programs and mental health support.
Learning and development resources, including professional development tools and tuition reimbursement, to support your growth.
The technology and tools you need to do your best work.
Motivosity employee recognition program.
A culture rooted in inclusivity, support, and meaningful connection.

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

Site Reliability Engineering, Cloud Infrastructure Management, Scripting (Ruby), Troubleshooting Complex Problems, API Reliability Management

Soft Skills

Systems Thinking, Process Improvement, Collaboration