Site Reliability Engineer I

GM Financial

Posted 5/8/2026 • Full-time • Arlington • Texas • 🇺🇸 United States • Mid-Level, Senior

Tech Stack

Tools & technologies

Ansible, AWS, Azure, Cloud, Distributed Systems, Google Cloud Platform, Jenkins, Kubernetes, Linux, Open Source, Python, React, Terraform

About the role

Key responsibilities & impact

Under the general direction of leadership, the Site Reliability Engineer will assist with the day-to-day tasks critical to the team's success.
The position will be responsible for supporting cloud infrastructure architecture and components, including hybrid cloud and Public Cloud platforms.
This will include prototyping, initiating, and operationalizing Public Cloud solutions.
The role will also support overall Cloud Transformation initiatives designed to meet key goals in creating a service-driven culture through the performance and delivery of SaaS, PaaS, and IaaS solutions from public cloud vendors such as Azure and AWS.
The Site Reliability Engineer will be responsible for the configuration, efficiency, and performance of the deployed public cloud solutions.
The scope of the role includes not only cloud engineering but also advanced automation capabilities, with some overlap into software development disciplines.
Build and demonstrate a foundational understanding of SRE concepts, including observability, monitoring, incident response, and the core systems owned by the team.
Execute standard operational tasks independently using established processes, runbooks, tooling, and escalation paths; raise issues when scenarios become complex or unfamiliar.
Perform initial troubleshooting for clear production or environment issues with limited guidance; contribute findings and next steps to the broader resolution effort.
Demonstrate ownership of learning by seeking mentorship, asking questions, and contributing back to shared team knowledge.
Help teams apply SRE operational readiness practices using the SRE Checklist—with emphasis on detection/observability, performance, resiliency, automation, and operational readiness before go‑live.
Assist with defining and implementing basic monitoring coverage aligned to Golden Signals (e.g., latency, traffic, errors, saturation/capacity) and validate telemetry appears correctly in monitoring platforms.
Follow established standards for cloud-based resources in the Azure environment for automation and troubleshooting.
Support logging and exception-handling hygiene by aligning to known standards (e.g., ensuring correlation IDs and key dimensions are captured where required).
Assist with and provide systems administration setup and configuration as needed for supported services and environments.
Contribute to toil reduction by helping implement/maintain repeatable operational mechanisms (e.g., health checks/probes and monitoring configuration) as defined in standards and patterns.

Requirements

What you’ll need

Thorough command of both the Windows and Linux operating systems, with a strong background in troubleshooting either
Knowledge of native Kubernetes or related enterprise container platforms such as OpenShift
Good understanding of the mechanics of the container platform and the deployment pipeline that feeds it
Knowledge of Public Cloud Governance frameworks, architectures, configurations, services, and solutions, specifically within Microsoft Azure, but may also include AWS and GCP
Knowledge of core Azure services such as Azure Kubernetes Service, Cosmos DB, Azure Functions, Azure Storage entities and concepts, Azure CLI, and PowerShell cmdlets
Knowledge of Azure organizational entities such as Departments, Accounts, Subscriptions, Resource Groups, and Management Groups
Strong automation skills in Linux and Windows, including Bash, Python, and PowerShell
Extensive experience with Terraform plans and associated development
Knowledge of ARM templates and related automation methods within Azure
Experience with modern source control repositories (e.g., Git) and DevOps toolsets (Jenkins, Ansible, etc.), and familiarity with Agile/Scrum methodologies
Experience with cloud-native and microservice architectures and an understanding of design principles for scalability, performance, and reliability
Experience with distributed systems, asynchronous messaging, and networking protocols
Experience with open source applications, frameworks, and libraries
Fast learner; proactive thinker
Ability to innovate, automate, and continually improve processes
Excellent verbal and written communication skills
Possess critical thinking and analytical skills
Capacity to take initiative; desire to become a self-starter
Willingness to find problems and come up with creative solutions
Ability to balance priorities in order to meet multiple requirements and deadlines while ensuring priority objectives receive proper emphasis
Ability to accept change and adapt to shifting priorities
Effective time management and prioritization skills
Able to think and react positively and professionally when faced with obstacles
A strong willingness to learn and accept instruction
Advanced job-related certifications preferred but not required
Exposure to Golden Signals–based monitoring (latency, traffic, errors, saturation) and the discipline of validating telemetry and alert behavior preferred
Exposure to reliability engineering concepts such as SLOs/SLIs and how reliability goals connect to real production operations preferred
Familiarity with cloud and runtime fundamentals (e.g., Windows/Linux basics and cloud platform exposure such as Azure) preferred
Familiarity with modern engineering ways of working that support reliability outcomes (e.g., documentation habits, continuous improvement mindset in a DevOps culture) preferred
Experience with the Azure cloud environment preferred
.NET coding experience preferred
3-5 years of experience in cloud computing, DevOps, and all related automation disciplines preferred
High School Diploma or equivalent required
Bachelor’s Degree in related field or equivalent work experience within the IT field required.

Benefits

Comp & perks

Generous benefits package available on day one, including:
401(k) matching
bonding leave for new parents (12 weeks, 100% paid)
tuition assistance
training
GM employee auto discount
community service pay
nine company holidays

ATS Keywords

Applicant Tracking System Keywords

Hard Skills & Tools

Windows Operating System, Linux Operating System, Kubernetes, Azure, AWS, Terraform, Python, Bash, PowerShell, DevOps

Soft Skills

critical thinking, analytical skills, time management, communication skills, initiative, adaptability, proactive thinking, problem-solving, ownership of learning, ability to balance priorities

Certifications

Bachelor’s Degree in related field, Advanced job-related certifications