SRE/DevOps

nDeavour Consulting

Site Reliability Engineer ensuring health, performance, and delivery of infrastructure systems at Mobile Wave Solutions. Working collaboratively with engineers to automate processes and improve operational reliability.

Posted 6/22/2026 • Full-time • Remote • 🇺🇸 United States • Mid-Level, Senior

Tech Stack

Tools & technologies

AWS, Azure, Cloud, Docker, Go, Google Cloud Platform, Grafana, Kubernetes, Prometheus, Python, Terraform

About the role

Key responsibilities & impact

Own the infrastructure. Build, maintain, and scale the systems our product runs on, with reliability and cost-efficiency as first-class concerns.
Measure the non-functional. Define and track SLIs, SLOs, and error budgets for availability, latency, throughput, and scalability. Make system behavior visible and quantifiable to the whole team.
Automate relentlessly. Identify and eliminate toil. Replace manual operational work with code, infrastructure-as-code, and self-healing systems.
Build a seamless delivery process. Design, maintain, and improve CI/CD pipelines so engineers can ship safely and frequently with fast feedback.
Collaborate across functions. Partner closely with software engineers and QA to embed reliability and quality early - through testing strategy, deployment practices, and shared ownership of production.
Apply AI to operations. Use AI/AIOps to automate remediation, surface anomalies, reduce alert noise, and improve the signal quality of monitoring, reporting, and on-call.
Lead incident response. Drive blameless postmortems and turn incidents into systemic improvements.
Set the standard. Define reliability and operational best practices, and mentor engineers across the team to raise the bar.

Requirements

What you’ll need

5+ years of experience in an SRE, DevOps, or Platform Engineering role, with a track record of owning systems end to end
Strong grasp of reliability engineering fundamentals: SLIs/SLOs, error budgets, and reducing toil
Hands-on experience designing, modernising, and operating CI/CD pipelines
Solid infrastructure-as-code skills (Terraform / Pulumi / CloudFormation)
Experience with cloud platforms (GCP / AWS / Azure) and container orchestration (Kubernetes / Docker)
Proficiency in at least one programming/scripting language for automation (Python / Go / Bash)
Strong observability experience: metrics, logging, tracing, and alerting (New Relic / Prometheus / Grafana / Datadog / OpenTelemetry)
Applied experience using AI/AIOps to automate, measure, report, and alert - anomaly detection, intelligent alerting, noise reduction, or automated remediation
A collaborative mindset and comfort working alongside engineers and QA toward shared reliability goals
Demonstrated technical leadership: mentoring engineers, driving cross-team initiatives, and influencing engineering practices

Benefits

Comp & perks

Remote Office – Option to work remotely or hybrid
Parking Space – Free parking available
Fun Office Space – Game zone and relaxation area
Health Insurance – Private health insurance, including dental care
Holidays – 5 extra days after your 1st and 5th year with us
Personal Development – Company-sponsored training and development
Employee Referral Programme – Competitive bonus for successful referrals
Social Events – Celebrating success together
Family Insurance – Add insurance coverage for a family member
Multisport Card – Fully covered sports pass

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

reliability engineering, SLIs, SLOs, error budgets, infrastructure-as-code, CI/CD pipelines, cloud platforms, programming language, observability, AI/AIOps

Soft Skills

collaboration, technical leadership, mentoring, cross-team initiatives, influencing engineering practices