Senior Consultant – SRE Architect

qode.world

Senior Consultant / SRE Architect leading enterprise-wide observability strategies and reliability frameworks for business-critical transactions at Incedo.

Posted 4/15/2026 • Full-time • Austin, Texas (Hybrid) • 🇺🇸 United States • Senior

Tech Stack

Tools & technologies

AWS, Azure, Cloud, Distributed Systems, Go, Google Cloud Platform, Grafana, Java, Microservices, Python, Splunk

About the role

Key responsibilities & impact

Define and lead the enterprise observability strategy for end-to-end transaction traceability across distributed systems
Architect scalable solutions leveraging tools such as Dynatrace, OpenTelemetry, ELK, Grafana, Datadog, Splunk, Jaeger
Establish standardized frameworks for logging, metrics, tracing, and telemetry collection
Design and implement dependency mapping and service topology visualization across complex ecosystems
Provide architectural guidance for monitoring latency, throughput, and error rates across critical transaction paths
Lead root cause analysis using distributed tracing and telemetry data to resolve systemic performance issues
Partner with application and database teams to optimize system performance and scalability
Drive adoption of performance engineering best practices across teams
Define and implement resiliency strategies for business-critical transaction flows
Architect fault-tolerant systems, including failover, redundancy, and self-healing mechanisms
Lead and design chaos engineering initiatives to validate system resilience
Establish and govern Service Level Objectives (SLOs) and Service Level Indicators (SLIs) aligned to business outcomes
Act as a trusted advisor to engineering teams, architects, and leadership on observability and SRE best practices
Define and enforce standards, policies, and governance models for monitoring and tracing
Lead cross-functional initiatives to drive adoption of observability frameworks
Mentor engineers and SRE teams, fostering a culture of continuous improvement and operational excellence
Drive measurable improvements, including a 30% reduction in MTTD and MTTR within the first year, ≥70% root cause identification within 1 hour, and ≥90% proactive issue detection via monitoring systems
Develop executive-level reporting on system health, reliability trends, and performance metrics
Build reusable frameworks, accelerators, and playbooks for incident management and observability adoption
Establish comprehensive documentation for transaction flows, system dependencies, and observability architectures
Develop and standardize incident response playbooks and runbooks
Lead training and enablement initiatives to scale observability expertise across teams

Requirements

What you’ll need

10+ years of experience in SRE, Observability, or related roles, with a strong focus on architecture and strategy
Deep hands-on expertise with observability platforms such as Dynatrace, ELK, Datadog, Splunk, OpenTelemetry, Jaeger
Proven experience designing observability solutions in cloud environments (AWS, Azure, GCP)
Strong understanding of microservices architecture, APIs, and distributed systems
Proficiency in programming/scripting (e.g., Python, Go, Java) for automation and integration
Demonstrated ability to lead cross-functional initiatives and influence technical direction
Dynatrace Associate or Professional Certification
Experience implementing OpenTelemetry standards at scale
Strong background in chaos engineering and resiliency testing
Familiarity with AIOps platforms and intelligent automation solutions
Consulting experience or prior role as an architect / technical advisor

About qode.world

11 - 50 employees • Artificial Intelligence • HR Tech • Recruitment

qode.world is a company that leverages artificial intelligence to revolutionize the recruiting process. Its platform allows users to find candidates by sourcing data from billions of data points worldwide and provides data-driven insights. Users can connect with candidates directly through the platform, conduct customized AI-led interviews, and get comprehensive assessments. The service also integrates easily with LinkedIn, enhancing the talent pool and facilitating direct communication with candidates listed there. qode.world offers additional recruiting services to assist in hiring for niche or senior roles. It is praised for its effectiveness in streamlining the hiring process and delivering quick results.

ATS Keywords

✓ Tailor your resume

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

observability, architecture, distributed systems, microservices, programming, automation, chaos engineering, resiliency testing, incident management, monitoring

Soft Skills

leadership, cross-functional collaboration, mentoring, influencing, continuous improvement, operational excellence, communication, strategic thinking, problem-solving, advisory

Certifications

Dynatrace Associate Certification, Dynatrace Professional Certification