qode.world

Senior Consultant – SRE Architect

Senior Consultant / SRE Architect leading enterprise-wide observability strategies and reliability frameworks for business-critical transactions at Incedo.

Posted 4/15/2026 • Full-time • Austin, Texas, 🇺🇸 United States (hybrid) • Senior

Tech Stack

Tools & technologies
AWS, Azure, Cloud, Distributed Systems, Go, Google Cloud Platform, Grafana, Java, Microservices, Python, Splunk

About the role

Key responsibilities & impact
  • Define and lead the enterprise observability strategy for end-to-end transaction traceability across distributed systems
  • Architect scalable solutions using tools such as Dynatrace, OpenTelemetry, ELK, Grafana, Datadog, Splunk, and Jaeger
  • Establish standardized frameworks for logging, metrics, tracing, and telemetry collection
  • Design and implement dependency mapping and service topology visualization across complex ecosystems
  • Provide architectural guidance for monitoring latency, throughput, and error rates across critical transaction paths
  • Lead root cause analysis using distributed tracing and telemetry data to resolve systemic performance issues
  • Partner with application and database teams to optimize system performance and scalability
  • Drive adoption of performance engineering best practices across teams
  • Define and implement resiliency strategies for business-critical transaction flows
  • Architect fault-tolerant systems, including failover, redundancy, and self-healing mechanisms
  • Lead and design chaos engineering initiatives to validate system resilience
  • Establish and govern Service Level Objectives (SLOs) and Service Level Indicators (SLIs) aligned to business outcomes
  • Act as a trusted advisor to engineering teams, architects, and leadership on observability and SRE best practices
  • Define and enforce standards, policies, and governance models for monitoring and tracing
  • Lead cross-functional initiatives to drive adoption of observability frameworks
  • Mentor engineers and SRE teams, fostering a culture of continuous improvement and operational excellence
  • Drive measurable improvements, including:
      • 30% reduction in MTTD and MTTR within the first year
      • ≥70% root cause identification within 1 hour
      • ≥90% proactive issue detection via monitoring systems
  • Develop executive-level reporting on system health, reliability trends, and performance metrics
  • Build reusable frameworks, accelerators, and playbooks for incident management and observability adoption
  • Establish comprehensive documentation for transaction flows, system dependencies, and observability architectures
  • Develop and standardize incident response playbooks and runbooks
  • Lead training and enablement initiatives to scale observability expertise across teams

Requirements

What you’ll need
  • 10+ years of experience in SRE, Observability, or related roles, with a strong focus on architecture and strategy
  • Deep hands-on expertise with observability platforms such as Dynatrace, ELK, Datadog, Splunk, OpenTelemetry, and Jaeger
  • Proven experience designing observability solutions in cloud environments (AWS, Azure, GCP)
  • Strong understanding of microservices architecture, APIs, and distributed systems
  • Proficiency in programming/scripting (e.g., Python, Go, Java) for automation and integration
  • Demonstrated ability to lead cross-functional initiatives and influence technical direction
  • Dynatrace Associate or Professional Certification
  • Experience implementing OpenTelemetry standards at scale
  • Strong background in chaos engineering and resiliency testing
  • Familiarity with AIOps platforms and intelligent automation solutions
  • Consulting experience or prior role as an architect / technical advisor

About the company

qode.world
11 - 50 employees • Artificial Intelligence • HR Tech • Recruitment

qode.world uses artificial intelligence to streamline the recruiting process. Its platform lets users find candidates by sourcing data from billions of data points worldwide and provides data-driven insights. Users can connect with candidates directly through the platform, conduct customized AI-led interviews, and get comprehensive assessments. The service also integrates with LinkedIn, which widens the talent pool and allows direct communication with candidates listed there. qode.world offers additional recruiting services to assist in hiring for niche or senior roles. The company is praised for streamlining the hiring process and delivering quick results.

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
observability, architecture, distributed systems, microservices, programming, automation, chaos engineering, resiliency testing, incident management, monitoring
Soft Skills
leadership, cross-functional collaboration, mentoring, influencing, continuous improvement, operational excellence, communication, strategic thinking, problem-solving, advisory
Certifications
Dynatrace Associate Certification, Dynatrace Professional Certification