qode.world

Senior Consultant – SRE Architect

full-time

Location Type: Hybrid

Location: Austin, Texas, United States

About the role

  • Define and lead the enterprise observability strategy for end-to-end transaction traceability across distributed systems
  • Architect scalable solutions leveraging tools such as Dynatrace, OpenTelemetry, ELK, Grafana, Datadog, Splunk, and Jaeger
  • Establish standardized frameworks for logging, metrics, tracing, and telemetry collection
  • Design and implement dependency mapping and service topology visualization across complex ecosystems
  • Provide architectural guidance for monitoring latency, throughput, and error rates across critical transaction paths
  • Lead root cause analysis using distributed tracing and telemetry data to resolve systemic performance issues
  • Partner with application and database teams to optimize system performance and scalability
  • Drive adoption of performance engineering best practices across teams
  • Define and implement resiliency strategies for business-critical transaction flows
  • Architect fault-tolerant systems, including failover, redundancy, and self-healing mechanisms
  • Lead and design chaos engineering initiatives to validate system resilience
  • Establish and govern Service Level Objectives (SLOs) and Service Level Indicators (SLIs) aligned to business outcomes
  • Act as a trusted advisor to engineering teams, architects, and leadership on observability and SRE best practices
  • Define and enforce standards, policies, and governance models for monitoring and tracing
  • Lead cross-functional initiatives to drive adoption of observability frameworks
  • Mentor engineers and SRE teams, fostering a culture of continuous improvement and operational excellence
  • Drive measurable improvements, including:
      • 30% reduction in MTTD and MTTR within the first year
      • ≥70% root cause identification within 1 hour
      • ≥90% proactive issue detection via monitoring systems
  • Develop executive-level reporting on system health, reliability trends, and performance metrics
  • Build reusable frameworks, accelerators, and playbooks for incident management and observability adoption
  • Establish comprehensive documentation for transaction flows, system dependencies, and observability architectures
  • Develop and standardize incident response playbooks and runbooks
  • Lead training and enablement initiatives to scale observability expertise across teams

Requirements

  • 10+ years of experience in SRE, Observability, or related roles, with a strong focus on architecture and strategy
  • Deep hands-on expertise with observability platforms such as Dynatrace, ELK, Datadog, Splunk, OpenTelemetry, and Jaeger
  • Proven experience designing observability solutions in cloud environments (AWS, Azure, GCP)
  • Strong understanding of microservices architecture, APIs, and distributed systems
  • Proficiency in programming/scripting (e.g., Python, Go, Java) for automation and integration
  • Demonstrated ability to lead cross-functional initiatives and influence technical direction
  • Dynatrace Associate or Professional Certification
  • Experience implementing OpenTelemetry standards at scale
  • Strong background in chaos engineering and resiliency testing
  • Familiarity with AIOps platforms and intelligent automation solutions
  • Consulting experience or prior role as an architect / technical advisor

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

  • observability
  • architecture
  • distributed systems
  • microservices
  • programming
  • automation
  • chaos engineering
  • resiliency testing
  • incident management
  • monitoring

Soft Skills

  • leadership
  • cross-functional collaboration
  • mentoring
  • influencing
  • continuous improvement
  • operational excellence
  • communication
  • strategic thinking
  • problem-solving
  • advisory

Certifications

  • Dynatrace Associate Certification
  • Dynatrace Professional Certification