Analytic Partners

Senior Site Reliability Engineer

Employment Type: Full-time

Location Type: Hybrid

Location: Dallas, Texas, United States

About the role

  • Own the Internal Developer Platform (IDP) as a product, treating engineering teams as customers and optimizing for reliability, usability, and delivery velocity.
  • Define and execute a platform roadmap aligned with business priorities, developer needs, and long-term scalability.
  • Design, build, and evolve paved roads for application delivery, including CI/CD pipelines, infrastructure templates, service scaffolding, and standardized deployment patterns.
  • Build self-service capabilities that enable teams to provision, deploy, observe, and operate services with minimal friction.
  • Create and maintain reusable platform abstractions across AWS and Azure that standardize security, reliability, networking, and observability.
  • Reduce developer cognitive load by abstracting unnecessary complexity while enforcing clear guardrails for security, cost, and compliance.
  • Partner closely with application, product, and security teams to embed reliability, scalability, and security by design.
  • Establish and evolve platform standards for logging, monitoring, alerting, tracing, and incident response workflows.
  • Define, measure, and manage SLIs, SLOs, and error budgets for shared platform services.
  • Drive the reduction of operational toil through automation, standardization, and platform-first solutions.
  • Ensure shared platform services meet high standards for availability, performance, resilience, and scalability.
  • Own system-to-system integration and messaging patterns used across the platform.
  • Lead capacity planning, demand forecasting, and performance tuning for platform services.
  • Plan and execute zero-downtime upgrades, migrations, and releases of platform components.
  • Lead platform-level incident response workflows and post-incident reviews, driving systemic improvements rather than one-off fixes.
  • Evaluate incoming platform requests and translate them into scalable, productized capabilities.
  • Mentor engineers and drive platform adoption through documentation, enablement, and technical evangelism.
  • Participate in a 24x7 on-call rotation as an escalation point for platform reliability and availability issues.
  • Operate effectively in ambiguous problem spaces, making sound architectural and product decisions with limited guidance.

Requirements

  • Bachelor’s degree in Computer Science or equivalent practical experience.
  • 6+ years of experience in Platform Engineering, Site Reliability Engineering, DevOps, or Systems Engineering roles.
  • Strong expertise in Linux and Windows operating systems.
  • Advanced automation and scripting skills using Python, Bash, and/or PowerShell.
  • Deep, hands-on experience designing and operating AWS and Azure platforms at scale.
  • Strong experience building and operating CI/CD platforms (Jenkins, GitHub Actions or equivalent).
  • Strong experience with Infrastructure as Code and configuration management (Terraform, CloudFormation, ARM, or similar).
  • Production experience with container and orchestration platforms such as Docker and Kubernetes.
  • In-depth experience with the HashiCorp ecosystem (Nomad, Consul, Vault).
  • Strong understanding of distributed systems, cloud-native architectures, and reliability patterns.
  • Experience designing and operating observability platforms (e.g., Splunk, Sumo Logic, or similar).
  • Familiarity with security and compliance practices, including vulnerability scanning and enterprise security tooling.
  • Strong understanding of the software delivery lifecycle, release engineering, and platform lifecycle management.
  • Experience working in Agile / DevOps environments with a strong product mindset.
  • Demonstrated ability to influence without authority, set standards, and drive adoption across teams.
  • Excellent communication skills, able to translate platform capabilities into clear developer value.
  • Strong problem-solving skills with a bias toward durable, scalable solutions over short-term fixes.
  • A mindset of continuous improvement, curiosity, and learning.
  • Comfortable supporting a global, follow-the-sun operation when needed.

Benefits

  • Regular Employee
  • Flexibility in career paths for self-development
  • Opportunities for diversity, equity, and inclusion
