Hewlett Packard Enterprise

Principal Site Reliability Engineer

full-time

Location Type: Hybrid

Location: San Juan, Puerto Rico

About the role

  • Enhance Infrastructure as Code (IaC) and enforce best practices.
  • Optimize cloud infrastructure for scalability, security, and cost-effectiveness.
  • Develop internal tools to support and streamline cloud platform operations.
  • Improve CI/CD pipelines and deployment workflows using FluxCD and Jenkins.
  • Address container image vulnerabilities and standardize remediation processes.
  • Build Amazon Machine Images (AMIs) aligned with CIS and STIG benchmarks.
  • Strengthen monitoring, alerting, and observability using Prometheus, Grafana, and logging tools.
  • Troubleshoot complex production issues to ensure system reliability and customer satisfaction.
  • Fine-tune distributed systems such as Apache Kafka and Cassandra.
  • Collaborate with development, security, and operations teams to align infrastructure with application needs.

Requirements

  • Minimum of 10 years of hands-on experience in InfraOps, DevOps, or Site Reliability Engineering (SRE).
  • Proficiency with Linux systems, especially Debian-based distributions.
  • Strong experience with cloud platforms such as AWS and GCP.
  • Expertise in Infrastructure as Code tools like Terraform, Packer, and Ansible.
  • Solid programming skills in Python and/or Golang.
  • Deep understanding of containerization (Docker, containerd) and orchestration tools (AWS EKS, GCP GKE).
  • Experience with GitOps workflows.
  • Proven track record in implementing and maintaining CI/CD pipelines.
  • Strong background in security and familiarity with security programs.
  • Experience with monitoring and logging tools (Prometheus, Grafana, ELK).
  • Knowledge of both relational (SQL) and non-relational databases.
  • Excellent problem-solving and debugging skills with a strong sense of ownership.
  • Experience managing distributed systems like Apache Kafka and Cassandra.
  • Effective communicator and collaborative team player.

Benefits

  • Health & Wellbeing
  • Personal & Professional Development
  • Unconditional Inclusion

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

  • Infrastructure as Code
  • Cloud infrastructure optimization
  • CI/CD pipelines
  • FluxCD
  • Jenkins
  • Containerization
  • Docker
  • AWS EKS
  • GCP GKE
  • Programming in Python

Soft Skills

  • Problem-solving
  • Debugging
  • Ownership
  • Communication
  • Collaboration