Site Reliability Engineer

Hewlett Packard Enterprise

Posted 5/8/2026 • Full-time • Remote • 🇵🇱 Poland • Senior, Lead • 💰 PLN 154,500 - PLN 305,500 per year

Tech Stack

Tools & technologies

Airflow, Ansible, Apache, AWS, Cassandra, Cloud, Distributed Systems, Docker, Elasticsearch, Flux, Go, Kafka, Kubernetes, Linux, Packer, Postgres, Python, Redis, Ruby, Spark, Terraform, Unix

About the role

Key responsibilities & impact

Engage in and improve the whole lifecycle of services - from inception and design, through to deployment, operation, and refinement.
Support the development of services before they go live, starting in the planning phase, through activities such as system design consulting, developing software platforms and frameworks, capacity planning, and launch reviews.
Provide technical leadership and guidance to other team members on managing the availability and performance of mission-critical services, building automation to prevent problem recurrence, and building automated responses for non-exceptional service conditions.
Maintain services once they are live by measuring and monitoring availability, latency, and overall system health.
Scale systems sustainably through mechanisms like automation and evolve systems by pushing for changes that improve reliability and velocity.
Plan capacity for the growth of cloud infrastructure.
Improve operational processes such as deployments and upgrades.
Manage execution of project priorities, deadlines, and deliverables.
Be on an on-call rotation to respond to incidents that impact platform availability.
Use your on-call shift to prevent incidents from happening.
Apply experience in incident response, including conducting post-mortems and implementing lessons learned, to enhance system reliability.

Requirements

What you’ll need

10+ years of engineering or systems experience.
Experience building and running reliable and fault-tolerant production cloud systems at scale on AWS.
Experience coding infrastructure automation with Terraform, Terragrunt, Packer, and CI/CD, and working knowledge of configuration management systems like Ansible.
Hands-on experience with Linux/Unix operating systems internals, file systems, system tuning, administration, and networking.
Deep experience in microservice technologies, container orchestration, and continuous deployment (Kubernetes, Docker, Helm, GitOps with Flux).
Experience designing, building, and maintaining production services, and troubleshooting large-scale distributed systems.
Experience with technologies like Apache Kafka, Apache Storm, Apache Flink, Apache Airflow, Spark, Postgres, Redis, Elasticsearch, Arango, and Cassandra.
Experience with observability tools and methodology (monitoring, logging, tracing, SLOs/SLIs) for detecting and diagnosing issues before they cause service impact or performance degradation.
Strong programming skills in Shell, Python, Golang, and/or Ruby.
Ability to deliver efficiently and effectively.
Strong problem-solving and debugging skills with a high sense of ownership.

Benefits

Comp & perks

Health & Wellbeing
Personal & Professional Development
Unconditional Inclusion

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

AWS, Terraform, Terragrunt, Packer, CI/CD, Ansible, Linux, Unix, Kubernetes, Docker

Soft Skills

technical leadership, problem-solving, debugging, ownership, communication, project management, incident response, collaboration, process improvement, capacity planning