Senior Site Reliability Engineer

Pine Software

Full-time

Posted on: 1/5/2026

Location Type: Remote

Location: Ukraine

About the role

Define, implement, and continuously monitor SLAs, SLOs, and SLIs to measure and improve product availability and reliability.
Design, configure, and maintain monitoring and alerting systems, including Grafana, VictoriaMetrics, Alertmanager, Grafana OnCall, Kibana, and Elasticsearch; integrate new observability tools as needed.
Implement and maintain distributed observability solutions, including monitoring, tracing, and OpenTelemetry-based stacks.
Ensure stability, high availability, and reliability of infrastructure and production systems.
Participate in incident response, root cause analysis, and post-mortem reviews; drive corrective and preventive improvements.
Take part in knowledge-sharing sessions, internal training, and documentation efforts; contribute to mentoring and hiring processes when needed.
Participate in software release planning and collaborate with stakeholders and management on infrastructure and capacity requirements.
Collaborate with stakeholders on the design, maintenance, and regular validation of backup and disaster recovery systems.
Provide support to managers, developers, and QA engineers on monitoring, observability, and system analysis topics.

Requirements

7+ years of total working experience
HashiCorp Stack: Consul, Vault, Packer, Nomad, Terraform
Configuration Management: Ansible
Google Cloud Platform (GCP): VPC, GKE, Firewall, Cloud Storage, Compute Engine, Artifact Registry
Kubernetes: GKE and on-premises solutions, Helm, Argo CD, SSO
Containerization: Docker / containerd / Podman; building and running containers
CI/CD: GitLab
Networking: Strong fundamentals of network architecture and protocols
Programming Languages: Bash, Python
Operating Systems: Linux (Debian-based ~85%, RHEL-based ~15%), Windows family (~5%)
Databases: PostgreSQL, MySQL, MongoDB
Caching Solutions: Redis, Memcached
Message Queues: RabbitMQ, Kafka
Load Balancing & API Gateways: HAProxy, KrakenD, Kubernetes Gateway API
Monitoring & Observability: Prometheus / VictoriaMetrics, Grafana, ELK Stack (Elasticsearch, Logstash, Kibana), OpenTelemetry, Vector, Netdata

Benefits

Care from Day One – medical insurance from your first day of work, including dental care, massage, and professional psychological support, because your well-being matters.
Work-Life Balance – 25 days of paid vacation + 30 days of sick leave, so you can recover without unnecessary stress.
Investment in your energy – partial reimbursement for any sports activities that empower you.
Growth – partial coverage for English or Ukrainian language courses + a fixed budget for professional development. Choose what suits you best!
Knowledge Library – books in the office and access to the Kuka online library to learn, grow, and find inspiration.
Island Relaxation – 14 days a year to enjoy a getaway at the corporate villa in Cyprus.
Office of the Future – work at Unit City, where everything is designed for productivity, even during power outages, or at the Modern Office in Larnaca, a stylish space for inspiration with open areas, cozy lounges, and functional meeting rooms, all for your comfort.

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

SLA, SLO, SLI, OpenTelemetry, Ansible, Terraform, Kubernetes, Docker, CI/CD, Networking

Soft Skills

incident response, root cause analysis, knowledge-sharing, mentoring, collaboration