
Senior Site Reliability Engineer
Pine Software
full-time
Location Type: Remote
Location: Ukraine
About the role
- Define, implement, and continuously monitor SLAs, SLOs, and SLIs to measure and improve product availability and reliability.
- Design, configure, and maintain monitoring and alerting systems, including Grafana, VictoriaMetrics, Alertmanager, Grafana OnCall, Kibana, and Elasticsearch; integrate new observability tools as needed.
- Implement and maintain distributed observability solutions, including monitoring, tracing, and OpenTelemetry-based stacks.
- Ensure stability, high availability, and reliability of infrastructure and production systems.
- Participate in incident response, root cause analysis, and post-mortem reviews; drive corrective and preventive improvements.
- Take part in knowledge-sharing sessions, internal trainings, and documentation efforts; contribute to mentoring and hiring processes when needed.
- Participate in software release planning and collaborate with stakeholders and management on infrastructure and capacity requirements.
- Collaborate with stakeholders on the design, maintenance, and regular validation of backup and disaster recovery systems.
- Provide support to managers, developers, and QA engineers on monitoring, observability, and system analysis topics.
Requirements
- 7+ years of total working experience
- HashiCorp Stack: Consul, Vault, Packer, Nomad, Terraform
- Configuration Management: Ansible
- Google Cloud Platform (GCP): VPC, GKE, Firewall, Cloud Storage, Compute Engine, Artifact Registry
- Kubernetes: GKE and on-premises solutions, Helm, Argo CD, SSO
- Containerization: Working with containers (Docker / containerd / Podman); building and running containers
- CI/CD: GitLab
- Networking: Strong fundamentals of network architecture and protocols
- Programming Languages: Bash, Python
- Operating Systems: Linux (Debian-based ~85%, RHEL-based ~15%), Windows family (~5%)
- Databases: PostgreSQL, MySQL, MongoDB
- Caching Solutions: Redis, Memcached
- Message Queues: RabbitMQ, Kafka
- Load Balancing & API Gateways: HAProxy, KrakenD, Kubernetes Gateway API
- Monitoring & Observability: Prometheus / VictoriaMetrics, Grafana, ELK Stack (Elasticsearch, Logstash, Kibana), OpenTelemetry, Vector, Netdata
Benefits
- Care from Day One – medical insurance immediately upon starting work, including dental care, massage, and professional psychological support, because your well-being matters.
- Work-Life Balance – 25 days of paid vacation + 30 days of sick leave, so you can recover without unnecessary stress
- Investment in your energy – partial reimbursement for any sports activity that empowers you.
- Growth – partial coverage for English or Ukrainian language courses + a fixed budget for professional development. Choose what suits you best!
- Knowledge Library – books in the office and access to the Kuka online library to learn, grow, and find inspiration.
- Island Relaxation 14 days a year – enjoy a getaway at the corporate villa in Cyprus.
- Office of the Future – work at Unit City, where everything is designed for productivity, even during power outages, or at the Modern Office in Larnaca, a stylish space for inspiration with open areas, cozy lounges, and functional meeting rooms, all for your comfort.
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
SLA, SLO, SLI, OpenTelemetry, Ansible, Terraform, Kubernetes, Docker, CI/CD, Networking
Soft skills
incident response, root cause analysis, knowledge-sharing, mentoring, collaboration