SRT Marine Systems plc

System Monitoring & Observability Engineer, Prometheus, Grafana

Contract

Location Type: Hybrid

Location: Cardiff, United Kingdom

About the role

  • Design, configure, and maintain Prometheus-based monitoring solutions
  • Develop and manage metric exporters for application and system-level data
  • Optimise Prometheus scraping configurations and retention policies
  • Define and maintain alert rules based on SLIs/SLOs and performance baselines
  • Ensure alerts are actionable, with minimal false positives
  • Participate in on-call rotations and incident postmortems
  • Design and maintain Grafana dashboards for real-time operational insights
  • Collaborate with engineering and product teams to create tailored visualisations
  • Provide self-service dashboard capabilities for end users
  • Monitor infrastructure for uptime, latency, and throughput
  • Identify bottlenecks and recommend improvements

Requirements

  • Proven experience with Prometheus (including PromQL) and Grafana in production environments
  • Strong knowledge of Linux-based systems
  • Experience writing and optimising PromQL queries for alerts and dashboards
  • Familiarity with exporters (node_exporter, blackbox_exporter, custom exporters)
  • Understanding of Alertmanager configuration and routing
  • Proficiency with Grafana dashboard creation and templating
  • Strong troubleshooting skills for infrastructure and application issues
  • Familiarity with containers (Docker)
  • Scripting skills (Bash, Python, or Go) for automation

Benefits

  • Highly competitive salary
  • Matched company pension contributions up to 5%
  • 25 days annual leave rising to 28 days with service
  • Career development opportunities
  • Company “Get to know you” days

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
Prometheus, PromQL, Grafana, Linux, Bash, Python, Go, Docker, metric exporters, alertmanager

Soft Skills
troubleshooting, collaboration, incident management, problem-solving, communication