Exegy

Production Support Engineer

full-time

Location Type: Hybrid

Location: Belfast • 🇬🇧 United Kingdom

Job Level

Junior, Mid-Level

Tech Stack

Grafana, Linux, Prometheus, Python, Splunk, Unix

About the role

  • Monitor production systems and infrastructure, ensuring uptime and performance metrics are met
  • Troubleshoot, diagnose, and resolve production issues in real time, minimizing service impact
  • Manage incident response, including escalation, root cause analysis, and post-mortem reporting
  • Collaborate with engineering teams to develop and implement monitoring tools, alert systems, and automated recovery processes
  • Analyze system logs, metrics, and trends to proactively identify potential risks or issues
  • Execute software deployments, configuration changes, and system upgrades with minimal disruption
  • Maintain and refine operational runbooks, escalation procedures, and best practices
  • Drive continuous improvement by identifying areas for process optimization and operational efficiency
  • Participate in an on-call rotation to provide 24/7 support for production systems

Requirements

  • Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent work experience
  • 2+ years of experience in a production support, system administration, or monitoring role
  • Strong technical skills in Linux/Unix environments, with experience in troubleshooting and debugging
  • Hands-on experience with monitoring tools (e.g., ITRS, Prometheus, Grafana, Splunk) and incident management platforms
  • Scripting experience (e.g., Python, Bash) to automate monitoring and reporting tasks
  • Excellent problem-solving and analytical skills, with the ability to work under pressure in a fast-paced environment
  • Solid understanding of networking, system performance, and application monitoring concepts
  • Exceptional communication and collaboration skills to coordinate with cross-functional teams effectively
