
Production Support Engineer
Exegy
full-time
Location Type: Hybrid
Location: Belfast • 🇬🇧 United Kingdom
Job Level
Junior, Mid-Level
Tech Stack
Grafana, Linux, Prometheus, Python, Splunk, Unix
About the role
- Monitor production systems and infrastructure, ensuring uptime and performance metrics are met
- Troubleshoot, diagnose, and resolve production issues in real time, minimizing service impact
- Manage incident response, including escalation, root cause analysis, and post-mortem reporting
- Collaborate with engineering teams to develop and implement monitoring tools, alert systems, and automated recovery processes
- Analyze system logs, metrics, and trends to proactively identify potential risks or issues
- Execute software deployments, configuration changes, and system upgrades with minimal disruption
- Maintain and refine operational runbooks, escalation procedures, and best practices
- Drive continuous improvement by identifying areas for process optimization and operational efficiency
- Participate in an on-call rotation to provide 24/7 support for production systems
Requirements
- Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent work experience
- 2+ years of experience in a production support, system administration, or monitoring role
- Strong technical skills in Linux/Unix environments, with experience in troubleshooting and debugging
- Hands-on experience with monitoring tools (e.g., ITRS, Prometheus, Grafana, Splunk) and incident management platforms
- Scripting experience (e.g., Python, Bash) to automate monitoring and reporting tasks
- Excellent problem-solving and analytical skills, with the ability to work under pressure in a fast-paced environment
- Solid understanding of networking, system performance, and application monitoring concepts
- Exceptional communication and collaboration skills to coordinate with cross-functional teams effectively
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
Linux, Unix, troubleshooting, debugging, scripting, Python, Bash, monitoring, system performance, application monitoring
Soft skills
problem-solving, analytical skills, communication, collaboration, ability to work under pressure, continuous improvement, process optimization, operational efficiency