Salary
💰 $100,000 - $130,000 per year
Tech Stack
Ansible, Azure, Cloud, Go, ITSM, Python, ServiceNow, Splunk, Terraform
About the role
- Develop comprehensive solutions aligned with business objectives, considering scalability, performance, security, and cost-efficiency
- Design, administer, and manage multi-tenant observability platforms to monitor customer environments across cloud, hybrid, and on-premises infrastructure
- Standardize and maintain observability configurations such as dashboards and alert thresholds
- Collaborate with the Service Desk, NOC, and Advanced Support Team to define and support SLAs, SLOs, and KPIs
- Collaborate during customer onboarding to define alerting, dashboards, and monitoring baselines
- Continuously improve noise reduction, event correlation, and escalation processes
- Participate in incident investigations using observability data to perform root cause analysis and accelerate resolution
- Design and configure automated incident resolution using native or third-party integrations
- Ensure observability solutions align with compliance and security across customer environments
- Configure and optimize integrations with the ITSM platform, ServiceNow
- Assist with information gathering and reporting to clients or Client Success Managers
- Collaborate with senior engineers to escalate issues as needed
- Develop, maintain, and update technical knowledge base articles
- Client interactions and after-hours or weekend work may be required
- Other duties as assigned and directed
Requirements
- 3-5 years of experience in an IT or MSP environment with a focus on monitoring tools
- Strong background in monitoring platforms such as LogicMonitor or similar (SolarWinds, ScienceLogic, etc.)
- Experience using event management, event correlation, and AIOps tools such as BigPanda or similar (Splunk, Edwin AI, etc.)
- Proficiency with automation and scripting: Python, Ansible, PowerShell, Go
- Hands-on experience with Infrastructure as Code tools like Bicep, Terraform, and Ansible
- Experience with Azure concepts, including Log Analytics and Azure Monitor
- Knowledge of ITIL processes (incident, problem, and change management) and their integration with the ITSM platform, ServiceNow
- Experience navigating ticketing systems such as ServiceNow
- Familiarity with customer reporting, SLA management, and service-level dashboards
- Strong problem-solving and troubleshooting abilities
- Excellent written and verbal communication skills
- Ability to work independently and manage multiple priorities
- Strong customer service focus and ability to work collaboratively with non-technical users
- Bachelor's degree or higher education diploma in Information Technology is desired
- Certifications like Azure Administrator Associate (AZ-104), ITIL 4 Foundation, or monitoring tool certifications are highly desired