
Network Operations Center (NOC) Engineer I
RapidSOS
Full-time
Location Type: Remote
Location: United States
Salary
💰 $78,000 - $85,000 per year
About the role
- Monitor Production and Enterprise Infrastructure and react to alarms according to documented SLAs
- Work with Engineering and Customer Support teams to remediate alarms and incidents
- Continually strive to improve the environment through optimization and automation
- Create and update documentation as needed to share new troubleshooting methods and knowledge
- Perform operational tasks as assigned by Engineering and Customer Support teams
- Support incident response, deployments, and infrastructure training as the role evolves
- Work with international teams to diagnose and resolve critical issues
- Build, tune, and maintain alerting rules and monitors so that every alert is actionable, and investigate root causes rather than only mitigating symptoms
- Participate in post-incident reviews and contribute to blameless post-mortems
Requirements
- 2+ years of experience in a help desk environment or NOC role, ideally in a cloud-based environment
- Experience creating and managing alerts and monitors in enterprise monitoring tools such as Nagios, Zabbix, SolarWinds, or Datadog (Datadog preferred)
- Experience with incident management platforms such as PagerDuty, Opsgenie, or FireHydrant
- Experience working with ticketing systems such as Jira and Zendesk
- Experience following runbooks and troubleshooting guides to remediate infrastructure or application issues
- Experience with infrastructure operations (cloud infrastructure; AWS/Azure preferred)
- Technical aptitude, with the ability and willingness to quickly learn and understand complex products or services
- Highly self-motivated, with a strong work ethic and the ability to multitask in a fast-paced environment
- Demonstrated problem-solving and organizational skills that support efficient incident response and successful execution of initiatives
- Strong written and verbal communication skills in English
- Ability to work flexible shifts and participate in a 24x7 on-call rotation
- Experience building log-based alert rules (e.g., ElastAlert or equivalent) and investigating issues using centralized logging platforms (e.g., ELK/Kibana or equivalent)
- Comfort with Kubernetes and Docker container-based environments, including pod-level health triage
- Comfort working in a command-line environment (Linux/bash, Windows CMD/PowerShell, or equivalent); the team regularly uses CLI tools for infrastructure triage, pod inspection, and operational scripts
Benefits
- Competitive salary, benefits, and equity participation
- A dynamic, flexible and fun start-up work environment with a highly talented team
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
incident management, alert management, monitoring tools, troubleshooting, infrastructure operations, log-based alert rules, Kubernetes, Docker, command-line environment, cloud infrastructure
Soft Skills
problem-solving, organizational skills, communication skills, self-motivated, multitasking, adaptability, team collaboration, fast-paced environment, attention to detail, willingness to learn