RapidSOS

Network Operations Center (NOC) Engineer I

full-time

Location Type: Remote

Location: United States

Salary

💰 $78,000 - $85,000 per year

About the role

  • Monitor Production and Enterprise Infrastructure and react to alarms according to documented SLAs
  • Work with Engineering and Customer Support teams to remediate alarms and incidents
  • Continually strive to improve the environment through optimization and automation
  • Create and update documentation as necessary to share new methods and knowledge around troubleshooting
  • Perform operational tasks as assigned by Engineering and Customer Support teams
  • Support incident response, deployments, and infrastructure training as the role evolves
  • Work with international teams to diagnose and resolve critical issues
  • Build, tune, and maintain alerting rules and monitors so that every alert is actionable, investigating root causes rather than only mitigating symptoms
  • Participate in post-incident reviews and contribute to blameless post-mortems

Requirements

  • 2+ years of experience in a help desk environment or NOC role, ideally in a cloud-based environment
  • Experience creating and managing alerts and monitors using enterprise monitoring tools such as Nagios, Zabbix, SolarWinds, or Datadog (Datadog preferred)
  • Experience with incident management platforms such as PagerDuty, Opsgenie, or FireHydrant
  • Experience working with ticketing systems such as Jira and Zendesk
  • Experience following runbooks and troubleshooting guides to remediate infrastructure or application issues
  • Experience with infrastructure operations (cloud infrastructure on AWS/Azure preferred)
  • Technical aptitude, with the ability and willingness to quickly learn and understand complex products or services
  • Highly self-motivated, strong work ethic and ability to multitask in a fast-paced environment
  • Demonstrated problem-solving and organizational skills that ensure successful outcomes and efficient execution of incident response and other initiatives
  • Strong written and verbal communication skills in English
  • Ability to work flexible shifts and participate in a 24x7 on-call rotation
  • Experience building log-based alert rules (e.g., ElastAlert or equivalent) and investigating issues using centralized logging platforms (e.g., ELK/Kibana or equivalent)
  • Comfort with Kubernetes and Docker container-based environments, including pod-level health triage
  • Comfort working in a command-line environment (Linux/bash, Windows CMD/PowerShell, or equivalent); the team regularly uses CLI tools for infrastructure triage, pod inspection, and operational scripts

Benefits

  • Competitive salary, benefits, and equity participation
  • A dynamic, flexible and fun start-up work environment with a highly talented team