Growe

Technical Support L2

full-time

Location Type: Remote

Location: Remote • 🌎 Anywhere in the World


Job Level

Junior

Tech Stack

AWS, Cloud, EC2, Grafana, Microservices, Prometheus

About the role

  • Respond to production incidents within defined SLAs and provide rapid problem identification and initial resolution;
  • Use monitoring tools (Grafana, VictoriaMetrics/Prometheus) to identify and diagnose issues in a microservices architecture;
  • Use logging tools (OpenSearch/Kibana) for comprehensive log analysis and root cause investigation;
  • Monitor and respond to alerts from PagerDuty or Grafana On-call, ensuring proper escalation and communication;
  • Escalate complex issues to appropriate specialized teams (DevOps, SystemRE, PlatformRE) with clear context and documentation;
  • Create, maintain, and update runbooks, troubleshooting guides, and incident documentation;
  • Provide clear, timely communication during incidents to stakeholders, development teams, and management;
  • Contribute to continuous improvement of incident response processes and tool utilization;
  • Participate in on-call rotations, ensuring timely response to critical incidents and proper handoff procedures;
  • Provide operational support and guidance to development teams regarding system reliability and performance.

Requirements

  • 1-3 years of experience in L2 support, Site Reliability Engineering, technical support, or a related role involving incident response;
  • Hands-on experience with Grafana dashboards, VictoriaMetrics, Prometheus, and metrics exporters for system health monitoring and performance analysis;
  • Proficiency with OpenSearch (Kibana web interface), log aggregation, search queries, and log analysis for troubleshooting and root cause investigation;
  • Experience with PagerDuty, Grafana On-call, or similar alerting systems for incident response, escalation procedures, and on-call operations;
  • Strong analytical skills for identifying issues using provided monitoring tools, dashboards, and alerting systems;
  • Clear written and verbal communication skills for incident reporting, stakeholder updates, and creating/updating runbooks and troubleshooting guides;
  • Understanding of cloud concepts and familiarity with AWS services (EC2, EKS, RDS, S3) for context in incident response and escalation;
  • Systematic approach to problem-solving, ability to follow runbooks, and experience with incident response procedures;
  • Ability to quickly learn and effectively use monitoring, logging, and alerting tools provided by DevOps/SystemRE/PlatformRE teams.

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard skills
L2 support, Site Reliability Engineering, incident response, monitoring tools, log analysis, troubleshooting, problem-solving, cloud concepts, AWS, metrics analysis
Soft skills
analytical skills, clear communication, written communication, verbal communication, stakeholder updates, documentation, continuous improvement, timely response, team collaboration, system reliability guidance