DOC / IT Ops Engineer with experience in Incident Management, Problem Management, Change Management, Knowledge Management, and Proactive Monitoring
Lead efforts involving the monitoring of key internal services and incident teams, as well as escalation and alerts to our crisis team(s)
Champion best practices of the DOC Playbook through the entire DOC team
Guide the implementation, continuous improvement and documentation of new and existing policies, procedures and processes for the DOC
Bring a technology innovation mindset, including AI Operations and self-healing scripts
Develop, configure, and manage alerting systems to promptly identify and alert relevant parties to emerging issues
Facilitate crisis team assessments or activations through rapid situational awareness to incident coordinators
Continuously monitor internal IT and product-related incidents using various tools and platforms
Maintain clear and concise communication during incidents, providing regular updates to stakeholders
Collaborate with cross-functional teams to support root cause analysis of complex issues
Document and maintain standard operating procedures for DOC response and escalation processes
Coordinate and escalate issues to appropriate teams and stakeholders as needed
Participate in post-incident reviews and contribute to the development of preventive measures
Stay up to date with industry trends, best practices, and technologies for all-hazards operations centers
Identify opportunities for improving monitoring and alerting systems and processes
Generate regular reports on tracked incidents, assessments, and status
Provide insights and recommendations based on incident analysis and trends
Maintain detailed and accurate incident logs and documentation
Requirements
Bachelor’s degree in Engineering, Computer Science, Information Technology, or a related field
Minimum of 6 years of experience in a Digital Operations Center (DOC) or a similar environment such as a NOC or IT Operations
Proven experience with maintaining a common operating picture with existing monitoring tools and situational awareness dashboards
Strong understanding of network protocols, systems, and infrastructure
Strong communication and interpersonal skills
Ability to work effectively under pressure and manage multiple priorities
Proficiency in scripting languages (e.g., Python, Bash); AI Ops experience is a plus
Familiarity with ITIL practices and frameworks is desirable
Familiarity with incident command system (ICS) principles and best practices in the technology industry
Excellent problem-solving and analytical skills
Relevant certifications such as CCNA, CCNP, or equivalent are preferred
Ability to work in a high-stress, fast-paced environment
Benefits
Remote-friendly and flexible work culture
Market leader in compensation and equity awards
Comprehensive physical and mental wellness programs
Competitive vacation and holidays to recharge
Paid parental and adoption leave
Professional development opportunities for all employees regardless of level or role
Employee Networks, geographic neighborhood groups, and volunteer opportunities to build connections
Vibrant office culture with world-class amenities
Great Place to Work Certified™ across the globe
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Communication skills
Interpersonal skills
Ability to work under pressure
Time management
Problem-solving skills
Analytical skills
Collaboration
Leadership
Crisis management
Continuous improvement
Certifications
Bachelor’s degree in Engineering
Bachelor’s degree in Computer Science
Bachelor’s degree in Information Technology
CCNA
CCNP