Build monitoring and observability dashboards in Splunk, Dynatrace, and other monitoring and alerting platforms
Ensure the reliability, performance, and scalability of our Microsoft 365 environment
Identify problems with systems and services
Drive regular deployment of new versions of the systems and their subcomponents
Lead projects focused on building and maintaining observability/monitoring for the application
Maintain alerting and continuously improve visibility
Drive decisions around system validation, testing, and service monitoring
Implement approved proposals that optimize the Software Development Life Cycle (SDLC) and boost service reliability
Investigate and document issues and lead internal teams to develop solutions to mitigate them
Conduct post-incident reviews and document findings to inform future decision-making
Coach and mentor team members
Requirements
A Bachelor's degree in a quantitative or business field (e.g., statistics, mathematics, engineering, computer science)
5 – 7 years of related experience
5-7 or more years of experience in site reliability engineering, with a focus on Microsoft 365
In-depth knowledge of Microsoft 365 services, architecture, and administration
Advanced skills in writing and debugging PowerShell scripts for automation and administration tasks
Advanced proficiency in utilizing Graph APIs for integration and automation
Intermediate skills in creating and managing Power Apps/Automate workflows and applications
Experience building monitoring dashboards in Splunk, Dynatrace, and other monitoring platforms
Strong understanding of incident management processes and tools
Benefits
Competitive pay
Health insurance
401(k) and stock purchase plans
Tuition reimbursement
Paid time off plus holidays
Flexible approach to work with remote, hybrid, field, or office work schedules
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
site reliability engineering, Microsoft 365, PowerShell scripting, Graph APIs, Power Apps, Power Automate, monitoring dashboards, Splunk, Dynatrace, Software Development Life Cycle (SDLC)
Soft skills
problem identification, project leadership, coaching, mentoring, incident management, documentation, decision making, team collaboration, communication, continuous improvement