Lead the design and enhancement of operational monitoring frameworks, ensuring high availability and stability for all critical systems.
Architect and manage complex on-premise and/or cloud environments, focusing on resilience, security, and compliance.
Serve as a mentor to SysOps Engineers, fostering a culture of continuous improvement and best practices.
Tackle complex system performance issues and lead incident management, employing advanced analytical skills to restore and maintain service reliability and efficiency.
Requirements
Bachelor’s degree in Computer Science, Engineering, IT, or related field; or equivalent practical experience.
5+ years of experience in System Operations or Systems Administration, with proficiency in scripting with Bash, PowerShell, or Python.
Deep expertise in monitoring, incident response, and troubleshooting systems on both cloud platforms (AWS, Azure, Google Cloud) and on-premise infrastructure.
Advanced knowledge of networking, security protocols, backup/recovery strategies, and database management.
Leadership skills with the ability to mentor others, influence cross-functional teams, and lead by example in a collaborative environment.
Strong collaboration skills, with the ability to work effectively with cross-functional teams and foster teamwork.
Exceptional English communication skills, ensuring clear and effective exchange of information with team members, stakeholders, and customers.
Strong analytical and problem-solving skills, with a detail-oriented approach to identifying and resolving system issues efficiently.
Self-motivated, with the ability to work independently and under pressure while managing multiple priorities and deadlines.
Benefits
Flexible work environment
Generous time off
Mental health plans (country-dependent)
Fitness offerings