Salary
💰 $150,000 - $210,000 per year
Tech Stack
AEM, Ansible, AWS, Azure, Cloud, Docker, JavaScript, Kubernetes, Terraform
About the role
- Lead the day-to-day operations of our SaaS cloud infrastructure, ensuring high availability and reliability
- Oversee and mentor a team of Systems Engineers, DBAs, and NOC staff, ensuring effective incident management, troubleshooting, and operational improvements
- Manage monitoring and observability tools to identify and resolve system bottlenecks, improving performance and uptime
- Optimize AWS and Azure cloud environments, ensuring scalability and cost efficiency while meeting service level objectives
- Supervise the 24x7 Network Operations Center (NOC) to ensure proactive response to infrastructure issues
- Ensure secure cloud operations by implementing and enforcing security policies aligned with SOC 2 controls, ISO 27001 and other compliance standards
- Lead incident response efforts, triaging critical issues and working with engineering teams to resolve production-impacting events
- Manage CI/CD pipelines, configuration management, and infrastructure automation to support rapid software delivery
- Collaborate with cross-functional teams to support software development, security, and IT infrastructure initiatives
- Develop and maintain audit-ready documentation for security compliance and operational excellence
- Analyze and optimize cloud resource costs, ensuring efficient budget utilization
- Track and report on key performance indicators (KPIs), driving continuous improvement
- Ensure effective post-deployment support and operational readiness for new releases
- Be available for on-call support as needed
Requirements
- Bachelor’s degree in a related field or equivalent experience
- 5+ years of hands-on experience managing cloud operations, infrastructure, and IT teams in a high-availability SaaS environment
- Proven experience leading a team of 5+ IT professionals, including Systems Engineering, Network Engineering, Database Administration, and NOC functions
- Deep expertise in AWS (primary) and experience with Microsoft Azure
- Experience managing large-scale, high-volume data processing environments with strict SLAs
- Strong background in infrastructure automation, CI/CD pipelines, and configuration management
- Experience implementing and maintaining IT security policies and compliance frameworks (e.g., ISO 27001, SOC 2)
- Track record of optimizing cloud spend and resource allocation
- Familiarity with incident management, disaster recovery, and business continuity planning
- Experience supporting Agile/Scrum software development teams
- Strong communication and leadership skills, with the ability to interact with executive stakeholders, technical teams, and external vendors
- A recent AWS Associate-level certification is required
- Must be eligible to work in the US or Canada without company sponsorship, now or in the future; F-1 OPT candidates who will require H-1B sponsorship, TN visa holders, and current H-1B visa holders will not be considered
- Preferred: Experience with containers and container orchestration (e.g., Docker, Kubernetes); knowledge of infrastructure-as-code and configuration management tools (e.g., Terraform, CloudFormation, Ansible); prior experience with SRE methodologies; AWS Professional-level certification