OneStream Software

Director, Cloud Engineering

full-time

Location Type: Remote

Location: Remote • 🇺🇸 United States

Salary

💰 $158,500 - $198,250 per year

Job Level

Lead

Tech Stack

AWS, Azure, Cloud, Distributed Systems, Google Cloud Platform, Kubernetes, Linux, .NET, OpenShift, Python, SDLC, SQL, Terraform

About the role

  • Build, lead, and mentor DevOps and Site Reliability Engineering teams, including their managers, with a focus on operational excellence, reliability, and automation
  • Define and monitor KPIs and SLIs/SLOs to measure reliability, system performance, and operational maturity
  • Implement robust observability frameworks encompassing logging, metrics, and tracing to ensure proactive monitoring and incident response
  • Foster a culture of continuous improvement by driving efficiency, reducing technical debt, and simplifying complex systems, while engaging in day-to-day team efforts and escalations
  • Architect, implement, and optimize scalable and secure cloud environments (AWS, GCP, Azure) to support enterprise-grade applications and platforms
  • Define and execute the enterprise DevOps strategy to align with broader engineering and business objectives
  • Drive the design and deployment of CI/CD pipelines and automated release processes to enhance development velocity and reliability
  • Lead large-scale modernization initiatives, including cloud migrations, containerization (Kubernetes), and platform consolidation
  • Leverage data and analytics to measure and improve system health, deployment efficiency, and incident response performance
  • Integrate security best practices throughout the SDLC, aligning with standards such as SOC 2, FedRAMP, and GDPR
  • Ensure the creation and maintenance of technical documentation, workflows, and knowledge base articles
  • Instill quality-of-work standards and clear expectations for projects and tasks
  • Proactively identify issues and enact solutions that have a significant and quantifiable impact on OKRs and business objectives

Requirements

  • BS/BA in Computer Science, Engineering, related field, or equivalent work experience
  • 10+ years of software-related experience required (Site Reliability, DevOps, Release Engineering)
  • 4+ years building and managing high-performing engineering teams, or in similar roles across development or operations teams
  • Experience working for a cloud service provider (CSP), managed service provider (MSP), or enterprise SaaS company
  • Advanced knowledge of SaaS application architecture and design
  • Experience running and monitoring large-scale distributed systems
  • Deep understanding of cloud-native concepts including elasticity, interconnectivity, security, and identity management
  • Experience leading teams with the following technologies, tools, and concepts:
      • Deploying and managing solutions hosted on major public cloud providers (Azure, AWS, GCP)
      • Hybrid Microsoft and Linux-based technology stack, including Windows Server, .NET/C#, IIS, and SQL Server, alongside Linux (Alpine, Ubuntu) and container orchestration via Kubernetes
      • Automating processes using PowerShell, CLI, Bash, Python, or other scripting languages
      • Strong understanding of Azure Kubernetes Service (AKS) with container-based deployment skills, or of other platforms such as OpenShift, GKE, or EKS
      • Identity management using Azure AD, Okta, OpenID Connect (OIDC), and SAML
      • Working knowledge of various cryptographic algorithms and protocols (TLS, mTLS, SSH, AES)
      • Hands-on experience with orchestration, configuration management, and CI/CD tools (e.g., Terraform, ArgoCD, Azure DevOps Pipelines, Git)

Benefits

  • Vision
  • Medical
  • Life
  • Dental
  • 401(k)

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard skills
DevOps, Site Reliability Engineering, CI/CD, Kubernetes, AWS, GCP, Azure, Python, PowerShell, Terraform
Soft skills
leadership, mentoring, communication, problem-solving, continuous improvement, team collaboration, operational excellence, efficiency, technical debt reduction, incident response