Group 1001

Platform Reliability Engineering Lead

full-time

Origin: 🇺🇸 United States

Salary

💰 $225,000 - $245,000 per year

Job Level

Senior

Tech Stack

AWS, Azure, Cloud, Google Cloud Platform, Terraform

About the role

  • Group 1001’s culture emphasizes collaboration, communication, core business focus, risk management, and striving for outcomes.
  • Provide technical and operational leadership for the design, delivery, and management of site reliability services, with a focus on operational excellence and reliability.
  • Lead the development and implementation of multi-cloud infrastructure-as-code solutions, emphasizing Terraform and automation best practices.
  • Design and oversee the ongoing definition and management of operational services, with clear metrics and SLAs that deliver high availability, performance, and maintainability.
  • Participate in programs that involve the migration, integration, and operationalization of applications across AWS, GCP, and Azure, optimizing for efficiency and resilience.
  • Champion cloud security best practices and ensure compliance with relevant industry standards across all platforms.
  • Mentor and develop team members in SRE principles, cloud technologies, infrastructure-as-code, and operational best practices.
  • Stay current with emerging cloud and SRE trends, particularly within GCP and AWS, and assess their impact on strategic and operational objectives.

Requirements

  • Minimum of 10 years of experience in cloud platform engineering, architecture, and operational leadership.
  • Demonstrated SRE experience or equivalent technical and operational leadership in managing large-scale, reliable services.
  • Deep expertise in enterprise-grade AWS or GCP architecture; Azure experience is a plus.
  • Advanced proficiency with Terraform (or similar) for infrastructure automation.
  • Strong knowledge of cloud security and compliance measures.
  • Excellent leadership, communication, and organizational skills.
  • Experience managing enterprise identity and network access services. (Preferred)
  • Knowledge of DevOps practices and their integration within cloud and operational environments. (Preferred)
  • Familiarity with AI/ML implementations in cloud platforms. (Preferred)