HighLevel

Software Development Engineer III – Infrastructure

Employment Type: Full-time

Location Type: Remote

Location: India

About the role

  • Participate in 24/7 on-call rotations for core infrastructure systems
  • Execute incident response during production events, including triage, mitigation, and recovery
  • Maintain and improve runbooks, operational procedures, and escalation paths
  • Help reduce MTTR and prevent repeat incidents through engineering solutions
  • Improve reliability of core infrastructure components, including Kubernetes (GKE) clusters, cloud networking and load balancing, and edge services (Cloudflare)
  • Identify systemic reliability issues and drive corrective actions
  • Support capacity planning, scaling, and resilience testing
  • Execute security remediations across cloud and Kubernetes environments
  • Support enforcement of IAM least-privilege access, network security controls, and runtime security policies
  • Partner with Platform Security on vulnerability management and remediation
  • Support security incident response and post-incident reviews
  • Automate repetitive operational and security tasks
  • Build tooling to improve incident response speed, operational visibility, and security posture enforcement
  • Reduce manual toil through scripts, tooling, and process improvements
  • Support safe execution of infrastructure and configuration changes
  • Ensure changes follow defined change management and audit requirements
  • Contribute to incident reviews, postmortems, and continuous improvement initiatives
  • Work closely with Cloud Infrastructure, SRE, Platform, Data, and Security teams
  • Contribute to shared documentation and operational standards
  • Mentor junior engineers and lead small reliability or security initiatives

Requirements

  • 4+ years of experience operating large-scale systems
  • Experience leading incident response or reliability initiatives
  • Ability to identify systemic issues and propose long-term fixes
  • Comfortable mentoring junior engineers and influencing peers
  • Experience supporting or operating production systems
  • Strong understanding of reliability, security, and operational best practices
  • Comfortable working in on-call and incident response environments
  • Strong troubleshooting and communication skills
  • Experience with GCP or other public cloud platforms (Nice to have)
  • Experience with Kubernetes (GKE) in production (Nice to have)
  • Familiarity with Cloudflare, networking, or edge security (Nice to have)
  • Exposure to security tooling or vulnerability management (Nice to have)
  • Scripting or automation experience (Python, Go, Bash, etc.) (Nice to have)
  • Experience in compliance- or audit-driven environments (SOC2, ISO) (Nice to have)

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard skills
Kubernetes, GKE, Cloud networking, Load balancing, Cloudflare, Incident response, Scripting, Automation, Security remediations, Capacity planning
Soft skills
Mentoring, Communication, Troubleshooting, Leadership, Collaboration, Problem-solving, Influencing, Continuous improvement, Operational best practices, Incident management