Software Development Engineer III – Infrastructure

HighLevel

Full-time

Posted on: 1/13/2026

Location Type: Remote

Location: India

Job Level

Mid-Level, Senior

Tech Stack

Cloud, Go, Google Cloud Platform, Kubernetes, Python

About the role

Participate in 24/7 on-call rotations for core infrastructure systems
Execute incident response during production events, including triage, mitigation, and recovery
Maintain and improve runbooks, operational procedures, and escalation paths
Help reduce MTTR and prevent repeat incidents through engineering solutions
Improve the reliability of core infrastructure components, including Kubernetes (GKE) clusters, cloud networking and load balancing, and edge services (Cloudflare)
Identify systemic reliability issues and drive corrective actions
Support capacity planning, scaling, and resilience testing
Execute security remediations across cloud and Kubernetes environments
Support enforcement of IAM least-privilege access, network security controls, and runtime security policies
Partner with Platform Security on vulnerability management and remediation
Support security incident response and post-incident reviews
Automate repetitive operational and security tasks
Build tooling to improve incident response speed, operational visibility, and security posture enforcement
Reduce manual toil through scripts, tooling, and process improvements
Support safe execution of infrastructure and configuration changes
Ensure changes follow defined change management and audit requirements
Contribute to incident reviews, postmortems, and continuous improvement initiatives
Work closely with Cloud Infrastructure, SRE, Platform, Data, and Security teams
Contribute to shared documentation and operational standards
Mentor junior engineers and lead small reliability or security initiatives

Requirements

4+ years of experience operating large-scale systems
Experience leading incident response or reliability initiatives
Ability to identify systemic issues and propose long-term fixes
Comfortable mentoring junior engineers and influencing peers
Experience supporting or operating production systems
Strong understanding of reliability, security, and operational best practices
Comfortable working in on-call and incident response environments
Strong troubleshooting and communication skills
Experience with GCP or other public cloud platforms (Nice to have)
Experience with Kubernetes (GKE) in production (Nice to have)
Familiarity with Cloudflare, networking, or edge security (Nice to have)
Exposure to security tooling or vulnerability management (Nice to have)
Scripting or automation experience (Python, Go, Bash, etc.) (Nice to have)
Experience in compliance- or audit-driven environments (SOC2, ISO) (Nice to have)

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard skills

Kubernetes, GKE, Cloud networking, Load balancing, Cloudflare, Incident response, Scripting, Automation, Security remediations, Capacity planning

Soft skills

Mentoring, Communication, Troubleshooting, Leadership, Collaboration, Problem-solving, Influencing, Continuous improvement, Operational best practices, Incident management