Design, build, and evolve production cloud systems infrastructure, strategically employing automation and infrastructure-as-code (IaC)
Design and build systems, tools, and services for LendingClub’s next-generation cloud platform, including infrastructure tooling, automation, build and deployment pipelines, monitoring and logging architecture, and containerization
Partner cross-functionally with Infrastructure, Engineering, and Security teams to design, deploy, and maintain cloud solutions that meet internal standards and industry best practices
Operate and scale large web operations and systems administration to ensure platform reliability, performance, and security 24x7
Lead incident and problem management including identification, resolution, and root-cause analysis
Continuously improve deployment, configuration, and monitoring capabilities by anticipating growth and industry trends
Tackle complex challenges to improve platform reliability, resiliency, and scalability in a regulated, high-traffic fintech environment
Foster a culture of collaboration through documentation, knowledge sharing, and mentoring team members
Ensure service level objectives through proactive monitoring, custom tooling, and on-call support/troubleshooting
Requirements
6+ years of expert Linux and Windows System Administration experience (CentOS, Amazon Linux, Windows Server 2019+)
Bachelor's degree or higher in Computer Science or a related field, or related work experience
4+ years of experience working directly with AWS Infrastructure services; AWS Certification preferred; strong knowledge of AWS services and security required
Experience with automation frameworks/tools such as Git, Jenkins, Terraform, and Packer is a MUST
Experience with application delivery platforms for load balancing, caching, compression, application firewalling, reverse proxy, SSL termination, etc.
Hands-on experience with Kubernetes (design, deployment, and operations) in production environments
Fluency with one or more current-generation scripting languages used by DevOps professionals (Python, Bash / Shell, Ruby)
Monitoring and logging experience in any of the following: Prometheus, Splunk, Syslog, OSSEC/Wazuh, CloudWatch
Experience supporting high-traffic, public-facing websites, applications, and services that span multiple data centers and cloud environments
Expertise in incident and problem management including timely problem identification, successful resolution, and root-cause analysis
Exceptional communication skills - written and verbal
Ability to be responsive, flexible, and succeed in a collaborative peer environment
Local hours (PT, MT) and flexibility to work across time zones when necessary
Benefits
medical, dental and vision plans for employees and their families
401(k) match
health and wellness programs
flexible time off policies for salaried employees
up to 16 weeks paid parental leave
relocation, based on actual job level
long-term awards (equity) and an annual bonus (which is based on company performance, employee performance and eligible earnings)
ATS Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
Linux System Administration, Windows System Administration, AWS Infrastructure services, Automation frameworks, Kubernetes, Scripting languages, Monitoring and logging, Incident management, Problem management, Cloud solutions
Soft skills
Communication skills, Collaboration, Flexibility, Responsiveness, Mentoring, Knowledge sharing, Problem identification, Resolution skills, Root-cause analysis, Anticipation of growth