
Senior Infrastructure Engineer
Replit
full-time
Location Type: Hybrid
Location: Foster City • California • United States
Salary
💰 $190,000 - $240,000 per year
About the role
- Drive Automation and Infrastructure as Code: Build and improve automation to eliminate toil and operational work. Maintain CI/CD pipelines and infrastructure automation using tools like Terraform or Pulumi. Create self-healing systems that can automatically respond to common failure scenarios.
- Optimize Performance and Infrastructure: Collaborate with core infrastructure and product teams to tune performance and optimize our cloud deployments (Kubernetes, Docker, GCP). Identify and resolve performance bottlenecks and implement capacity planning strategies.
- Elevate Developer Experience: Design and implement improvements to our build, test, and deployment systems to make software delivery faster, safer, and more reliable for all engineers.
- Drive Cross-Team Improvements: Partner with service owners across Replit to understand their pain points, and collaborate on implementing build/test/deploy enhancements within their specific services.
- Build Shared Tooling: Create and maintain centralized tooling and automation that improves the engineering lifecycle, from local development to production monitoring.
- Debug and Harden Systems: Dive deep into debugging difficult technical problems, making our systems and products more robust, operable, and easier to diagnose.
- Collaborate on Design Reviews: Participate in feature and system design reviews, contributing expertise on security, scale, and operational considerations.
- Build and Integrate: Write high-quality, well-tested code to meet the needs of your customers, including building pipelines to integrate with third-party vendors.
Requirements
- 4+ years of experience in Site Reliability Engineering or similar roles (DevOps, Systems Engineering, Infrastructure Engineering).
- Strong programming skills in languages like Python or Go.
- You write high-quality, well-tested code.
- Solid understanding of distributed systems. You've built, scaled, and maintained production services and understand service-oriented architecture.
- Experience with container orchestration platforms (Kubernetes) and cloud-native technologies.
- Experience implementing and maintaining monitoring/observability solutions, with strong skills in debugging and performance tuning.
- Strong incident management skills with experience participating in incident response and demonstrated critical thinking under pressure.
- Experience with infrastructure as code (e.g., Terraform) and configuration management tools.
- Excellent written and verbal communication skills, with an ability to explain technical concepts clearly.
- A willingness to dive into understanding, debugging, and improving any layer of the stack.
- You're passionate about making software creation accessible and empowering the next generation of builders.
Benefits
- Competitive Salary & Equity
- 401(k) Program
- Health, Dental, Vision and Life Insurance
- Short-Term and Long-Term Disability
- Paid Parental, Medical, and Caregiver Leave
- Commuter Benefits
- Monthly Wellness Stipend
- Autonomous Work Environment
- In-Office Setup Reimbursement
- Flexible Time Off (FTO) + Holidays
- Quarterly Team Gatherings
- In-Office Amenities
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
Python, Go, CI/CD, Terraform, Pulumi, Kubernetes, Docker, GCP, monitoring solutions, performance tuning
Soft Skills
incident management, critical thinking, communication, collaboration, debugging, problem-solving, design review participation, customer focus, adaptability, teamwork