
Staff Site Reliability Engineer, Platform Engineering
Paxos
full-time
Location Type: Remote
Location: United States
Salary
💰 $210,000 - $240,825 per year
About the role
- Architect, build, and operate resilient, scalable, and self-healing cloud infrastructure on AWS.
- Lead the evolution of Kubernetes and platform services to enable secure, automated, and multi-region operations.
- Define and enforce Infrastructure as Code (IaC) standards using Terraform, AWS CDK, and Crossplane to ensure consistency, security, and auditability.
- Drive automation across provisioning, configuration, and monitoring pipelines to reduce manual effort and operational risk.
- Establish and champion reliability, observability, and performance standards across Tier-1 services, ensuring alignment with regulatory and partner requirements.
- Partner with product engineering to enhance CI/CD velocity, service resilience, and visibility through shared tooling, SLOs, and platform patterns.
- Lead incident reviews, root-cause analyses, and systemic reliability improvements, embedding learnings into runbooks and design practices.
- Optimize cloud infrastructure for cost, performance, and fault tolerance, using data to guide operational decisions.
- Mentor and upskill engineers, shaping architectural direction and influencing design decisions across multiple teams.
- Contribute to the technical strategy and roadmap for Paxos’ infrastructure platform, aligning platform scalability with business growth and compliance objectives.
Requirements
- Bachelor’s degree in Computer Science, Information Technology, or a related field — or equivalent practical experience.
- 8+ years of experience in Site Reliability Engineering, DevOps, or related infrastructure roles.
- Deep expertise in public cloud platforms, especially AWS, with hands-on experience in services like EC2, S3, Lambda, CloudWatch, and IAM.
- Strong proficiency with Kubernetes and container orchestration — you’ve run production workloads and understand cluster management, scaling, and troubleshooting.
- Extensive experience with Infrastructure as Code (IaC) using tools such as Terraform, Pulumi, or Crossplane.
- Solid scripting or programming skills in languages like Python, Bash, or Go, with a strong focus on automation.
- Working knowledge of databases such as Amazon RDS, Aurora, or PostgreSQL is a plus, though infrastructure is the main focus.
- Excellent problem-solving and debugging skills, with a systems-thinking mindset.
- Strong communicator who thrives in collaborative, remote-first teams.
Benefits
- Offers Equity
- Offers Bonus
- 15% Annual Salary
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
AWS, Kubernetes, Infrastructure as Code, Terraform, AWS CDK, Crossplane, Python, Bash, Go, CI/CD
Soft Skills
problem-solving, debugging, communication, collaboration, mentoring, leadership, systems-thinking, operational excellence, reliability, observability