Tech Stack
AWS, Cloud, Docker, EC2, Ember.js, JavaScript, Kubernetes, MySQL, Node.js, React
About the role
- Be available to respond to critical service incidents outside of business hours on a rotating on-call schedule.
- Proactively monitor application health and performance across cloud infrastructure (AWS).
- Troubleshoot and prevent service interruptions in real time, working closely with development teams to resolve incidents efficiently.
- Lead and participate in disaster recovery drills and security incident simulations.
- Implement Infrastructure as Code (IaC) and maintain scalable deployments using AWS-native tools and services.
- Collaborate with development teams to ensure smooth CI/CD workflows using Git and containerized deployments (Docker).
- Work closely with stakeholders and product teams to ensure technical reliability aligns with business needs.
- Support and improve observability tools, alerting mechanisms, and logging infrastructure to promote transparency and response agility.
- Champion best practices in security, availability, performance, and incident response.
Requirements
- 3+ years of experience in a Site Reliability, DevOps, or related engineering role.
- Proven track record managing and scaling applications in a production AWS environment.
- Strong proficiency in Amazon Web Services (AWS) with knowledge of services like EC2, ECS, RDS, CloudWatch, and IAM.
- Proficiency in Node.js and scripting for automation and tooling.
- Experience with Docker for container-based deployment pipelines.
- Familiarity with React and Ember.js to understand performance implications at the frontend level.
- Understanding of NestJS and scalable Node-based services.
- Proficiency in MySQL and in performance monitoring of relational databases.
- Proficiency with Git for collaborative code management and DevOps workflow integration.
- Experience with container orchestration (e.g., ECS or Kubernetes) is a plus.
- Commitment to uptime, performance, and security in fast-moving SaaS environments.
- Fostering a great workplace culture for our team.
- Location: Remote (United States)
ATS Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
AWS, Node.js, Docker, MySQL, Git, Infrastructure as Code (IaC), EC2, ECS, RDS, CloudWatch
Soft skills
collaboration, incident response, proactive monitoring, troubleshooting, leadership, communication, commitment to uptime, performance, security, stakeholder engagement