
Staff Engineer – DevOps
Weekday (YC W21)
full-time
Location Type: Remote
Location: India
About the role
- Architect and evolve our DevOps ecosystem, champion cloud cost governance, and implement best-in-class container orchestration practices.
- Work cross-functionally with engineering, security, and finance teams to ensure operational excellence while proactively managing infrastructure spend.
- Lead end-to-end DevOps strategy, including CI/CD pipelines, automation, infrastructure-as-code, and release engineering.
- Design scalable, resilient cloud-native architectures aligned with business growth.
- Establish DevOps best practices, reliability standards, and operational governance.
- Architect and manage large-scale Kubernetes environments for production workloads.
- Optimize workloads across clusters for performance, reliability, and cost efficiency.
- Build and maintain containerized applications using Docker and Kubernetes, ensuring portability and scalability.
- Drive multi-cluster, multi-region deployments where necessary.
- Own infrastructure cost visibility and optimization initiatives.
- Implement cloud cost-saving strategies including rightsizing, reserved capacity planning, auto-scaling optimization, and workload scheduling.
- Create dashboards and reporting mechanisms to track infrastructure ROI and spend trends.
- Continuously identify inefficiencies and implement measurable cost-reduction initiatives without compromising performance.
- Design and implement comprehensive monitoring systems using Grafana and related observability tools.
- Build real-time dashboards for system health, performance metrics, and cost insights.
- Establish alerting frameworks to minimize downtime and improve incident response.
- Drive improvements in system reliability through data-driven monitoring and post-incident analysis.
- Automate provisioning, deployments, scaling, and recovery processes.
- Improve system resilience, availability, and disaster recovery strategies.
Requirements
- 9–15 years of experience in DevOps, Site Reliability Engineering, or Cloud Infrastructure roles.
- Deep expertise in container orchestration, including production-grade Docker and Kubernetes implementations.
- Strong hands-on experience with Grafana, monitoring systems, and observability frameworks.
- Proven track record in cost savings initiatives and infrastructure cost planning in cloud environments.
- Experience designing highly available, scalable systems in AWS, Azure, or GCP.
- Strong understanding of Infrastructure-as-Code (Terraform, CloudFormation, etc.).
- Expertise in CI/CD automation and release management.
- Solid knowledge of networking, security best practices, and cloud architecture patterns.
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
DevOps, Site Reliability Engineering, Cloud Infrastructure, Kubernetes, Docker, Grafana, Infrastructure-as-Code, CI/CD automation, cloud architecture, cost optimization
Soft Skills
cross-functional collaboration, operational excellence, leadership, data-driven decision making, incident response, problem-solving, communication, cost management, efficiency improvement, strategic planning