Site Reliability Engineer – SRE
Air Apps
Site Reliability Engineer role at Air Apps, focused on reliable, scalable cloud systems. You will collaborate with cross-functional teams and optimize system performance through automation.
Tech Stack
Tools & technologies
AWS, Azure, Cloud, Distributed Systems, Docker, Go, Google Cloud Platform, Grafana, Kubernetes, Linux, Prometheus, Python, Terraform
About the role
Key responsibilities & impact
- Design and implement scalable, reliable, and fault-tolerant systems across cloud environments.
- Develop and maintain observability tools, including monitoring, logging, and alerting (e.g., Prometheus, Grafana, Datadog, ELK).
- Automate infrastructure provisioning, deployment, and incident response using Infrastructure as Code (IaC) tools like Terraform or CloudFormation.
- Optimize system performance, scalability, and incident response workflows to improve uptime.
- Work closely with development and DevOps teams to improve system design for reliability.
- Conduct root cause analysis (RCA) and implement preventative measures to minimize failures.
- Ensure high availability by designing and maintaining load balancing, failover, and disaster recovery strategies.
- Improve CI/CD pipelines to enhance deployment speed while maintaining stability.
- Optimize cloud cost and resource utilization for AWS, Azure, or Google Cloud Platform (GCP).
- Participate in on-call rotations to quickly address system failures and minimize downtime.
Requirements
What you’ll need
- About 4 or more years of experience in Site Reliability Engineering (SRE), DevOps, or System Engineering.
- Strong knowledge of cloud platforms (AWS, Azure, or GCP) and cloud-native architectures.
- Experience with observability and monitoring tools (Prometheus, Grafana, ELK, Datadog, New Relic).
- Proficiency in Infrastructure as Code (IaC) tools such as Terraform, CloudFormation, or Pulumi.
- Hands-on experience with containerization and orchestration (Docker, Kubernetes, Helm).
- Strong Linux system administration and networking fundamentals.
- Experience with incident management, debugging, and root cause analysis.
- Proficiency in scripting (Bash, Python, or Go) for automation and system monitoring.
- Knowledge of load balancing, failover strategies, and distributed systems.
- Understanding of security best practices, access control, and compliance requirements.
- Strong communication skills and the ability to collaborate with cross-functional teams.
Benefits
Comp & perks
- Apple hardware ecosystem for work.
- Annual bonus.
- Top-tier health and life insurance for peace of mind.
- Transportation budget to support your commute.
- Coverflex benefits package for meal allowances, well-being, and more.
- Childcare support.
- Air Conference: an opportunity to meet the team, collaborate, and grow together.
- Pension Fund to support your long-term financial planning.
- Urban Sports Club membership to keep you active.
- Free meals at the hub.
ATS Keywords
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
Site Reliability Engineering, DevOps, System Engineering, cloud platforms, Infrastructure as Code, containerization, orchestration, scripting, load balancing, distributed systems
Soft Skills
communication, collaboration, incident management, debugging, root cause analysis