Tech Stack
AWS, Azure Cloud, Docker, Elasticsearch, Google Cloud Platform, Kafka, Kubernetes, Linux, Postgres, Prometheus, RabbitMQ, Redis, Terraform
About the role
- Our client's Cloud Operations team is a group of talented engineers passionate about building highly reliable, scalable and secure solutions in public and private cloud environments. We are looking to hire a highly motivated Cloud Operations engineer with strong hands-on experience in production operations and deployment automation. You will work with the team to design, develop and implement end-to-end deployment automation solutions. You will also be expected to take part in continuous cloud service operations and to troubleshoot and resolve complex production issues. Together, we will design, develop and implement the best public, private and local cloud solutions for our customers.
- Responsibilities:
- 1. Manage and maintain our client's service infrastructure in AWS, GCP & Azure.
- 2. Participate in continuous cloud service operations with remote cloud operations teams.
- 3. Troubleshoot and follow up on production infrastructure and application issues.
- 4. Drive root cause analysis and resolution.
- 5. Communicate with Dev/QA as well as external carriers to resolve and prevent issues.
- 6. Participate in release deployment, system maintenance and cloud expansion.
- 7. Improve service availability and scalability through tuning, automation, tools, and process.
- 8. Analyze service performance, identify bottlenecks and provide actionable improvement plans.
- 9. Improve service monitoring coverage, accuracy and efficiency.
- 10. Participate in cloud security and compliance implementation.
Requirements
- 1. BS-level technical degree required; Computer Science or Engineering background preferred.
- 2. 5+ years of experience in a CloudOps / DevOps role.
- 3. Hands-on experience with AWS or another public cloud (Azure, GCP, etc.).
- 4. Knowledge of Linux, security and networking fundamentals.
- 5. Working knowledge of container-based architecture and deployment (Docker, Kubernetes).
- 6. Working knowledge of developing deployment automation (Argo Workflows, Terraform, Helm).
- 7. Experience in diagnosing and resolving complex application problems.
- 8. Working knowledge of Elasticsearch, PostgreSQL, Redis, Ignite, Kafka and RabbitMQ.
- 9. Experience with monitoring tools (Nagios, Kibana, Prometheus).
- 10. Experience with cloud security and compliance implementation is a plus.
- 11. Strong follow-through and initiative to stay with issues until they are resolved.
- 12. Comfortable working within a distributed team located in multiple time zones.