Tech Stack
Apache, AWS, Cloud, Distributed Systems, DNS, Docker, Elasticsearch, Firewalls, Grafana, Kubernetes, Linux, MongoDB, NGINX, Prometheus, Python, TCP/IP
About the role
- Job location: Gurugram (Hybrid).
- Improve current Infrastructure-as-Code, observability stack, and incident response processes.
- Work with data science, analytics, and engineering teams to build optimized CI/CD pipelines, scalable AWS infrastructure, and Kubernetes deployments.
- Work with engineering, automation, and data teams to address various infrastructure requirements.
- Design modular and efficient GitOps CI/CD pipelines, agnostic to the underlying platform.
- Manage AWS services for multiple teams.
- Manage custom data store deployments like sharded MongoDB clusters and Elasticsearch clusters.
- Deploy and manage Kubernetes resources.
- Deploy and manage custom metrics exporters, trace data, and custom application metrics; design dashboards and query metrics to deliver an end-to-end observability solution.
- Set up incident response services and design effective processes.
- Deploy and manage critical platform services like OPA and Keycloak for IAM.
- Advocate best practices for high availability and scalability when designing AWS infrastructure and observability dashboards, implementing IaC, deploying to Kubernetes, and designing GitOps CI/CD pipelines.
Requirements
- Hands-on experience with Docker (or another container runtime) and with Linux, including the ability to perform basic administrative tasks.
- Experience working with web servers (NGINX, Apache) and cloud providers (preferably AWS).
- Hands-on scripting and automation experience (Python, Bash), plus experience debugging and troubleshooting Linux environments and cloud-native deployments.
- Experience building CI/CD pipelines and familiarity with monitoring and alerting systems (Grafana, Prometheus, and exporters).
- Knowledge of web architecture, distributed systems, and single points of failure.
- Familiarity with cloud-native deployments and concepts like high availability, scalability, and bottlenecks.
- Good networking fundamentals: SSH, DNS, TCP/IP, HTTP, SSL, load balancing, reverse proxies, and firewalls.
- Experience with backend development, setting up databases, and tuning database performance using parameter groups.
- Working experience in Kubernetes cluster administration and Kubernetes deployments.
- Experience working alongside SecOps engineers.
- Basic knowledge of Envoy, service mesh (Istio), and SRE concepts like distributed tracing.
- Experience setting up and using OpenTelemetry, central logging, and monitoring systems.