qode.world

Senior AWS Cloud/Infrastructure Engineer – Architect

full-time

Location Type: Hybrid

Location: California, United States

About the role

  • Design and implement cloud‑native architectures on AWS using services such as VPC, EC2, EKS, S3, RDS/Aurora, IAM, CloudWatch, and KMS, following Well‑Architected and security best practices.
  • Lead the design and operation of event‑driven systems using Amazon MSK (Managed Streaming for Apache Kafka) and/or managed streaming frameworks (e.g., Kinesis/Kafka‑based MSF), including topic design, partitioning, consumer groups, schema evolution, and back‑pressure handling.
  • Architect and manage caching layers and in‑memory data stores (e.g., Amazon ElastiCache for Redis/Memcached or similar) to improve performance, reduce latency, and offload downstream databases.
  • Implement and support data lakehouse patterns using Apache Iceberg or similar table formats on object storage (e.g., S3), including table design, partitioning, schema evolution, and performance optimization for analytical and near‑real‑time workloads.
  • Design, provision, and operate Kubernetes clusters on Amazon EKS, including node groups, autoscaling, networking, ingress, service mesh (where applicable), secrets management, and multi‑environment separation.
  • Implement full‑stack observability using OpenTelemetry (traces, metrics, logs), integrating with centralized telemetry backends, defining SLOs/SLIs, and enabling deep visibility into distributed, event‑driven workloads.
  • Build and maintain Infrastructure‑as‑Code (IaC) using tools such as Terraform and/or AWS CloudFormation, enforcing reusable modules, environment parity, and Git‑based workflows.
  • Establish and enhance CI/CD pipelines for infrastructure and application deployments on AWS/EKS/MSK, including automated testing, security scans, canary/blue‑green releases, and rollback strategies.
  • Ensure platform security, compliance, and governance, including IAM roles and policies, network segmentation, encryption in transit/at rest, secrets management, and audit logging.
  • Monitor and optimize cost, performance, and resilience of AWS environments; drive capacity planning, rightsizing, and architectural improvements for high availability and disaster recovery.
  • Troubleshoot complex production incidents across EKS, MSK, event pipelines, caching tiers, and data platforms, driving root cause analysis and long‑term remediation.
  • Mentor engineers, champion engineering best practices, and collaborate with architects and product teams to align platform roadmaps with business goals.

Requirements

  • 10+ years of hands‑on experience in cloud engineering, infrastructure engineering, or platform/SRE roles, with at least 5 years focused primarily on AWS.
  • Strong expertise with core AWS services: VPC, IAM, EC2, EKS/ECS, S3, RDS/Aurora, CloudWatch/CloudTrail, KMS, and networking (subnets, routing, security groups, NACLs, load balancers).
  • Proven production experience with Amazon MSK or equivalent Kafka‑based managed streaming platforms (MSF), including cluster operations, capacity planning, security, and observability.
  • Practical experience with event‑driven and streaming architectures (e.g., Kafka/Kinesis + consumers, stream processing, CQRS, pub/sub patterns) in mission‑critical systems.
  • Hands‑on experience with caching data stores and distributed caches (e.g., Redis, Memcached, ElastiCache), including eviction strategies, key design, and cache‑aside/write‑through patterns.
  • Experience implementing or operating data lake or lakehouse solutions on S3 or similar, using Apache Iceberg or comparable table formats (e.g., Delta Lake, Hudi), and integrating with analytics/processing engines.
  • Strong Kubernetes and EKS background, including cluster lifecycle management, Helm or similar packaging, autoscaling, network policies, and container security baselines.
  • Deep understanding of observability, distributed tracing, and telemetry; hands‑on with OpenTelemetry SDKs/collectors and integration into logging/metrics/tracing backends.
  • Proficiency with IaC tools such as Terraform and/or CloudFormation, plus strong Git and DevOps practices around code review, branching, and automated testing.
  • Solid scripting or programming skills (e.g., Python, Bash, Go, or similar) for automation, tooling, and glue code around AWS, MSK, EKS, and observability stacks.
  • Strong knowledge of security, networking, and compliance in cloud environments, including least‑privilege IAM, network isolation, certificate management, and secrets rotation.
  • Excellent communication and stakeholder management skills, with experience collaborating in cross‑functional teams and mentoring engineers at mid‑level and below.
Applicant Tracking System Keywords

Hard Skills & Tools
AWS, VPC, EC2, EKS, S3, RDS, Aurora, Kubernetes, OpenTelemetry, Terraform
Soft Skills
communication, stakeholder management, mentoring, collaboration, leadership