Tech Stack
Airflow, Amazon Redshift, Apache, AWS, Cloud, DynamoDB, Hadoop, Kafka, Python, Spark, Terraform
About the role
- Help customers shape their cloud adoption journey, providing technical and strategic guidance along the way
- Consult on, plan, design, and implement data solutions on the cloud for customers
- Design AWS Data Lake implementations
- Design and Develop AWS Data Quality and Governance solutions
- Become a deep technical resource that earns our customers' trust
- Develop high-quality technical content such as automation tools, reference architectures, and white papers to help our customers build on the cloud
- Innovate on behalf of customers and translate your thoughts into action yielding measurable results
- Support solution development by conveying customer needs and feedback as input to technology roadmaps
- Assist with technical briefs that document solutions and reference architecture implementations
- Support internal and external brand development through thought leadership (blog posts, internal case studies)
- AWS Data Architecture and Data Lake Implementation: assist with data collection, ingestion, cataloging, storage, and serving; VPC and S3 strategy; lambda architecture design; data catalog and serving layer guidance
- Build Ingestion and Orchestration: design reusable ingestion frameworks and serverless approaches with monitoring, logging, alerting, and retry capabilities; orchestrate using AWS Glue, Step Functions, EMR, Batch, and Lambda
- Storage and Catalog: build Amazon S3 data lake storage approaches covering buckets, prefixes, encryption, file types, partitioning, lifecycle management, and cross-region replication where required
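
The storage-and-catalog work above typically starts with a consistent S3 key scheme. As a minimal illustrative sketch (the layer names and partition keys here are assumptions, not a prescribed standard for this role), a helper that builds Hive-style partitioned prefixes might look like:

```python
from datetime import date


def s3_data_lake_key(layer: str, dataset: str, dt: date, filename: str) -> str:
    """Build a Hive-style partitioned S3 key.

    The layer names (e.g. "raw", "curated") and the year/month/day
    partition keys are illustrative assumptions; real layouts vary
    by organization and workload.
    """
    return (
        f"{layer}/{dataset}/"
        f"year={dt.year}/month={dt.month:02d}/day={dt.day:02d}/"
        f"{filename}"
    )


key = s3_data_lake_key("raw", "orders", date(2024, 7, 4), "part-0000.parquet")
print(key)  # raw/orders/year=2024/month=07/day=04/part-0000.parquet
```

Hive-style `key=value` partition prefixes are recognized by Glue crawlers and Athena, which allows partition pruning on date-filtered queries without scanning the full dataset.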
Requirements
- Professional experience architecting/operating Data / DevOps solutions built on AWS
- Customer-facing experience in software/technology
- You must be legally authorized to work in the United States without the need for employer sponsorship, now or at any time in the future.
- Must Have: Apache Iceberg, SageMaker, SageMaker Lakehouse, SageMaker Catalog, Terraform
- Should Have: Athena, Redshift, EMR, Glue, Databricks and/or Snowflake
- Primary Languages – Python
- Tooling, Services & Libraries – Airflow, Kafka, Parquet, Spark, Metaflow, Git, Hadoop
- AWS Infrastructure Scripting – CloudFormation, AWS CLI, AWS CDK
- Relevant AWS Services – S3, Lambda, Batch, RDS, DynamoDB, Redshift, Aurora, Neptune, VPC, CloudTrail, Service Catalog, Athena, EMR, Kinesis, Glue, Lake Formation, Data Pipeline, IAM, Step Functions