Salary
💰 $130,000 - $225,000 per year
Tech Stack
Airflow · AWS · Cloud · DynamoDB · IoT · Postgres · Python · Redis · Terraform
About the role
- Cloud Infrastructure Setup and Maintenance: Design, provision, and maintain AWS infrastructure using IaC tools such as AWS CDK or Terraform.
- Build CI/CD and testing for apps, infra, and ML pipelines using GitHub Actions, CodeBuild, and CodePipeline.
- Operate secure networking with VPCs, PrivateLink, and VPC endpoints. Manage IAM, KMS, Secrets Manager, and audit logging.
- LLM Platform and Runtime: Stand up and operate model endpoints using Amazon Bedrock and/or SageMaker; evaluate when to use ECS/EKS, Lambda, or Batch for inference jobs.
- Build and maintain application services that call LLMs through clean APIs, with streaming, batching, and backoff strategies.
- Implement prompt and tool execution flows with LangChain or similar, including agent tools and function calling.
- RAG Data Systems and Vector Search: Design chunking and embedding pipelines for documents, time series, and multimedia. Orchestrate with Step Functions or Airflow.
- Operate vector search using OpenSearch Serverless, Aurora PostgreSQL with pgvector, or Pinecone. Tune recall, latency, and cost.
- Build and maintain knowledge bases and data syncs from S3, Aurora, DynamoDB, and external sources.
- Evaluation, Observability, and Cost Governance: Create offline and online eval harnesses for prompts, retrievers, and chains. Track quality, latency, and regression risk.
- Instrument model and app telemetry with CloudWatch and OpenTelemetry. Build token usage and cost dashboards with budgets and alerts.
- Add guardrails, rate limits, fallbacks, and provider routing for resilience.
- Safety, Privacy, and Compliance: Implement PII detection and redaction, access controls, content filters, and human-in-the-loop review where needed.
- Use Bedrock Guardrails or policy services to enforce safety standards. Maintain audit trails for regulated environments.
- Data Pipeline Construction: Build ingestion and processing pipelines for structured, unstructured, and multimedia data. Ensure integrity, lineage, and cataloging with Glue and Lake Formation.
- Optimize bulk data movement and storage in S3, Glacier, and tiered storage. Use Athena for ad-hoc analysis.
- IoT Deployment Management: Manage infrastructure that deploys to and communicates with edge devices. Support secure messaging, identity, and over-the-air updates.
- Analytics and Application Support: Partner with product and application teams to integrate retrieval services, embeddings, and LLM chains into user-facing features.
- Provide expert troubleshooting for cloud and ML services with an emphasis on uptime and performance.
- Performance Optimization: Tune retrieval quality, context window use, and caching with Redis or Bedrock Knowledge Bases.
- Optimize inference with model selection, quantization where applicable, GPU/CPU instance choices, and autoscaling strategies.
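To make the "backoff strategies" bullet concrete: a minimal sketch of retrying a throttled LLM call with exponential backoff and full jitter. The `call` argument and the use of `RuntimeError` as a stand-in throttling error are illustrative assumptions; production code would catch the provider's specific exception (e.g., an SDK throttling error) instead.

```python
import random
import time


def call_with_backoff(call, max_retries=5, base_delay=1.0, max_delay=30.0):
    """Retry a flaky LLM API call with exponential backoff and full jitter.

    `call` is any zero-argument callable; RuntimeError stands in for a
    provider throttling error in this sketch.
    """
    for attempt in range(max_retries):
        try:
            return call()
        except RuntimeError:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the error to the caller
            # Double the delay cap each attempt, then sleep a random
            # fraction of it (full jitter) to avoid thundering herds.
            delay = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(random.uniform(0, delay))
```

The same wrapper composes with provider routing: on final failure, the caller can fall back to a secondary model endpoint.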
Requirements
- Do you have at least 3 years of professional work experience, or a master's degree related to this role?
- Do you have experience designing, setting up, and maintaining cloud infrastructure using Infrastructure as Code (IaC) tools like AWS CDK, ensuring robust CI/CD and testing components?
- Do you have experience building and managing data pipelines?
- For this role, we would like the person to be in the San Francisco Bay Area. Are you living there now, or willing to relocate?
- Are you currently authorized to work lawfully in the United States?*
- Will you now or in the future require sponsorship for employment visa status (e.g., an H-1B visa)?
- Are you presently a U.S. Person? A "U.S. Person" is defined as: a U.S. citizen, a lawful permanent resident (a legal immigrant with a "Green Card"), or a protected individual granted asylum or refugee status.