
Lead AI Engineer
TIAA
full-time
Location Type: Hybrid
Location: Iselin • New Jersey • North Carolina • United States
Salary
💰 $123,000 - $168,000 per year
About the role
- Design and implement Generative AI solutions using RAG (Retrieval-Augmented Generation) pipelines.
- Build end-to-end systems integrating vector databases, embedding models, and LLMs to enable context-aware, knowledge-grounded responses.
- Develop robust prompting strategies, templates, and workflows that maximize LLM performance, accuracy, and consistency
- Establish rigorous evaluation frameworks to measure model accuracy, latency, cost, hallucination rates, and task-specific performance metrics; conduct A/B testing and comparative analysis across models and configurations
- Implement comprehensive logging, tracing, and alerting systems to track model behavior, prompt-response patterns, token usage, errors, and drift in production environments
- Build production-grade AI agents using both low-code platforms and high-code custom implementations with LangChain and LangGraph, optimized for performance and maintainability.
- Architect and develop large-scale, cloud-native Python applications using modern frameworks such as FastAPI and Flask, optimized for high performance, low latency, and horizontal scalability
- Design distributed system architectures that leverage AWS services including Lambda, ECS/EKS, EC2, S3, DynamoDB, RDS/Aurora, ElastiCache, OpenSearch, SQS, SNS, EventBridge, Step Functions, Bedrock, and Textract, as well as the Domino and SageMaker platforms.
- Build responsive, intuitive user interfaces using React, TypeScript/JavaScript, and modern frontend frameworks to deliver seamless user experiences for AI-powered applications
- Implement API design best practices including RESTful principles, OpenAPI/Swagger documentation, versioning strategies, rate limiting, authentication/authorization, and error handling
- Optimize application performance through caching strategies, asynchronous processing, connection pooling, efficient data serialization, and proactive bottleneck identification
- Design for reliability and resilience by implementing retry logic, circuit breakers, graceful degradation, health checks, and disaster recovery mechanisms
- Establish and enforce CI/CD best practices using GitHub Actions, Jenkins, GitLab CI, or AWS CodePipeline to automate build, test, and deployment processes
- Implement Infrastructure as Code (IaC) using Terraform, AWS CloudFormation, or CDK to enable consistent, version-controlled, and reproducible infrastructure provisioning
- Design and manage containerized applications using Docker for packaging and Kubernetes (EKS) or ECS for orchestration, ensuring efficient resource utilization and auto-scaling
- Implement robust testing strategies including unit tests, integration tests, end-to-end tests, performance tests, and AI-specific testing (prompt regression tests, model output validation)
- Establish observability and monitoring frameworks using CloudWatch, Prometheus, Langfuse, or LangSmith to track system health, application performance, model behavior, and business metrics
- Apply security best practices including IAM, least-privilege access, role-based access control (RBAC), multi-factor authentication, encryption at rest and in transit, secure key management, and data masking/tokenization for sensitive information.
- Configure VPCs, security groups, network ACLs, and private endpoints to minimize attack surface. Implement input validation, output encoding, SQL injection prevention, and secure API authentication (OAuth 2.0, JWT).
- Maintain comprehensive documentation of system architectures, data flows, security controls, and operational procedures to support compliance audits and knowledge transfer.
Requirements
- Bachelor's Degree Required
- 5+ years of software engineering experience with demonstrated progression in technical leadership and system design
- 3+ years of hands-on experience with AI/ML, with at least 1 year focused on Generative AI, LLMs, and production deployment
- Expert-level Python programming with deep knowledge of advanced language features, design patterns, performance optimization, and popular frameworks (FastAPI, Flask, Pandas, NumPy).
- Full-stack development skills including backend API development with RESTful design principles, frontend development using React, and database design and optimization (SQL and NoSQL)
- Extensive AWS experience with hands-on implementation of compute, storage, networking, security, and AI/ML services.
- Production experience with Generative AI technologies: LLM APIs (OpenAI or Anthropic Claude), RAG frameworks and vector databases, prompt engineering and optimization techniques, AI agent frameworks (LangChain and LangGraph), and model fine-tuning and evaluation
- Experience building CI/CD pipelines and working with Infrastructure as Code (Terraform, CloudFormation), container orchestration (Docker, Kubernetes/EKS), and monitoring and observability tools
Benefits
- Health insurance
- Retirement plans
- Paid time off
- Flexible work arrangements
- Professional development opportunities
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
Python, Generative AI, RAG, LLMs, FastAPI, Flask, React, RESTful API, Terraform, Docker
Soft Skills
technical leadership, system design, collaboration, problem-solving, communication, evaluation frameworks, performance optimization, documentation, testing strategies, observability
Certifications
Bachelor's Degree