Senior Data Engineer – AWS, RAG Pipelines

Jalasoft

Senior Data Engineer role designing and operating cloud data infrastructure for AI initiatives, including building data lakes on AWS and real-time pipelines for RAG systems.

Posted 6/12/2026 • Full-time • Remote • 🇨🇴 Colombia • Senior

ATS Keywords

Hard Skills

Data Engineering, Distributed Systems, Data Architecture, AWS Data Lake Architecture, Real-Time Observability, Log Analytics, Elasticsearch, OpenSearch, C#, Python

Tools & Technologies

AWS, Amazon Bedrock, Debezium, AWS DMS, Amazon Athena, Redshift Spectrum, Datadog, S3, ETL, ELT

Industry Keywords

AI initiatives, RAG systems, embeddings, vector search, production-scale data lakes, event streaming, high-dimensional vector search, multi-tenant access control, CDC, metadata extraction

Tech Stack

Tools & technologies

Amazon Redshift, AWS, Cloud, Distributed Systems, Elasticsearch, ETL, Java, JavaScript, .NET, Node.js, Postgres, Python

About the role

Key responsibilities & impact

Design and operate the cloud data infrastructure powering AI initiatives.
Architect production-scale data lakes on AWS.
Build real-time ingestion and observability pipelines.
Own the vector search and embedding layers that feed RAG systems and autonomous agents.

Requirements

What you’ll need

Overall Experience: 7+ years in Data Engineering, Distributed Systems, or Data Architecture
AWS & Infrastructure: 4+ years architecting production-scale data lakes, storage tiers, and event streaming
AI/LLM Pipelines: 2+ years building RAG systems, managing embeddings, and orchestrating foundation models
Proficiency in AWS Data Lake Architecture & Storage
Proficiency in Real-Time Observability & Log Analytics
Proficiency in Elasticsearch & OpenSearch Optimization, Vectorization, Embeddings
Proficiency in Amazon Bedrock & Generative AI Pipelines
Proficiency in Software Engineering & API Ingestion
Production-level proficiency in one or more of: C# (.NET Core), Java, Python, or Node.js
AWS S3 partitioning strategies, lifecycle policies, and columnar formats (Parquet, Iceberg)
AWS Glue Data Catalog and Lake Formation for multi-tenant, fine-grained access control
Query optimization over petabyte-scale datasets using Amazon Athena and Redshift Spectrum
Distributed OpenTelemetry (OTel) Collector configuration for log, trace, and metrics capture and routing into S3
High-volume streaming of system logs, Datadog captures, and raw server events into S3
Real-time CDC from PostgreSQL using Debezium or AWS DMS
Amazon OpenSearch clusters with simultaneous lexical and high-dimensional vector search
OpenSearch index lifecycle management, sharding strategies, and dynamic mappings at scale
Amazon Bedrock foundation model APIs (Claude, Titan) for data enrichment, classification, and semantic parsing
Knowledge Bases for Amazon Bedrock for automatic chunking, metadata extraction, and vector index syncs from S3
ETL/ELT pipelines ingesting unstructured event data from SaaS APIs (e.g., Pendo, Hotjar, Google Analytics)
MCP server development to expose data lake context and utilities to AI agents

Benefits

Comp & perks

Remote work.
13 floating holidays.
15 vacation days per completed year.
Good working environment.