Senior Data Engineer

COGNNA

Employment Type: Full-time

Location Type: Remote

Location: India

About the role

  • As a Senior Data Engineer, you will be the architect of our security data ecosystem.
  • Your primary mission is to design and build high-performance data lake architectures and real-time streaming pipelines that serve as the foundation for COGNNA's Agentic AI initiatives.
  • You will ensure that our AI models have access to fresh, high-quality security telemetry through sophisticated ingestion patterns.
Key Responsibilities

1. Data Lake & Storage Architecture

  • **Architectural Design:** Design and implement multi-tier Data Lakehouse architectures to support both structured security logs and unstructured AI training data.
  • **Storage Optimization:** Define lifecycle management, partitioning, and clustering strategies that keep queries fast while controlling cloud storage costs.
  • **Schema Evolution:** Manage complex schema evolution for security telemetry, ensuring compatibility with downstream AI/ML feature engineering.

2. Real-Time & Streaming Processing

  • **Streaming Ingestion:** Build and manage low-latency, high-throughput ingestion pipelines capable of processing millions of security events per second in real time.
  • **Unified Processing:** Design unified batch and stream processing architectures to ensure consistency between historical analysis and real-time threat detection.
  • **Event-Driven Workflows:** Implement event-driven patterns that trigger AI agent reasoning from incoming live data streams.

3. AI/ML Enablement & Feature Engineering

  • **Vector Data Foundations:** Architect the data infrastructure required to support semantic search applications and retrieval-augmented generation (RAG) architectures for our generative AI models.
  • **Feature Management:** Design and maintain a centralized repository for ML features, ensuring consistent data is used for both model training and real-time inference.
  • **AI Pipeline Orchestration:** Build automated workflows for data preparation, model evaluation, and deployment within our cloud AI ecosystem.

4. DataOps & Systems Design

  • **Infrastructure as Code:** Use declarative tools (e.g., Terraform) to manage the entire lifecycle of our cloud data resources and AI endpoints.
  • **Quality & Observability:** Implement automated data quality frameworks and real-time monitoring to detect data drift or pipeline failures before they impact AI model performance.

Requirements

  • **Experience & Education:** 5+ years in Data Engineering or Backend Engineering, focused on large-scale distributed systems. B.S. or M.S. in Computer Science or a related technical field.
  • **Cloud Architecture:** Deep architectural mastery of the Google Cloud Platform ecosystem, particularly managed analytical warehouses, serverless compute, and identity and access management. Proven track record of building enterprise-scale Data Lakehouses from scratch.
  • **Real-Time Mastery:** Expertise in building production-grade distributed messaging and stream processing engines (e.g., managed Apache Beam/Flink environments) capable of handling high-velocity telemetry.
  • **AI Enablement:** Strong understanding of how data architecture impacts AI performance. Experience building embedding pipelines, feature stores, and automated workflows for model training and evaluation.
  • **Software Fundamentals:** Expert-level Python and advanced SQL. Proficiency in high-performance languages like Go or Scala is highly desirable.
  • **Operational Excellence:** Advanced knowledge of CI/CD, containerization on Kubernetes, and managing cloud infrastructure through code to ensure reproducible environments.
Preferred Qualifications

  • Experience with dbt for modern analytics engineering.
  • Understanding of cybersecurity data standards (OCSF/ECS).
  • Previous experience in an AI-first startup or a high-growth security tech company.

Benefits

  • 💰 **Competitive Package** – Salary + equity options + performance incentives
  • 🧘 **Flexible & Remote** – Work from anywhere with an outcomes-first culture
  • 🤝 **Team of Experts** – Work with designers, engineers, and security pros solving real-world problems
  • 🚀 **Growth-Focused** – Your ideas ship, your voice counts, your growth matters
  • 🌍 **Global Impact** – Build products that protect critical systems and data

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard skills
Data Engineering, Backend Engineering, Data Lakehouse Architecture, Real-Time Streaming Processing, Python, SQL, Go, Scala, CI/CD, Containerization

Soft skills
Architectural Design, Operational Excellence, Quality Assurance, Problem Solving, Collaboration