C the Signs

AI Data Engineer

full-time

Origin: 🇺🇸 United States • Massachusetts

Job Level

Mid-Level, Senior

Tech Stack

Airflow, Apache, AWS, Azure, Cloud, ETL, Google Cloud Platform, Java, Python, Scala, Spark

About the role

  • Collaborate with data scientists and machine learning engineers to understand data requirements for LLM and machine learning model fine-tuning.
  • Design, build, and maintain scalable data pipelines to ingest, process, and store massive and diverse healthcare datasets.
  • Implement robust data validation and monitoring to ensure the integrity, accuracy, and consistency of all training datasets.
  • Implement robust data cleaning, validation, and transformation processes to ensure data quality and integrity.
  • Develop and optimize data structures and schemas for efficient access and utilization by LLMs and machine learning models.
  • Work with the team to identify and acquire new data sources, ensuring compliance with relevant healthcare regulations (e.g., HIPAA).
  • Monitor data pipeline performance, troubleshoot issues, and implement optimizations to improve efficiency and reliability.
  • Document data engineering processes, data models, and data dictionaries.
  • Stay up to date with the latest advancements in data engineering, big data technologies, and machine learning.

Requirements

  • Bachelor's degree in Computer Science, Engineering, or a related field.
  • Proven experience as a Data Engineer, with a focus on big data technologies.
  • Strong proficiency in programming languages such as Python, Scala, or Java.
  • Extensive experience with data warehousing, ETL processes, and data modeling.
  • Experience with major cloud providers (e.g., AWS, GCP, Azure) and their data storage and processing services.
  • Hands-on experience with big data frameworks like Apache Spark for distributed processing.
  • Excellent problem-solving skills and the ability to work independently and as part of a team.
  • Strong communication and interpersonal skills.
  • Preferred: Master's degree in a related field.
  • Preferred: Experience with healthcare data and a good understanding of healthcare data standards (e.g., FHIR, HL7).
  • Preferred: Familiarity with machine learning concepts and LLM fine-tuning processes.
  • Preferred: Experience with data orchestration tools (e.g., Apache Airflow).