Stefanini Brasil

Senior Data Scientist

Full-time

Location Type: Hybrid

Location: Brasília, Brazil

About the role

  • Design and implement data pipelines to support the LLM customization process
  • Collect, process, and structure diverse data sources
  • Develop scripts and processes for extracting structured and unstructured data
  • Implement transformations to convert raw data into formats suitable for training
  • Ensure the quality, consistency, and relevance of the data used for training
  • Create mechanisms for validation and testing of datasets
  • Develop processes for data enrichment
  • Implement efficient storage for data and training results
  • Configure data integration between the trained model and the Elastic platform
  • Document data architecture, flows, and transformations
  • Implement data versioning and traceability practices
  • Optimize data flow for model training iterations
  • Ensure security and compliance when handling the data used for training

Requirements

  • Additional coursework in natural language processing or data preparation for ML (Desirable)
  • Practical knowledge of the Elastic Stack platform (Elasticsearch, Logstash, Kibana) | Level: Advanced (Required)
  • Experience preparing datasets for training language models | Level: Advanced (Required)
  • Experience with extraction, transformation, and loading (ETL) of unstructured data | Level: Advanced (Required)

Benefits

  • Meal allowance or food voucher
  • Discounts on courses, universities, and language schools
  • Stefanini Academy — a platform with free, up-to-date online courses and certificates
  • Mentoring
  • Benefits club for consultations and medical exams
  • Health insurance
  • Dental insurance
  • Employee discounts and benefits at top establishments
  • Travel club
  • Pet care plan