
Senior Data Scientist
Stefanini Brasil
Full-time
Location Type: Hybrid
Location: Brasília • Brazil
About the role
- Design and implement data pipelines to support the LLM customization process
- Collect, process, and structure diverse data sources
- Develop scripts and processes for extracting structured and unstructured data
- Implement transformations to convert raw data into formats suitable for training
- Ensure the quality, consistency, and relevance of the data used for training
- Create mechanisms for validation and testing of datasets
- Develop processes for data enrichment
- Implement efficient storage for data and training results
- Configure data integration between the trained model and the Elastic platform
- Document data architecture, flows, and transformations
- Implement data versioning and traceability practices
- Optimize data flow for model training iterations
- Ensure security and compliance when handling the data used for training
Requirements
- Additional courses in natural language processing or data preparation for ML (Desirable)
- Practical knowledge of the Elastic Stack platform (Elasticsearch, Logstash, Kibana) | Level: Advanced (Required)
- Experience preparing datasets for training language models | Level: Advanced (Required)
- Experience with extraction, transformation, and loading (ETL) of unstructured data | Level: Advanced (Required)
Benefits
- Meal allowance or food voucher
- Discounts on courses, universities, and language schools
- Stefanini Academy — a platform with free, up-to-date online courses and certificates
- Mentoring
- Benefits club for medical consultations and exams
- Health insurance
- Dental insurance
- Employee discounts and benefits at top establishments
- Travel club
- Pet care plan
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
- data pipelines
- data processing
- data extraction
- data transformation
- data validation
- data enrichment
- data storage
- data versioning
- ETL
- natural language processing