Grupo Protege

Senior Data Scientist

full-time

Location Type: Remote

Location: Brazil

About the role

  • Design and apply statistical and machine learning methods to curate, filter, and enrich large-scale unstructured datasets
  • Develop frameworks to assess data diversity, duplication, and informativeness, and design statistical approaches to de-risk training datasets
  • Collaborate with model training teams to identify data bottlenecks and optimize dataset performance, working effectively with both large foundation-model developers and smaller startups
  • Provide leadership on data quality strategy and shape internal best practices
  • Evaluate external datasets for integration based on scalability, quality, and relevance to model performance, and help build data scorecards
  • Contribute to the research and development of tools that automate data preprocessing and validation

Requirements

  • PhD, or a Master's degree plus 4+ years of industry experience, in machine learning, economics, mathematics, engineering, computer science, statistics, or a related quantitative field
  • Strong understanding of AI model training pipelines, including preprocessing and evaluation
  • Experience working with large, unstructured datasets, especially text
  • Background in statistical analysis, bias detection, and data validation
  • Ability to identify high-impact problems and drive solutions independently

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
statistical methods, machine learning, data preprocessing, data validation, statistical analysis, bias detection, data diversity assessment, data duplication assessment, data informativeness assessment, AI model training pipelines
Soft Skills
collaboration, leadership, problem identification, independent solutions
Certifications
PhD, Master's Degree