Multimodal Extraction: Apply state-of-the-art tools (OCR, vision-language models, document understanding frameworks) to interpret diverse input types;
Prompt Engineering: Develop and refine strategies for using LLMs to extract, summarize, and transform unstructured content into structured formats;
Data Quality & Structuring: Clean, validate, and transform messy, unstructured data into well-defined schemas ready for use in training or analytics pipelines;
Content Filtering: Define standards and build systems for cleaning, validating, and filtering data to ensure accuracy, reduce bias, and align with ethical/safety guidelines;
Human-in-the-Loop Feedback: Design feedback loops where experts validate or enrich data, improving LLM-based extraction reliability;
Scalability & Optimization: Architect cost-efficient, high-throughput data pipelines that are robust to noisy or incomplete sources;
Research & Prototyping: Experiment with emerging tools and methods in the LLM and multimodal space, exploring new ways to improve information coverage and extraction reliability;
Collaboration: Partner with data engineers and other data scientists to integrate collected data into larger AI and analytics systems;
Live the mission: inspire and empower others by genuinely caring for your own wellbeing and that of your colleagues. Bring wellbeing to the forefront of work, and create a supportive environment where everyone feels comfortable taking care of themselves, taking time off, and finding work-life balance.
Requirements
Master’s degree (or PhD) in Computer Science, Data Science, Machine Learning, Statistics, or a related field;
Proficiency in Python and experience with libraries for web scraping, OCR (e.g., Tesseract, EasyOCR), and NLP (e.g., HuggingFace Transformers);
Deep understanding of LLM capabilities in multimodal and extraction contexts, including prompt engineering and few-shot learning;
Strong background in unstructured data processing: APIs, web scraping, HTML parsing, OCR, image/document analysis;
Strong analytical problem-solving skills, with a track record of turning noisy data into high-quality datasets for ML;
Excellent communication and documentation skills, with the ability to influence across technical and product teams.
Preferred Qualifications
Familiarity with vision-language models (e.g., BLIP, Donut, or LayoutLM) and multimodal pipelines;
Hands-on experience in prompt engineering and few-shot learning for data extraction tasks;
Experience deploying or supporting data acquisition systems in production environments.
Benefits
WELLHUB: We believe in our mission and encourage our employees and their families to take care of their wellbeing too. Access onsite gyms and fitness studios, digital fitness programs, and online wellness resources for meditation, nutrition, mental health support, and more. You will receive the Gold plan at no cost, and other premium plans will be significantly discounted.
WELLNESS: Health, dental, and life insurance.
FLEXIBLE WORK: At Wellhub, flexibility fosters a happier, healthier, and more productive work environment for everyone. As a Flexible First company, we offer two work model options: flexible hybrid and full remote, and make the office a place for collaboration, community, and team building. The model for this role can be discussed with your recruiter and hiring manager. We offer all employees a home office stipend and a monthly flexible work allowance to help cover the costs of working from home.
FLEXIBLE SCHEDULE: Wellhubbers and their leaders are trusted to make the decisions that best fit their scope. This includes the flexibility to adjust working hours based on personal schedules, time zones, and business needs.
PAID TIME OFF: We know how important it is for our employees to take time away from work to recharge. Vacation after 6 months, plus 3 days off per year, 1 additional day off for each year of tenure (up to 5 additional days), and an extra day off for your birthday.
PAID PARENTAL LEAVE: Welcoming a new child is one of the most special moments in your life and we want our employees to take the time to be present and enjoy their growing family. We offer 100% paid parental leave to all new parents and extended maternity leave.
CAREER GROWTH: Outstanding opportunities for personal and career growth. We maintain a growth mindset in everything we do and invest deeply in employee development.
CULTURE: An exciting and supportive atmosphere with ambitious people from around the world! You’ll partner with global colleagues and share in the success of a high-growth technology company disrupting the health and wellness space. Our value-based culture of trust, flexibility, and integrity makes this possible every day.