MLabs

Research Crawling Engineer


Posted 4/28/2026 · Full-time · Remote • New York • 🇺🇸 United States · Mid-Level / Senior · 💰 $80,000–$175,000 per year

Tech Stack

Tools & technologies
Cloud · Distributed Systems · Go · Java · JavaScript · Puppeteer · Python · Rust

About the role

Key responsibilities & impact
  • Construct and maintain large-scale web crawlers across diverse domains.
  • Design high-throughput, fault-tolerant systems for data collection, managing volumes ranging from millions to billions of URLs per day.
  • Navigate anti-bot systems, rate limits, and dynamic, JavaScript-heavy websites.
  • Develop robust pipelines for data cleaning, deduplication, filtering, and normalization.
  • Build and maintain datasets specifically structured for research and machine learning model training.
  • Monitor and optimize crawl performance, coverage, and data quality through rapid iteration.
  • Collaborate with research teams to ensure data collection efforts align with modeling requirements.
  • Optimize infrastructure to ensure cost-efficiency, low latency, and reliability.

Requirements

What you’ll need
  • Extensive programming experience in one or more of the following: Go, Rust, Python, Java, or C++.
  • Proven experience in building web crawlers or large-scale data pipelines.
  • Solid understanding of HTTP, networking protocols, and browser behavior.
  • Familiarity with distributed systems and parallel processing techniques.
  • Experience handling large datasets, ideally at the terabyte to petabyte scale.
  • Demonstrated ability to debug and maintain systems in unstable or adversarial environments.

Preferred Qualifications
  • Experience with NLP pipelines or dataset curation for machine learning.
  • Familiarity with LLM pre-training data or retrieval systems.
  • Practical experience with headless browsers (e.g., Playwright, Puppeteer, or Chrome DevTools Protocol).
  • Knowledge of proxy systems, IP rotation, and large-scale request orchestration.
  • Background in data quality evaluation or benchmarking.
  • Experience running workloads on cloud or bare-metal infrastructure.

Benefits

Comp & perks
  • Impactful Opportunity: Contribute to the development of a web-scale crawler and knowledge graph at the forefront of AI data accessibility.
  • High-Performance Culture: Join a lean, low-ego team that prioritizes high output and professional growth.
  • Remote Work: This position is part of a fully remote team, offering flexibility and autonomy.
  • Competitive Compensation: A package of salary, comprehensive benefits, and equity, commensurate with experience and the ability to operate at scale.

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
Go · Rust · Python · Java · C++ · HTTP · networking protocols · distributed systems · parallel processing · data cleaning
Soft Skills
collaboration · debugging · problem-solving · adaptability