NVIDIA NeMo Curator

Jen English

Jun 25, 2024, 1:07:24 PM
to Common Crawl

Hi all -- Common Crawl users may be interested in NVIDIA NeMo Curator. This GPU-accelerated data-curation library includes data download, document deduplication, language identification, filtering, and other features often requested by Common Crawl users. It is helpful for preparing large-scale, high-quality datasets for pretraining and customization. Learn more here: https://github.com/NVIDIA/NeMo-Curator


--

Jen English

Program Manager, Common Crawl
