Vittorio,
Indeed, cdx_toolkit has all of the code to do that, but it's not
hooked up on the command line. It's been on my TODO list for about a
year now.
If you can do a little Python, the code you want to call is
cdx_toolkit.warc.fetch_warc_record(). That gets you the record; you'll
then also need to set up a CDXToolkitWARCWriter and call
CDXToolkitWARCWriter.write_record() for each of your 100k rows. It's
not much code, but if you're not already familiar with cdx_toolkit's
internals, it will take a while.
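
A rough, untested sketch of that loop might look like the following.
It assumes your rows are in a CSV with url, timestamp, filename,
offset and length columns, that fetch_warc_record() takes a capture
dict plus a download prefix and raises RuntimeError on a failed fetch,
and that cdx_toolkit.warc.get_writer() builds the CDXToolkitWARCWriter.
Check all of that against the cdx_toolkit source before relying on it.
The helper names (load_captures, export_captures, main), the warcinfo
values and the output prefix are made up for illustration.

```python
import csv


def load_captures(csv_file):
    """Read index rows into capture dicts.

    Assumed columns: url, timestamp, filename, offset, length.
    """
    return list(csv.DictReader(csv_file))


def export_captures(captures, fetch_record, write_record):
    """Fetch every capture and hand the record to the writer.

    Returns a (written, failed) tuple of counts.
    """
    written = failed = 0
    for capture in captures:
        try:
            record = fetch_record(capture)
        except RuntimeError:
            # Assumption: a failed fetch (e.g. a 404) surfaces as RuntimeError.
            failed += 1
            continue
        write_record(record)
        written += 1
    return written, failed


def main(csv_path):
    # Imported here so the helpers above work without cdx_toolkit installed.
    import cdx_toolkit.warc as cdx_warc

    # Illustrative warcinfo fields; adjust to taste.
    warcinfo = {
        'software': 'cdx_toolkit extraction sketch',
        'description': 'records fetched from a list of index rows',
        'format': 'WARC file version 1.0',
    }
    # Assumption: get_writer(prefix, subprefix, info) returns the writer.
    writer = cdx_warc.get_writer('EXTRACT', 'CC', warcinfo)

    def fetch(capture):
        # Assumption: the second argument is the WARC download prefix.
        return cdx_warc.fetch_warc_record(capture, 'https://data.commoncrawl.org')

    with open(csv_path, newline='') as f:
        captures = load_captures(f)
    return export_captures(captures, fetch, writer.write_record)
```

Calling main('rows.csv') would then write the WARC file(s) into the
current directory and return how many records were written and how
many fetches failed. With 100k rows you'd probably also want some
retry and rate limiting around the fetch.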
-- greg