Hello guys! I have read the info on
https://groups.google.com/g/common-crawl/c/iZVW5ai9jQI, which talks about what I am trying to achieve. However, I can't find these parameters for the WARC file I need to partially download.
I am working on a personal project based off the My Opera archive project.
Particularly, from this warc:
I have found a few blogs in the .cdx file that I would like to extract.
I googled and tried with curl and warcat, but didn't have any success.
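For context, this is roughly how I understand the approach from that thread, as a minimal sketch and not something I have working. It assumes the file is a .warc.gz where every record is its own gzip member, that the server honours HTTP Range requests, and that I can somehow get a compressed offset and length for the record. `warc_url`, `offset` and `length` are placeholders I made up, not values from my file.

```python
import gzip
import urllib.request


def range_header(offset, length):
    """Build an HTTP Range header value for `length` bytes starting at `offset`."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    # HTTP byte ranges are inclusive on both ends.
    return f"bytes={offset}-{offset + length - 1}"


def fetch_record(warc_url, offset, length):
    """Download one record from a remote .warc.gz and return it decompressed.

    Assumption: the requested bytes cover exactly one gzip member.
    """
    request = urllib.request.Request(
        warc_url, headers={"Range": range_header(offset, length)}
    )
    with urllib.request.urlopen(request) as response:
        compressed = response.read()
    return gzip.decompress(compressed)
```

If the .cdx only lists an offset and no length, my guess is that the length could be taken as the distance to the next record's offset in the same file, but I am not sure about that.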
The format of the .cdx file is the following:
Sorry for all the questions. I'm a WARC noob and have been googling around all day, but I kinda got stuck here.
Thanks for any help and have a nice day!