For my NLP work I need only the text belonging to, in most cases, a single URL within a whole WET file. Loading each file with gzip in Python takes a long time to read into memory, which is a bit unsatisfactory. I have several thousand WET files, and from each one I just need to grab the text for one URL.
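To make the task concrete, here is a minimal sketch of the lookup, using only the Python standard library. It assumes the usual WET layout (a `WARC/1.0` line, headers including `WARC-Target-URI` and `Content-Length`, a blank line, then the text). The function name `find_text_for_url` and its parameters are made up for illustration. It streams the file and stops at the first match instead of loading everything into memory.

```python
import gzip


def find_text_for_url(wet_path, target_url):
    """Return the text of the first record whose WARC-Target-URI equals
    target_url, or None if no such record exists (hypothetical helper)."""
    with gzip.open(wet_path, "rb") as f:
        while True:
            line = f.readline()
            if not line:
                return None  # reached end of file without a match
            if not line.startswith(b"WARC/"):
                continue  # blank separator lines between records

            # Collect the record headers up to the empty line.
            headers = {}
            while True:
                h = f.readline()
                if h in (b"\r\n", b"\n", b""):
                    break
                name, _, value = h.decode("utf-8", "replace").partition(":")
                headers[name.strip().lower()] = value.strip()

            # Read exactly the payload so the next loop starts at a record
            # boundary, even if the text itself contains "WARC/".
            length = int(headers.get("content-length", 0))
            payload = f.read(length)

            if headers.get("warc-target-uri") == target_url:
                return payload.decode("utf-8", "replace")
```

This still decompresses the file sequentially up to the matching record, so it mainly saves memory and the work after the match; it is not random access.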
I wonder why requesting a WET file doesn't return JSON, from which you could pick out just the record you want by its WARC-Target-URI?
Or is this already possible? I am new to Common Crawl.
Thanks!