Hi David,
Yes, the crawler is run from Northern Virginia (AWS region us-east-1).
And yes, it's true, the crawler may see different content when run
from a different location or via a proxy.
Because we use short-lived spot instances and there's a shuffle step
between the fetching and the WARC writing, it does not really make sense
to add the IP address to the warcinfo record. The IP address of the WARC
writer task is not that of the fetcher tasks.
Best,
Sebastian