Many thanks for taking this on, Sebastian!
My effort:
*CRESTMarketTrawler*
What it does: Continuously cycles through every region in a random but fixed order, and uploads each region's entire market to EMDR in chunks of 30,000 orders (to stay under the upload size limit). It also dumps the data into a local database, which is the limiting factor in its speed.
What it does not do: Fetch historic data, or perform lookups to populate the solarSystemID field (which is left null).
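For illustration, the chunking step might look roughly like the following. This is a minimal sketch rather than the trawler's actual code; the names `chunked` and `CHUNK_SIZE` are made up for the example, and the orders are assumed to already be in an in-memory list.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

# 30,000 is the chunk size mentioned above; the name is hypothetical.
CHUNK_SIZE = 30_000


def chunked(orders: Sequence[T], size: int = CHUNK_SIZE) -> Iterator[List[T]]:
    """Yield consecutive slices of `orders`, each holding at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(orders), size):
        yield list(orders[start:start + size])
```

Each yielded list would then go out as one upload. Note that slicing like this pays no attention to typeID, which is exactly the caveat described below.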
Caveats: Possible inconsistency between CREST pages; doesn't (yet) chunk per type so the prior invariant of a single message containing all orders for a (typeID, regionID) pair is invalidated.
I want to expand a little on that last point: with the cache-scraping approach, you were guaranteed to have all market orders for a given type in a given region in a single message (assuming you weren't doing something crazy with your uploads). I suspect that a lot of consumers will do something like:
    onOrdersReceived(typeID, regionID):
        database.exec("DELETE FROM orders WHERE typeID=%s AND regionID=%s", typeID, regionID)
        ...etc
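To make that concrete, here is a rough sketch of such a consumer using Python's sqlite3 module. The table layout, the `make_db` helper, and the shape of the order dicts are all assumptions made for the example, not anything EMDR prescribes.

```python
import sqlite3
from typing import Dict, List


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        "orderID INTEGER PRIMARY KEY, typeID INTEGER, regionID INTEGER, price REAL)"
    )
    return conn


def on_orders_received(conn: sqlite3.Connection, type_id: int, region_id: int,
                       orders: List[Dict]) -> None:
    """Replace every stored order for one (typeID, regionID) pair."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "DELETE FROM orders WHERE typeID = ? AND regionID = ?",
            (type_id, region_id),
        )
        conn.executemany(
            "INSERT INTO orders (orderID, typeID, regionID, price) VALUES (?, ?, ?, ?)",
            [(o["orderID"], type_id, region_id, o["price"]) for o in orders],
        )
```

The delete-then-insert is only safe if the incoming message really does contain every live order for that pair; otherwise the delete throws away orders that are still on the market.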
Without that invariant holding, I don't see how a consumer can know when to remove older orders (ones that were fulfilled rather than expired) from the database. I'm planning to change my trawler to maintain this invariant in the future.
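One way the trawler could keep the invariant while still respecting a size cap is to group orders by (typeID, regionID) first and then pack whole groups into messages. The sketch below is only an idea of how that might work, not the planned implementation; `pack_messages`, `MAX_ORDERS_PER_MESSAGE`, and the dict-shaped orders are assumptions for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Order = Dict[str, object]

# Same 30,000 cap as the current chunking; the name is hypothetical.
MAX_ORDERS_PER_MESSAGE = 30_000


def group_by_type_and_region(orders: Iterable[Order]) -> Dict[Tuple[object, object], List[Order]]:
    """Collect orders into one list per (typeID, regionID) pair."""
    groups: Dict[Tuple[object, object], List[Order]] = defaultdict(list)
    for order in orders:
        groups[(order["typeID"], order["regionID"])].append(order)
    return groups


def pack_messages(orders: Iterable[Order],
                  limit: int = MAX_ORDERS_PER_MESSAGE) -> Iterator[List[Order]]:
    """Yield batches in which no (typeID, regionID) group is ever split."""
    batch: List[Order] = []
    for group in group_by_type_and_region(orders).values():
        # Start a new batch if adding this whole group would exceed the cap.
        if batch and len(batch) + len(group) > limit:
            yield batch
            batch = []
        # A single group larger than `limit` still goes out whole (oversized).
        batch.extend(group)
    if batch:
        yield batch
```

The trade-off is that the whole region has to be fetched before anything is sent, and a single type with more orders than the cap would still produce an oversized message.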