For DR2, we just downloaded each of the CSV files, converted each one to FITS, and fed those FITS files straight to hpsplit -- no need for MongoDB, or to build one giant CSV and then one giant FITS file :) But whatever works!
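A rough sketch of that per-file conversion follows, written in dependency-free Python so it runs anywhere. It assumes plain (optionally gzipped) CSVs with a header row, keeps only a hypothetical subset of columns (`ra`, `dec`, `phot_g_mean_mag`) as 64-bit floats, and writes one FITS binary table per input file. In practice astropy or fitsio would be the natural tool. The helper names here are made up, and hpsplit's actual command-line options are not shown (check its usage message).

```python
import csv
import glob
import gzip
import math
import os
import struct

BLOCK = 2880  # FITS files are organised in 2880-byte blocks

# Hypothetical column subset; the real catalog CSVs carry many more columns.
COLUMNS = ("ra", "dec", "phot_g_mean_mag")


def _card(key, value):
    """Format one 80-character fixed-format FITS header card."""
    if isinstance(value, bool):  # check bool before int (bool is an int subclass)
        val = ("T" if value else "F").rjust(20)
    elif isinstance(value, int):
        val = str(value).rjust(20)
    else:  # string values are quoted and padded to at least 8 characters
        val = ("'" + str(value).ljust(8) + "'").ljust(20)
    return (key.ljust(8) + "= " + val).ljust(80)


def _header(cards):
    """Build a header block: cards, END, then space padding to 2880 bytes."""
    text = "".join(_card(k, v) for k, v in cards) + "END".ljust(80)
    text += " " * (-len(text) % BLOCK)
    return text.encode("ascii")


def _num(field):
    """Parse a numeric field; empty or 'null' becomes NaN."""
    field = field.strip()
    return math.nan if field in ("", "null") else float(field)


def csv_to_fits(csv_path, fits_path, columns=COLUMNS):
    """Write the chosen numeric columns of one CSV file as a FITS BINTABLE."""
    opener = gzip.open if csv_path.endswith(".gz") else open
    with opener(csv_path, "rt", newline="") as f:
        # Skip any '#' comment/metadata lines before the CSV header row.
        reader = csv.DictReader(line for line in f if not line.startswith("#"))
        rows = [[_num(rec[c]) for c in columns] for rec in reader]
    fmt = ">" + "d" * len(columns)  # big-endian float64 == TFORM 'D'
    data = b"".join(struct.pack(fmt, *row) for row in rows)
    data += b"\0" * (-len(data) % BLOCK)  # table data is zero-padded
    cards = [("XTENSION", "BINTABLE"), ("BITPIX", 8), ("NAXIS", 2),
             ("NAXIS1", struct.calcsize(fmt)), ("NAXIS2", len(rows)),
             ("PCOUNT", 0), ("GCOUNT", 1), ("TFIELDS", len(columns))]
    for i, name in enumerate(columns, start=1):
        cards += [(f"TTYPE{i}", name), (f"TFORM{i}", "D")]
    primary = _header([("SIMPLE", True), ("BITPIX", 8),
                       ("NAXIS", 0), ("EXTEND", True)])
    with open(fits_path, "wb") as f:
        f.write(primary + _header(cards) + data)


def _fits_name(path):
    """Map 'x.csv' or 'x.csv.gz' to 'x.fits'."""
    base = path[:-3] if path.endswith(".gz") else path
    return os.path.splitext(base)[0] + ".fits"


def convert_all(pattern):
    """Convert every CSV matching `pattern`; return the FITS paths to feed hpsplit."""
    outputs = []
    for path in sorted(glob.glob(pattern)):
        out = _fits_name(path)
        csv_to_fits(path, out)
        outputs.append(out)
    return outputs
```

For example, `convert_all("*.csv.gz")` converts each downloaded file independently and returns the list of FITS paths, so nothing ever has to hold the whole catalog at once.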
One caveat: Gaia EDR3 (like previous releases) is incomplete at the bright end (roughly 11th-12th mag and brighter). For the 5200-series, I carefully merged Tycho-2 and Gaia to build a more complete input catalog before running hpsplit and building the indexes.
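The merge details aren't spelled out here, so the following is only a generic illustration of one simple strategy, not the method actually used for the 5200-series: keep every Gaia star and add any Tycho-2 star that has no Gaia counterpart within a small radius. The tuple layout, the 2-arcsec radius and the declination-strip bucketing are all assumptions. A careful merge would also have to handle the different epochs (proper motions) and magnitude systems, which this sketch ignores.

```python
import math
from collections import defaultdict


def angular_sep_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation via the haversine formula; degrees in, arcsec out."""
    r1, d1, r2, d2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((d2 - d1) / 2) ** 2
         + math.cos(d1) * math.cos(d2) * math.sin((r2 - r1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h)))) * 3600.0


def merge_bright_end(gaia, tycho, radius_arcsec=2.0, strip_deg=0.1):
    """Return all Gaia stars plus Tycho-2 stars with no Gaia star within the radius.

    gaia, tycho: lists of (ra_deg, dec_deg, mag) tuples (hypothetical layout).
    Gaia stars are bucketed into declination strips so each Tycho-2 star is
    only compared against its own strip and the two neighbouring ones.
    """
    assert strip_deg * 3600.0 >= radius_arcsec, "strips must be wider than the radius"
    strips = defaultdict(list)
    for star in gaia:
        strips[math.floor(star[1] / strip_deg)].append(star)
    merged = list(gaia)
    for ra, dec, mag in tycho:
        k = math.floor(dec / strip_deg)
        candidates = (g for s in (k - 1, k, k + 1) for g in strips.get(s, ()))
        # Exact separation handles RA wrap-around at 0/360 degrees.
        if not any(angular_sep_arcsec(ra, dec, g[0], g[1]) <= radius_arcsec
                   for g in candidates):
            merged.append((ra, dec, mag))
    return merged
```

Bucketing by declination alone keeps the sketch short; a real pipeline over full catalogs would more likely use a HEALPix or k-d tree spatial index.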
Regarding the 'invalid FITS' complaint -- yes, we abuse the FITS convention by storing binary data in string (character) columns, and apparently fitsverify doesn't like non-ASCII strings. Presumably, if we declared those columns as uint8 instead (TFORM 'B' rather than 'A'), it would not complain.
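To illustrate the string-vs-uint8 point: under the FITS standard, a character column (TFORM 'nA') is expected to hold printable ASCII (0x20-0x7E), optionally terminated by a NUL, so arbitrary binary bytes stored there look invalid. That is presumably what fitsverify is flagging. Declaring the same field as 'nB' (unsigned bytes) stores identical bytes without that expectation. The helpers below are hypothetical, stdlib-only Python.

```python
def invalid_ascii_rows(column):
    """Row indices whose bytes are not printable ASCII (0x20-0x7E) before the first NUL.

    A TFORM 'nA' column is meant to hold such text; binary data breaks that.
    """
    bad = []
    for i, value in enumerate(column):
        text = value.split(b"\0", 1)[0]  # a NUL terminates a FITS string
        if any(c < 0x20 or c > 0x7E for c in text):
            bad.append(i)
    return bad


def as_uint8(column, width):
    """View fixed-width byte strings as rows of unsigned 8-bit integers.

    The bytes are unchanged; only the declared type differs (TFORM 'nB').
    """
    return [list(value.ljust(width, b"\0")[:width]) for value in column]


def tform(width, binary):
    """TFORM code for a width-byte field: 'nB' for raw bytes, 'nA' for text."""
    return f"{width}{'B' if binary else 'A'}"
```

Here `invalid_ascii_rows([b"abc", b"\x01\xff\x10"])` flags only the second row, while `as_uint8` turns the same data into plain integer rows that a 'B' column can hold without complaint.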
cheers,
--dustin