Hi,
we have a Perl script we use for submitting; its origins are somewhat
unclear, but I think it's purely custom. It splits uploads in a way
that is good enough for us (and during uploads we tend to be mostly
I/O-bound on the Postgres side). But again, we can live with the time
that takes.
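
In case the splitting part is of interest: the general idea is nothing
more elaborate than chopping the dump into fixed-size pieces and
submitting them one at a time. A rough sketch only (not our actual
script; chunk size, file names and the upload step are all made up):

#!/usr/bin/perl
use strict;
use warnings;

# Split a large dump into fixed-size chunks so each upload stays small.
# Illustration only -- chunk size, paths and the upload step are made up.
my $chunk_lines = 100_000;
my ($part, $out) = (0, undef);

open my $in, '<', 'dump.dat' or die "cannot open dump.dat: $!";
while (my $line = <$in>) {
    if ($. % $chunk_lines == 1) {    # start a new chunk every N lines
        close $out if $out;
        open $out, '>', sprintf('chunk-%04d.dat', ++$part)
            or die "cannot open chunk file: $!";
    }
    print {$out} $line;
}
close $out if $out;
close $in;

# Each chunk-NNNN.dat is then submitted sequentially by the upload step.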
And this dataset seems to need just south of 90 GB of heap. I
obviously haven't tried every lower value, but IIRC a 70 GB heap will
lead to either an OutOfMemoryError due to lack of heap space or a "GC
overhead limit exceeded" error. We've also gone through a number of
iterations of moving to larger machines (which are more cumbersome to
allocate for testing), so I'm quite confident this is the actual
memory requirement for this specific dataset (but I also believe we
may be hitting a weird case).
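
(For concreteness, the heap sizes above refer to the JVM maximum heap,
i.e. the value passed via -Xmx, roughly along these lines; the loader
invocation itself is just a placeholder, not our actual command:

java -Xmx90g -jar loader.jar chunk-0001.dat

With -Xmx70g we instead end up with one of the two OutOfMemoryError
variants mentioned above.)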
/Pontus