Hi Frank,
How much you can trim depends on what the data in the dataset looks like and what its preservation requirements are, e.g. are we talking about a large group of plain-text-like files, or images, video, and other complex types? You might find some success turning off tools like FITS in the Format Policy Registry. Each output that is switched off reduces the number of lines in the resulting METS.
Given that you have a 2.2 GB METS to look at, you can inspect it to see which tools produce output of limited preservation value for the data you are working with. You probably don't want to do a huge amount of trial and error, so you might want to cut a lot out early and then see whether that at least produces an AIP.
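If it helps with that inspection, here is a rough Python sketch for sizing up a METS file that is too large to open comfortably. It streams through the file and tallies elements and text by XML namespace, on the assumption that each tool's output sits in its own namespace inside the METS. The function name and the path in the usage comment are made up for illustration, and this is not part of Archivematica itself.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def tally_by_namespace(source):
    """Stream an XML file and count elements and text characters per namespace.

    `source` is a filename or a binary file object. Elements are cleared as
    soon as they have been counted, so the full tree is never held in memory.
    """
    counts = Counter()  # number of elements seen in each namespace
    chars = Counter()   # characters of direct text content in each namespace
    for _event, elem in ET.iterparse(source, events=("end",)):
        tag = elem.tag
        # ElementTree writes qualified tags as "{namespace-uri}localname".
        namespace = tag[1:].split("}", 1)[0] if tag.startswith("{") else ""
        counts[namespace] += 1
        chars[namespace] += len(elem.text or "")
        elem.clear()  # drop children, text and attributes to free memory
    return counts, chars


# Example usage (hypothetical path):
# counts, chars = tally_by_namespace("METS.example.xml")
# for namespace, total in chars.most_common(10):
#     print(namespace or "(no namespace)", counts[namespace], total)
```

The namespaces that dominate the tally are the first candidates to look at when deciding which tool output to switch off. It only counts direct text content, not attributes or markup overhead, so treat the numbers as a rough guide rather than an exact size breakdown.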
As for outputting that much data, it is something we have been looking at. We don't think there is a quick fix in code that will simply allow that much to be output, not without some refactoring of how the XML is represented in memory in the module doing the heavy lifting. We have investigated different changes to the METS to reduce redundancy, e.g. removing unused optional PREMIS containers. That work has not yet turned into an active project and still requires sponsorship.
It will be interesting to hear from others what they're doing in a similar situation and what they'd like to see improved. In the meantime, if reducing some of that tool output works for you, or if playing with some of the other settings in the processing configuration does, e.g. not documenting empty directories, then it would be useful to hear how that goes on this forum, or on the ticket you linked to.
Best,
Ross