hi basile,
indeed, the provenance data as currently recorded is extensive. i'm submitting a PR soon to change this so that it only occupies memory in short bursts. (i'm a little surprised that it sticks around at all; i thought we were already loading results on the fly.)
the provenance will be controlled via an execution flag and will be off by default for the next release. workflow provenance cannot easily be generated with the graph submission engines, and that issue still needs to be addressed.
but reducing the memory footprint is a good thing regardless.