It might help to start from an initial proposal with details, so the tradeoffs can be judged against something concrete. Would each load/store/fetch record the full 128-bit capability plus the 1-bit tag, or just the 64-bit address plus the 1-bit tag? For the trace_entry_t format, would you take a bit from the type field, or would you add a new field (increasing the size on disk) and have the reader interface hide the difference? For the memref_t format presented to tools, taking up more space has fewer downsides, so the only real cost of another field is compatibility. For offline raw entries, any extra space is quite costly because tracing is I/O-bound. For those, are there unused bits in existing implementations that could hold the type and tag, or are another 4 bits needed for a 3-bit type and a 1-bit tag? Those extra 4 bits may well translate to 50% extra overhead.
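To make the "unused bits" question concrete, here is a rough sketch of what squeezing a 3-bit type and a 1-bit tag into a single 64-bit raw entry could look like. The layout, the `HYP_*` macros, and the `hyp_*` helpers are all hypothetical names invented for this discussion; they are not the existing raw entry format. The sketch also assumes that the top 4 bits of every recorded address are free to be dropped, which is exactly the point that would need confirming.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout, for discussion only (not the existing raw format):
 * bits 63..61 = entry type, bit 60 = capability tag, bits 59..0 = address. */
#define HYP_TYPE_BITS 3
#define HYP_TAG_BITS 1
#define HYP_ADDR_BITS (64 - HYP_TYPE_BITS - HYP_TAG_BITS)
#define HYP_ADDR_MASK ((UINT64_C(1) << HYP_ADDR_BITS) - 1)

/* Packs type, tag, and address into one 64-bit word. */
static inline uint64_t
hyp_pack(unsigned type, unsigned tag, uint64_t addr)
{
    assert(type < (1u << HYP_TYPE_BITS));
    assert(tag <= 1);
    /* Assumption: the top 4 address bits are unused and can be dropped. */
    assert((addr & ~HYP_ADDR_MASK) == 0);
    return ((uint64_t)type << (64 - HYP_TYPE_BITS)) |
        ((uint64_t)tag << HYP_ADDR_BITS) | addr;
}

/* Extracts the 3-bit entry type. */
static inline unsigned
hyp_type(uint64_t entry)
{
    return (unsigned)(entry >> (64 - HYP_TYPE_BITS));
}

/* Extracts the 1-bit capability tag. */
static inline unsigned
hyp_tag(uint64_t entry)
{
    return (unsigned)((entry >> HYP_ADDR_BITS) & 1);
}

/* Extracts the 60-bit address. */
static inline uint64_t
hyp_addr(uint64_t entry)
{
    return entry & HYP_ADDR_MASK;
}
```

With this layout the entry stays at 8 bytes, so recording the tag costs nothing on disk, but only the address and tag fit. Recording the full 128-bit capability would need additional entries or a wider format.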