I'm new to stream-lib. I understand that with StreamSummary I can retrieve the top items along with the count and error of each item, both of which are maintained by StreamSummary. I'd like to store some additional information with each item, say, a second numeric field to aggregate on. Is there a way to extend StreamSummary so that this additional per-key information is also maintained? I realize this goes against the whole idea of the Space-Saving approach, but I thought I'd still ask in case I'm missing something.
I could keep the extra information in an external map: add a key when StreamSummary.offerReturnAll() reports that a key was added, and remove it when StreamSummary reports that a key was dropped. However, when a dropped key later reappears, StreamSummary re-admits it with an estimated count, but the external map has no historical information about that key (its previous adds and drops). So this isn't really a solution.
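To make the problem concrete, here is a minimal sketch of what I have in mind. It is a toy stand-in written in plain Java, not stream-lib's StreamSummary: the class SideMapTopK, its Entry fields, and the offer(key, value) signature are all hypothetical names of mine, and the eviction uses a simple linear scan for the minimum count rather than the bucket list a real implementation would use.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy Space-Saving style summary with a side aggregate; NOT stream-lib's API. */
final class SideMapTopK {
    /** Per-key state kept while the key is being monitored. */
    static final class Entry {
        long count;  // estimated count (may overestimate the true count)
        long error;  // upper bound on the overestimation, inherited on eviction
        double sum;  // hypothetical extra field: only covers values seen since the last (re)insertion
    }

    private final int capacity;
    private final Map<String, Entry> entries = new HashMap<>();

    SideMapTopK(int capacity) {
        this.capacity = capacity;
    }

    /** Offers one occurrence of key carrying value; returns the evicted key, or null if none. */
    String offer(String key, double value) {
        Entry existing = entries.get(key);
        if (existing != null) {
            existing.count++;
            existing.sum += value;
            return null;
        }
        Entry fresh = new Entry();
        String dropped = null;
        if (entries.size() >= capacity) {
            // Linear scan for the key with the smallest count (simplification).
            long min = Long.MAX_VALUE;
            for (Map.Entry<String, Entry> e : entries.entrySet()) {
                if (e.getValue().count < min) {
                    min = e.getValue().count;
                    dropped = e.getKey();
                }
            }
            entries.remove(dropped);
            // The newcomer inherits the evicted count as both its base count and its error.
            fresh.count = min;
            fresh.error = min;
        }
        fresh.count += 1;
        fresh.sum = value; // no way to recover values from earlier appearances of this key
        entries.put(key, fresh);
        return dropped;
    }

    Entry get(String key) {
        return entries.get(key);
    }

    public static void main(String[] args) {
        SideMapTopK summary = new SideMapTopK(2);
        summary.offer("a", 1.0);
        summary.offer("a", 2.0);
        summary.offer("a", 3.0);
        summary.offer("b", 5.0);
        summary.offer("c", 7.0); // evicts "b"; its sum of 5.0 is lost
        summary.offer("b", 3.0); // evicts "c"; "b" returns with an estimated count
        Entry b = summary.get("b");
        System.out.println("b: count=" + b.count + " error=" + b.error + " sum=" + b.sum);
    }
}
```

In this run "b" truly occurred twice with values summing to 8.0, but it comes back with an estimated count of 3, an error of 2, and a sum of only 3.0. The count at least carries an error bound; the extra field has nothing comparable, which is the gap I'm asking about.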
I suppose an improvement on the external map idea is to use a much larger capacity, say 10*K or 100*K, in the hope that keys don't get dropped as often. The longer the final top-K keys stay in the buckets without being dropped, the more accurate the additional information in the external map. Could this work?
Thanks,
Jack