Long story short: due to an issue with our HBase, the UID mappings went missing for a period of time, and the metrics (and apparently all tagk and tagv mappings to boot) ended up being recreated, so now we have duplicate UID mappings for everything. Data mapped to the new UIDs is available, but data mapped to the old UIDs is not. The question is, is there a way to:
a) extract the data that's mapped to the old UIDs so it can be re-imported under the new UIDs (or, even better, re-mapped), and
b) remove the old mappings afterwards?
I'm not particularly concerned with the loss of the UIDs themselves; while a fair number of UIDs would become unavailable, it's not a significant enough number to be of concern. The data gap caused by the problem, though, is a real issue.
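For context on what (a) would involve: in the tsdb data table, each row key starts with the metric UID (3 bytes by default), followed by a 4-byte base timestamp and the tagk/tagv UID pairs. A minimal Python sketch of picking out rows that belong to the old metric UID (the UIDs are the ones from the fsck output further down; this is an illustration of the idea, not a tested tool):

```python
# Sketch: identify tsdb data rows belonging to the old metric UID.
# Assumes default OpenTSDB UID widths (3-byte metric, 3-byte tagk/tagv).
OLD_METRIC_UID = bytes.fromhex("3B7F15")  # old UID, per fsck output
NEW_METRIC_UID = bytes.fromhex("9AFF9D")  # new (current) UID

def is_old_uid_row(row_key: bytes) -> bool:
    """True if a tsdb data-table row key starts with the old metric UID."""
    return row_key[:3] == OLD_METRIC_UID

# Example row key: metric UID + 4-byte base timestamp + one tagk/tagv pair
base_ts = 1506097504 - 1506097504 % 3600  # hour-aligned base timestamp
sample_key = OLD_METRIC_UID + base_ts.to_bytes(4, "big") + bytes(6)
```

The same prefix test could drive whatever export mechanism is used (an HBase scan with a row-prefix filter, or filtering a full table dump).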
I did a scan of the tsdb-uid table (redirected to a file), and an example is below:
grep net.circuit.ifhcinoctets screenlog.18
;\x7F\x15 column=name:metrics, timestamp=1506097504076, value=net.circuit.ifhcinoctets
net.circuit.ifhcinoctets column=id:metrics, timestamp=1506097504079, value=;\x7F\x15
net.circuit.ifhcinoctets column=id:metrics, timestamp=1507157614607, value=\x9A\xFF\x9D
\x9A\xFF\x9D column=name:metrics, timestamp=1507157614604, value=net.circuit.ifhcinoctets
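(As a sanity check that these row keys are the same IDs fsck complains about below: the HBase shell prints printable bytes literally and escapes the rest, and `;` is 0x3B, so `;\x7F\x15` is just the raw 3-byte UID 3B7F15. A quick Python check:)

```python
# The HBase shell escapes non-printable bytes; ';' is simply 0x3B,
# so the escaped row keys decode to the hex UIDs fsck reports.
old_key = b";\x7f\x15"
new_key = b"\x9a\xff\x9d"
print(old_key.hex().upper(), new_key.hex().upper())  # 3B7F15 9AFF9D
```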
You can see the old mapping and the new mapping. There's about 10 days of data mapped to the old UID, and if we do a tsdb uid fsck we see the following output:
2017-10-06 08:34:36,287 ERROR [main] UidManager: Duplicate forward metrics mapping: net.circuit.ifhcinoctets -> 3B7F15 and net.circuit.ifhcinoctets -> 9AFF9D. kv=KeyValue(key="net.circuit.ifhcinoctets", family="id", qualifier="metrics", value=[-102, -1, -99], timestamp=1507157614607)
2017-10-06 08:34:36,751 ERROR [main] UidManager: Inconsistent reverse metrics mapping 3B7F15 -> net.circuit.ifhcinoctets vs 9AFF9D -> net.circuit.ifhcinoctets / net.circuit.ifhcinoctets -> 9AFF9D
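What I'm imagining for a re-map, if it's feasible at all, is a byte-level rewrite of the affected row keys in the tsdb data table: swap the old 3-byte metric UID prefix for the new one and leave the base timestamp and tag bytes intact. A rough sketch of that rewrite (assuming default 3-byte metric UIDs; not something I've run against the cluster):

```python
# Sketch: rewrite a tsdb data-table row key from the old metric UID
# to the new one. Assumes default 3-byte metric UIDs.
OLD_UID = bytes.fromhex("3B7F15")
NEW_UID = bytes.fromhex("9AFF9D")

def remap_row_key(key: bytes) -> bytes:
    """Swap the metric UID prefix; timestamp and tag bytes are preserved."""
    if not key.startswith(OLD_UID):
        raise ValueError("row key does not belong to the old UID")
    return NEW_UID + key[len(OLD_UID):]

# Example: hour-aligned base timestamp plus one tagk/tagv pair
base_ts = 1506097504 - 1506097504 % 3600
key = OLD_UID + base_ts.to_bytes(4, "big") + bytes.fromhex("000001000001")
```

The actual copy would then write each cell under the remapped key and delete the old row, which is exactly the part I'd want guidance on before touching production.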
As you can imagine, this is ... not a happy situation, and I'm looking for a potential resolution that doesn't involve wiping the cluster (not an option) and/or major data loss.