It seems that high cardinality is OpenTSDB's Achilles heel. If I have a metric whose tags take a large number of distinct values, querying it becomes very slow. I'm trying to work out why, since that is the first step to solving the problem. My understanding of the issue is below; please let me know if it is incorrect. Assume I have a metric with a single tag, username.
So the row key will be "metric_name", "time to nearest hour", "UserNameKey", "UserNameValue".
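To make that layout concrete, here is a minimal Python sketch of how such a key could be assembled. It assumes 3-byte UIDs for the metric, tag key and tag value, a 4-byte timestamp rounded down to the hour, and no salting; the `uid` and `row_key` helpers are hypothetical names for illustration, not OpenTSDB code.

```python
import struct

UID_WIDTH = 3  # assumed width in bytes of each UID


def uid(n: int) -> bytes:
    """Encode an integer UID as a fixed-width big-endian byte string."""
    return n.to_bytes(UID_WIDTH, "big")


def row_key(metric_uid: int, timestamp: int, tagk_uid: int, tagv_uid: int) -> bytes:
    """Build metric + hour-aligned base time + tag key + tag value."""
    base_time = timestamp - (timestamp % 3600)  # round down to the hour
    return (
        uid(metric_uid)
        + struct.pack(">I", base_time)
        + uid(tagk_uid)
        + uid(tagv_uid)
    )


# Metric 1, a point at 1_700_000_123, tag key 1 (username), tag value 42.
key = row_key(1, 1_700_000_123, 1, 42)
print(key.hex())
```

Because the tag bytes come after the timestamp, rows for a single username are not contiguous: they are interleaved with every other username in each hour.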
Because OpenTSDB doesn't know that every row contains only a Username tag, it can't use that part of the row key in a get or scan request. So it simply scans on the start and end time and filters the returned data manually. Hence it has to look through all the data for that metric in the time range.
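As a rough model of what I think is happening (a toy simulation of my understanding, not OpenTSDB's actual code), the sketch below stores sorted row keys in a Python list, scans by metric and time range only, and then filters on the tag bytes. The key layout (3-byte UIDs, 4-byte hour-aligned timestamp) and the helper names are assumptions for illustration.

```python
import struct
from bisect import bisect_left

UID_WIDTH = 3  # assumed UID width in bytes
HOUR = 3600


def uid(n: int) -> bytes:
    return n.to_bytes(UID_WIDTH, "big")


def row_key(metric_uid: int, base_time: int, tagk_uid: int, tagv_uid: int) -> bytes:
    return uid(metric_uid) + struct.pack(">I", base_time) + uid(tagk_uid) + uid(tagv_uid)


# One metric, three hours of data, 1000 distinct usernames per hour.
rows = sorted(
    row_key(1, hour * HOUR, 1, user)
    for hour in range(3)
    for user in range(1000)
)


def scan_and_filter(rows, metric_uid, start, end, tagk_uid, tagv_uid):
    """Scan [start, end) on the metric + time prefix, then filter on the tag."""
    start_key = uid(metric_uid) + struct.pack(">I", start)
    stop_key = uid(metric_uid) + struct.pack(">I", end)
    lo = bisect_left(rows, start_key)
    hi = bisect_left(rows, stop_key)
    scanned = rows[lo:hi]  # every row in the time range is read
    wanted = uid(tagk_uid) + uid(tagv_uid)
    matched = [k for k in scanned if k[7:] == wanted]  # tag bytes follow the 7-byte prefix
    return len(scanned), len(matched)


scanned, matched = scan_and_filter(rows, 1, 0, 3 * HOUR, 1, 42)
print(scanned, matched)
```

In this model the query for one username reads 3000 rows to return 3, and the rows read grow with the number of distinct usernames rather than with the size of the result.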
Is my understanding correct?