That's simply not true. Firstly, the documentation says: "Label values may contain any Unicode characters."
Secondly, whether a label value contains non-ASCII characters or only ASCII characters has no bearing on whether it's "long". Thirdly, the length of label values has minimal impact on disk space, since they are stored only once.
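To illustrate the second point, here is a minimal Python sketch showing that character set and length are independent. The label values and the `describe` helper are made up for illustration and are not part of any monitoring tool's API; the sketch only assumes the values are encoded as UTF-8.

```python
# Hypothetical label values, chosen only for illustration.
values = ["eu-west-1", "café", "a" * 200]


def describe(value: str) -> dict:
    """Report character count, UTF-8 byte count, and whether the value is pure ASCII."""
    encoded = value.encode("utf-8")
    return {
        "chars": len(value),
        "utf8_bytes": len(encoded),
        "ascii_only": value.isascii(),
    }


for v in values:
    print(describe(v))
```

The value with a non-ASCII character is the shortest of the three, while the longest one is pure ASCII, so "contains non-ASCII characters" and "is long" are separate properties.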
Therefore, if you've seen some behaviour that you didn't expect relating to the length or character set of labels, can you show exactly what you did, and what you observed?