Hi,
We took this category distribution into account while collecting images and captions for the entire dataset, to make sure the topics we collected fit our task. There are no separate category labels for the train/val/test splits.
If you want to use something like this for your method, you are allowed to compute it.
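In case it helps, here is a minimal sketch of how one might compute such a distribution for a split. It assumes you have already assigned a category to each example yourself (for instance with your own classifier or keyword rules), since the dataset provides no category labels. The function name `category_distribution` and the sample category names are hypothetical.

```python
from collections import Counter


def category_distribution(categories):
    """Return the fraction of examples per category.

    `categories` is a list with one self-assigned category per example
    (hypothetical input; the dataset itself ships no category labels).
    """
    counts = Counter(categories)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cat: n / total for cat, n in counts.items()}


# Made-up labels for illustration only
train_categories = ["sports", "politics", "sports", "science"]
print(category_distribution(train_categories))
```

Running the same function on your val and test labels would let you compare the splits.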