Dear Hannes,
we do precompute the token coverage and store it in the STRUCTURE.ATTRIBUTE.token files (or STRUCTURE.ATTRIBUTE.norm), generated by the mknorms script.
Current Manatee stores raw token counts in .token files, while .norm files count only words (a word is a token _not_ matching NONWORDRE from the corpus configuration, which is [^[:alpha:]].* by default). Older versions used .norm files only, and it is not obvious whether those contain token or word counts.
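To make the word/token distinction concrete, here is a minimal sketch in Python. It assumes that Python's str.isalpha() is an acceptable stand-in for the POSIX [[:alpha:]] class used in the default NONWORDRE; the real matching is locale-dependent and done inside Manatee.

```python
def is_word(token: str) -> bool:
    # A token is a word iff it does NOT match NONWORDRE ([^[:alpha:]].*),
    # i.e. iff its first character is alphabetic.
    # str.isalpha() approximates POSIX [[:alpha:]] here.
    return bool(token) and token[0].isalpha()

tokens = ["The", "cat", ",", "sat", "on", "42", "mats", "."]
print(len(tokens))                            # token count: 8
print(sum(1 for t in tokens if is_word(t)))   # word count: 5
```

So for the same structure, the .token file would record 8 and the .norm file 5.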
The counts are calculated by mknorms: a .token file is generated by setting NORM_STRUCTATTR to "-", while a .norm file is generated when NORM_STRUCTATTR is set to the name of a structure attribute that contains the word count for each structure occurrence, encoded as a string. We generate that attribute through compilecorp and addwcattr. The files themselves store the counts as 8-byte little-endian integers, one for each structure attribute id.
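Given that layout (one 8-byte little-endian integer per structure attribute id), the files are easy to inspect directly. A sketch, assuming the counts are unsigned and the file contains nothing but the packed integers, so id N lives at byte offset 8*N:

```python
import struct

def read_norms(path):
    # Read a .token/.norm file: consecutive unsigned 64-bit little-endian
    # integers, one count per structure attribute id (id = list index).
    counts = []
    with open(path, "rb") as f:
        while True:
            chunk = f.read(8)
            if len(chunk) < 8:
                break
            counts.append(struct.unpack("<Q", chunk)[0])
    return counts
```

Then read_norms("doc.year.token")[3] would be the token count for the structure attribute value with id 3.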
You can get the precomputed values through the API by setting the wlsort parameter of the wordlist endpoint to "token:l" for token counts per structure attribute value, or to "norm:l" for word counts per structure attribute value, so
wordlist&wlattr=doc.year&wlsort=token:l
or
wordlist&wlattr=doc.year&wlsort=norm:l
in your case. The same information is available on the "text type analysis" screen, accessible from the corpus info page in the web interface.
The wlnums parameter can be used in the same way to get these quantities at the same time, alongside the "primary" wlsort value.
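A request combining both could be assembled like this. The endpoint and the wlattr/wlsort/wlnums parameters are as described above; the base URL is hypothetical and depends on your installation:

```python
from urllib.parse import urlencode

# Hypothetical server location; endpoint and parameter names as above.
base = "https://corpora.example.org/run.cgi/wordlist"
query = urlencode({
    "wlattr": "doc.year",   # structure attribute to list values of
    "wlsort": "norm:l",     # primary value: word counts per attribute value
    "wlnums": "token:l",    # additionally return token counts
})
url = base + "?" + query
print(url)
```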
When you call the wordlist endpoint without these parameters, the counts are calculated online in a generic (= seek-intensive) way across all positions in the corpus, so the calculation is not cheap.
Best regards,
Ondrej