On Wednesday 29. January 2014 15.05.14 Richard Cyganiak wrote:
> Less is probably more there. Unless you have a very concrete need for the
> more complex constructs there (e.g., you have a federation framework that
> requires exactly those statistics), then I'd recommend sticking to the
> simplest constructs. If there is a particular number you want to include
> that cannot be expressed with a simple VoID property, it may be better to
> introduce a new property.
>
> I say this because the more complex constructs (e.g., clever stuff with
> class and property partitions) tend to go unused and can be misleading.
So, just a quick note from me too, as I'm doing some clever data profiling stuff
for my Ph.D. ;-) Most of the statistics proposed here are useful for
federation, as shown by Olaf Görlitz et al. in their SPLENDID paper. However,
having computed them in my own code, I can only note that they are pretty heavy
to compute. Indeed, it is quite unlikely that people will do it unless the
data providers have a very compelling reason to.
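To show where the cost comes from, here is a minimal sketch of VoID-style property partition statistics (per predicate: void:triples, void:distinctSubjects, void:distinctObjects). It assumes the data fits in memory as plain (s, p, o) tuples, and the function name is hypothetical. It is not SPLENDID's code or mine. The per-predicate sets of distinct subjects and objects are the expensive part, because memory grows with the number of distinct terms, and every update to the data can change the numbers.

```python
from collections import defaultdict


def property_partition_stats(triples):
    """Hypothetical helper: per-predicate VoID-style counts from (s, p, o) tuples.

    Returns {predicate: {"triples": int, "distinctSubjects": int,
    "distinctObjects": int}}. The subject and object sets kept per predicate
    grow with the number of distinct terms in the data.
    """
    counts = defaultdict(int)
    subjects = defaultdict(set)
    objects = defaultdict(set)
    for s, p, o in triples:
        counts[p] += 1        # contributes to void:triples for this partition
        subjects[p].add(s)    # contributes to void:distinctSubjects
        objects[p].add(o)     # contributes to void:distinctObjects
    return {
        p: {
            "triples": counts[p],
            "distinctSubjects": len(subjects[p]),
            "distinctObjects": len(objects[p]),
        }
        for p in counts
    }


if __name__ == "__main__":
    data = [
        ("ex:alice", "foaf:knows", "ex:bob"),
        ("ex:alice", "foaf:knows", "ex:carol"),
        ("ex:bob", "foaf:knows", "ex:carol"),
        ("ex:alice", "foaf:name", '"Alice"'),
    ]
    for p, stats in sorted(property_partition_stats(data).items()):
        print(p, stats)
```

A real data profiler would stream the triples and would probably use approximate distinct counting for large datasets. Even so, it needs at least one full pass over the data every time the statistics are refreshed.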
I've seen that in the last few days, Philipp Stutz has been implementing
cardinality caching in the TripleRush triple store. That's one case where such
statistics are likely to be provided, since they become much more affordable
to compute. See
https://github.com/uzh/triplerush
Another case where they are likely to exist is when the statistics are
already used for internal optimizations.
For all other cases, I think the key is to argue *why* a certain piece of
information is important to expose, keeping in mind that it may be
demanding to produce. An IG recommendation alone is unlikely to suffice, I
suspect; it would have to be of the form "to enable $foo, expose $bar".
Cheers,
Kjetil