Best way to filter time periods


Peter H.

Jan 13, 2014, 6:58:13 PM
to europe...@googlegroups.com
Hi,

I want to filter my API queries for specific time periods. So I tried several possible inputs:

[0500 TO 0750] gives me different results than [500 TO 750]. I think the first one is more accurate; is that true? With the second one I sometimes get years above 5000.

Is it better to use "YEAR:" for more accurate results, or would "when:", as an aggregated field, still give me useful statistics? I suspect many providers don't include a field like "year" in their data, or specify the date in a different field, which could be why I usually get few or no results.

Essentially, what I am trying to get is a statistical value that I can compare from country to country, to see which time period has the most results for each country (as a percentage of all of that country's objects). Would you say the outcome is realistic with the current metadata?

Best regards,
Peter

Péter Király

Jan 16, 2014, 8:48:18 AM
to europe...@googlegroups.com
Hi Peter,

It is very hard to answer your question. The YEAR field is ultimately
derived from the dc:date field of the records submitted by providers.
The content of that field is not normalized, and it often contains
values that are hard to interpret. Right now we have 30 million
records, and fewer than 13 million of them have the YEAR field. So I
would say that your approach will reflect the current state of the
metadata, but won't be very precise or accurate in terms of the real
dates of the objects.
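
To make the effect of the missing YEAR values concrete, here is a minimal sketch in Python. The function name period_shares and the counts in the usage lines are hypothetical and only for illustration; the only figures taken from this thread are the 30 million total and the (upper bound of) 13 million records with YEAR. It shows that a share computed against all objects of a country and a share computed against only the dated objects can differ a lot, so it matters which denominator is used when comparing countries.

```python
from typing import Tuple


def period_shares(in_period: int, with_year: int, total: int) -> Tuple[float, float]:
    """Return (share of all objects, share of objects that have a YEAR value).

    in_period: objects whose YEAR falls in the period of interest
    with_year: objects that have any YEAR value at all
    total:     all objects of the country
    """
    if with_year <= 0 or not (0 <= in_period <= with_year <= total):
        raise ValueError("expected 0 <= in_period <= with_year <= total, with_year > 0")
    return in_period / total, in_period / with_year


# Upper bound on overall YEAR coverage, from the figures quoted above.
coverage = 13_000_000 / 30_000_000  # at most about 43% of records have YEAR

# Hypothetical counts for one country, purely illustrative.
share_of_all, share_of_dated = period_shares(in_period=50, with_year=400, total=1000)
print(f"coverage <= {coverage:.1%}")
print(f"share of all objects:   {share_of_all:.1%}")
print(f"share of dated objects: {share_of_dated:.1%}")
```

Because coverage probably differs from provider to provider, reporting both shares side by side is likely safer than relying on either one alone.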

Hope it helps you a little.

Regards,
Péter





--
Péter Király
software developer

Europeana - http://europeana.eu
eXtensible Catalog - http://eXtensibleCatalog.org

Gordea Sergiu

Jan 17, 2014, 4:32:57 AM
to europe...@googlegroups.com
Hi Peter X2,
:)

Well, judging by this:
[0500 TO 0750] gives me other results than [500 TO 750],

I understood that YEAR is a string value...
Indeed it is:
<field name="YEAR" type="string" indexed="true" stored="true"
multiValued="true" />
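
As a rough illustration of why a string-typed field behaves this way, here is a small Python sketch. It assumes that a range query on a string field compares terms lexicographically, which for plain ASCII digits matches Python's string comparison; the helper names and the sample year values are made up for the example and are not taken from the index.

```python
def in_string_range(value: str, lower: str, upper: str) -> bool:
    """Lexicographic range check, as assumed for a string-typed field."""
    return lower <= value <= upper


def in_numeric_range(value: str, lower: int, upper: int) -> bool:
    """Numeric range check, as a numeric field type would behave."""
    return lower <= int(value) <= upper


# Hypothetical YEAR values as they might be stored (padded and unpadded).
years = ["0600", "600", "5000", "1850"]

unpadded = [y for y in years if in_string_range(y, "500", "750")]
padded = [y for y in years if in_string_range(y, "0500", "0750")]
numeric = [y for y in years if in_numeric_range(y, 500, 750)]

print(unpadded)  # ['600', '5000']  -> "5000" sorts between "500" and "750"
print(padded)    # ['0600']         -> only zero-padded values match
print(numeric)   # ['0600', '600']  -> what a numeric range would return
```

Under that assumption, [500 TO 750] picks up "5000" because it sorts between "500" and "750" as text, while [0500 TO 0750] matches only values that were stored with a leading zero. Neither query alone would return every record from that period.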

@Peter K
Shouldn't this be changed to a numeric type in order to support range queries correctly? I assume the size of the index would also be slightly reduced.

I'm not sure whether using a custom Collator is a better solution, but it could at least be a quick fix that doesn't require rebuilding the whole index...

BR,

Sergiu