search in WIS

Timo Proescholdt

Jul 3, 2012, 4:35:54 AM

to et-wisc, wmo-i...@googlegroups.com, David Thomas, Peiliang Shi, Dellacqua, Matteo, Hiroyuki Ichijo, wissync

Dear colleagues,

The issue of search in WIS requires additional attention as we go
about incrementally improving the system. Currently, non-domain
experts, and even experts, struggle to find information. It goes
without saying that this situation is not tenable.

I would like to share what I think are some important points, without
claiming they are complete or pertinent. They all revolve around
making the large result sets that are typical for WIS easier to deal
with.

First, I propose making search results more accessible.
We need to explain better what data/products a user is seeing in the
result, as this is currently not obvious, especially for non-domain
experts. This would involve using what we know about the
metadata to classify the result set into subsets. The WIS metadata
contains a substantial amount of information that is currently unused
(file type, isSYNOP, isBUFR, origin, provider, etc.).
A pie chart could be shown at the top, showing the distribution
of the different subsets in the result.
Each result could have "tags" indicating that it corresponds to
"BUFR, SYNOP, etc.", including a description of what BUFR, SYNOP,
etc. are (shown when one hovers over the symbol?) and a link to more
information about the file format. This makes it much easier to
understand what one has found, especially for non-domain experts. It
also lays the foundation for more effective browsing. The tags could
include not only the file type but also other information, such as
the origin of the data, how often it is accessed, whether it is about
meteorology, oceanography or climatology, whether it is in the GTS,
and so on.
The disadvantage is the high processing cost, since the whole result
set, rather than only the current sliding window (page), needs to be
analyzed. The subsets I spoke of are also not necessarily mutually
exclusive, which makes drawing pie charts more difficult.
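
To make the classification idea concrete, here is a minimal Python sketch. It assumes each metadata record has already been flattened into a dict with the hypothetical keys `format`, `data_type`, `disciplines` and `in_gts` (real WIS metadata is ISO 19115 XML, so these names are illustrative only). It derives a set of tags per record and counts them over the whole result set rather than the visible page:

```python
from collections import Counter

# Hypothetical flattened record layout; real WIS metadata is ISO 19115 XML.
def tags_for(record):
    """Derive the (possibly overlapping) set of tags for one metadata record."""
    tags = set()
    if record.get("format"):
        tags.add("format:" + record["format"])      # e.g. BUFR, GRIB
    if record.get("data_type"):
        tags.add("type:" + record["data_type"])     # e.g. SYNOP
    for discipline in record.get("disciplines", []):
        tags.add("discipline:" + discipline)        # a record may have several
    if record.get("in_gts"):
        tags.add("GTS")
    return tags


def tag_distribution(results):
    """Count tags over the WHOLE result set, not just the current page."""
    counts = Counter()
    for record in results:
        counts.update(tags_for(record))
    return counts


results = [
    {"format": "BUFR", "data_type": "SYNOP", "disciplines": ["meteorology"], "in_gts": True},
    {"format": "GRIB", "disciplines": ["meteorology", "climatology"], "in_gts": True},
    {"format": "netCDF", "disciplines": ["oceanography"], "in_gts": False},
]
counts = tag_distribution(results)
print(counts["discipline:meteorology"])                            # 2
print(sum(counts.values()), "tags for", len(results), "records")   # 10 tags for 3 records
```

Because the tags overlap, the counts add up to more than the number of records (10 tags for 3 records here), which is exactly why a pie chart is awkward; a bar chart of per-tag counts might be a simpler alternative.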

Second, a user should have the means to narrow down a search result
according to their preferences. This will help users cope with the
large result sets. In my experience, it is unlikely that a user hits
the desired data/product with the first search, or on the first couple
of pages, so browsing will be essential. The classification (tags) I
described above will allow a user to quickly remove results of a given
kind from the result set after having evaluated them and concluded
that she is not looking for this kind of stuff. One could imagine a
button “remove these kinds of stuff from result set”, “remove stuff
similar to this from result set”, or even “show only stuff that is
similar to this”.
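
The three buttons could map onto simple filters over tagged records. The following is a sketch only: it assumes each record carries a hypothetical `tags` set and uses Jaccard similarity of tag sets as one possible (assumed, not prescribed) definition of "similar":

```python
def jaccard(a, b):
    """Size of the intersection over size of the union (1.0 if both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def remove_kind(results, tag):
    """'Remove these kinds of stuff from result set': drop records carrying `tag`."""
    return [r for r in results if tag not in r["tags"]]


def only_similar(results, example, threshold=0.5):
    """'Show only stuff that is similar to this'."""
    return [r for r in results if jaccard(r["tags"], example["tags"]) >= threshold]


def remove_similar(results, example, threshold=0.5):
    """'Remove stuff similar to this from result set'."""
    return [r for r in results if jaccard(r["tags"], example["tags"]) < threshold]


# Hypothetical tagged result set.
results = [
    {"id": "A", "tags": {"BUFR", "SYNOP", "GTS"}},
    {"id": "B", "tags": {"BUFR", "TEMP", "GTS"}},
    {"id": "C", "tags": {"GRIB", "GTS", "model"}},
    {"id": "D", "tags": {"netCDF", "oceanography"}},
]
print([r["id"] for r in remove_kind(results, "BUFR")])          # ['C', 'D']
print([r["id"] for r in only_similar(results, results[0])])     # ['A', 'B']
print([r["id"] for r in remove_similar(results, results[0])])   # ['C', 'D']
```

The 0.5 threshold is an arbitrary illustrative choice; in practice it would need tuning against real WIS result sets.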

Identifying the subsets (tags) I speak of is relatively easy with known
data (e.g. WMO GTS data/products). It is more difficult for the many
other records that will be integrated into WIS from DCPCs. A mixture
of static and dynamic classification could help here. Regarding static
classification, good metadata that is uniform and standardized within
each community will make it easier to identify a particular record as
belonging to that community. For this, it is important to have
templates and common keyword lists per community. However, not all
subsets can be identified statically (i.e. by looking at predefined
fields in the metadata). Here, a technique called clustering might
help to find subsets of similar records in a large result set.
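
A mixture of static and dynamic classification might look like the sketch below, under clearly stated assumptions: the per-community keyword lists are made up, and the dynamic step uses simple single-pass "leader" clustering on keyword sets as a stand-in for more robust methods (k-means, hierarchical clustering, etc.). Records that the static step can place are assigned to a community; only the remainder is clustered:

```python
# Hypothetical per-community keyword lists (static classification); real
# lists would come from each community's templates and vocabularies.
COMMUNITY_KEYWORDS = {
    "meteorology": {"synop", "temp", "metar", "precipitation"},
    "oceanography": {"sea surface temperature", "salinity", "buoy"},
}


def classify_static(keywords):
    """Return the communities whose keyword list overlaps the record's keywords."""
    kw = {k.lower() for k in keywords}
    return {name for name, vocab in COMMUNITY_KEYWORDS.items() if kw & vocab}


def jaccard(a, b):
    """Size of the intersection over size of the union (1.0 if both are empty)."""
    return len(a & b) / len(a | b) if (a or b) else 1.0


def leader_cluster(records, threshold=0.3):
    """Single-pass leader clustering: a record joins the first cluster whose
    leader is similar enough, otherwise it starts a new cluster."""
    clusters = []  # list of (leader keyword set, member ids)
    for rid, kw in records:
        for leader, members in clusters:
            if jaccard(kw, leader) >= threshold:
                members.append(rid)
                break
        else:
            clusters.append((kw, [rid]))
    return [members for _, members in clusters]


records = [
    ("r1", {"synop", "surface", "europe"}),
    ("r2", {"synop", "surface", "asia"}),
    ("r3", {"radar", "reflectivity", "europe"}),
    ("r4", {"radar", "reflectivity", "africa"}),
    ("r5", {"glacier", "mass balance"}),
]
# Static step first; only records it cannot place go to the dynamic step.
unplaced = [(rid, kw) for rid, kw in records if not classify_static(kw)]
print(classify_static(records[0][1]))  # {'meteorology'}
print(leader_cluster(unplaced))        # [['r3', 'r4'], ['r5']]
```

Leader clustering is cheap but depends on the order in which records arrive, which is one reason a production system would likely use a better-studied algorithm.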

The currently prevailing search technique is one where tables or
indices are searched for the search terms and the (usually very large)
result is dumped to the user for examination. Techniques such as the
ones described above are different, since they involve analyzing the
whole of a search result (potentially a large set) in order to
classify it. They therefore need more processing power and, above all,
a conceptual model that allows individual records to be regrouped.
Knowledge of how to do this might not be readily available in a
met service. One would thus have to look for partnerships, e.g. with
universities or companies (Google, Amazon), and try to learn which
of their domain knowledge is applicable to WIS. It would certainly
make sense to pool such experience between GISCs. A workshop on search
might bring GISCs and other players (search companies, academia)
together.

I hope that these lines are some food for thought and that we can have
a discussion on how to improve the search quality in WIS.

Best regards
Timo

--
Timo Pröscholdt
Program Officer, WMO Information System (WIS)
Observing and Information Systems Department
World Meteorological Organization
Tel: +41 22 730 81 76
Cell: +41 77 40 63 554
e-mail: tproes...@wmo.int
