(Apologies for cross-posting, but I think this is of interest to both
mailing lists.)
Exhibit has been built to ease the process of publishing (structured)
data to the Web. The immediate benefit to exhibit authors is, of course,
that there's no need to learn database technologies, set up web servers,
design browsing interfaces, or build full 3-tier web applications that
work across several browsers. This benefit in itself is already worthwhile.
However, there is a secondary, long-term benefit, not to exhibit authors
but to potentially everyone. The data published through Exhibit is
readily machine-processable. This means that search engines (like
Google) can take advantage of this data and provide better Web-wide
search interfaces beyond simple keyword text search (e.g., "Which
Republican presidents came into office after the age of 50 and had
served in the XYZ war?"). To this end, I have made a quick attempt at
crawling the (publicly accessible) exhibits and showing their data in a
Backstage installation:
http://people.csail.mit.edu/dfhuynh/misc/backstage-demo-the-ex.html
(demo kept live until end of Tuesday)
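To make "machine-processable" a bit more concrete, here is a minimal Python
sketch of how a crawler might answer the example question once it has an
exhibit's data. It assumes the data is in the Exhibit JSON style (a top-level
"items" array whose entries have "label", "type", and free-form properties).
The items, the property names ("party", "age-at-inauguration", "served-in"),
and the query() helper are all made up for illustration; this is not how
Backstage or the crawler actually works.

```python
import json

# Hypothetical data in the style of an Exhibit JSON file. The people and
# property names below are placeholders, not taken from any real exhibit.
EXHIBIT_JSON = """
{
  "items": [
    {"type": "President", "label": "Person A", "party": "Republican",
     "age-at-inauguration": 54, "served-in": ["XYZ war"]},
    {"type": "President", "label": "Person B", "party": "Republican",
     "age-at-inauguration": 47, "served-in": "XYZ war"},
    {"type": "President", "label": "Person C", "party": "Democratic",
     "age-at-inauguration": 61, "served-in": []},
    {"type": "President", "label": "Person D", "party": "Republican",
     "age-at-inauguration": 62}
  ]
}
"""


def as_list(value):
    """A property may hold one value or a list of values; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def query(items, party, min_age, war):
    """Return labels of presidents of `party`, older than `min_age` at
    inauguration, who served in `war`. Items missing a property are skipped."""
    results = []
    for item in items:
        if item.get("type") != "President":
            continue
        if item.get("party") != party:
            continue
        age = item.get("age-at-inauguration")
        if not isinstance(age, (int, float)) or age <= min_age:
            continue
        if war not in as_list(item.get("served-in")):
            continue
        results.append(item["label"])
    return results


items = json.loads(EXHIBIT_JSON)["items"]
matches = query(items, party="Republican", min_age=50, war="XYZ war")
print(matches)
```

The point is only that a structured question becomes a simple filter over
typed items, which plain keyword search over page text cannot do.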
Some statistics:
  < 1.5 years of data after Exhibit 1.0 launch in December 2006
  2,800+ exhibits (publicly accessible and already in Google's or
    Yahoo's index)
  800+ item types
  5,800+ properties
  136,000+ items
There are a lot of duplicates because I have not made an earnest effort
to resolve them.
Much more can be done to improve "The Exhibition". The lens templates in
the original exhibits can be extracted so that items in the Exhibition
can be shown in their original lenses. Also, the settings for map views
and timeline views in the original exhibits can be extracted, so that
maps and timelines can be constructed automatically in the Exhibition.
Faceted browsing can also be offered automatically on search results.
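As a rough illustration of why facets can be offered automatically, here is
a small Python sketch that counts property values across a set of items.
Each property becomes a candidate facet and each value a choice with a
count. The sample items and the facet_counts() helper are hypothetical, and
the sketch assumes simple string or number values; it says nothing about how
Exhibit or Backstage compute facets internally.

```python
from collections import Counter, defaultdict

# Hypothetical search results in the Exhibit item style (placeholder values).
ITEMS = [
    {"label": "Item 1", "type": "Mint", "region": "Region X",
     "era": ["Era 1", "Era 2"]},
    {"label": "Item 2", "type": "Mint", "region": "Region X", "era": "Era 1"},
    {"label": "Item 3", "type": "Rock", "region": "Region Y"},
]


def facet_counts(items, skip=("label",)):
    """Count how often each value occurs for each property.

    Properties listed in `skip` (here the unique label) are ignored because
    they make poor facets. Multi-valued properties count once per value.
    """
    counts = defaultdict(Counter)
    for item in items:
        for prop, value in item.items():
            if prop in skip:
                continue
            values = value if isinstance(value, list) else [value]
            for v in values:
                counts[prop][v] += 1
    return counts


counts = facet_counts(ITEMS)
for prop, counter in counts.items():
    # Most frequent values first, as a facet panel would typically list them.
    print(prop, counter.most_common())
```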
Recently, there have been many efforts in scraping (structured) data out
of the Web and the Deep Web, so that in the future, the Web can be
accessed like a single giant database, yielding more precise search
results to complex queries (e.g., "Who were the Republican presidents
who came into office after the age of 60 and had served in the XYZ
war?"). However, these scraping efforts target only large web sites
(e.g., Wikipedia). But these large web sites do not represent the whole
Web--there are a huge number of smaller web sites that, taken
together, can rival the large web sites in the amount of data they hold.
Exhibit and The Exhibition together are an effort to give these smaller
web sites, and their authors, _representation_ on the future "Data Web",
ensuring diversity beyond mainstream information found in large
commercial and institutional web sites.
Here are a few interesting exhibits in "the Long Tail" of the Web:
  Florida's double dippers:
    http://www.tampabay.com/specials/2008/interactives/retirement-loophole/
  Gestures: http://gestureproject.com/
  Rocks: http://www.teuntostring.net/hunebedden/hunebedden.html
  Language structures: http://wals.info/feature
  Resume: http://people.ucsc.edu/~weissman/CVExhibit.html
  Ancient coin mints:
    http://www.ancient-world-coins.com/coin_mints/exhibit
I hope these examples spawn some ideas for more data sets that should be
in the future "Data Web".
Cheers,
David