In my work with semantic relations, I have used a corpus of 500 GB of
text, combined with a passage retrieval search engine that gives exact
hit counts (the Waterloo MultiText System). Participants in Task 4 are
welcome to have remote access to this resource. Please contact me if
you are interested.
Some other resources are described here:
http://www.apperceptual.com/lra_tools.html
Peter.
It would take a few weeks to download 500 GB of raw data from across
the Atlantic. (It took us a couple of weeks to copy it from Waterloo to
Ottawa.) The WMTS query syntax is quite powerful, so most of the things
you might want to do with the data can be done through the WMTS. If you
really want the raw data, your best option is to purchase the GOV2
corpus, which will be shipped to you on hard drives:
http://ir.dcs.gla.ac.uk/test_collections/gov2-summary.htm
http://ir.dcs.gla.ac.uk/test_collections/access_to_data.html
Peter.
If you just want some random samples of the raw data, you can do this
through the WMTS.
Peter.
Hi Peter, all,
Is it still possible to get remote access to your resource?
We would like to try adding this kind of feature to our kernel-based
system.
Thanks a lot,
Lorenza
Lorenza Romano
Fondazione Bruno Kessler - IRST
Centre for Scientific and Technological Research
via Sommarive, 18
38050 Povo (Trento) - Italy
http://www.itc.it/
http://tcc.itc.it/
------------------
ITC -> since 1 March 2007 Fondazione Bruno Kessler
------------------
Sorry for the delayed reply. I was on vacation in Egypt with my family
for two weeks.
If it is not too late, please send me an email, and I will set up an
account for you.
Best wishes,
Peter.