Summary:
1) Dutch resources added to HeidelTime
2) Bug fix in HeidelTime standalone version (handling of different encodings)
3) Modifier information for temporal expressions now available
4) Some small fixes in HeidelTime kit
5) Information for Windows users (HeidelTime standalone)
6) Download links
-------------------------------------------------------------------------------------
Dear HeidelTime-Users,
1) Dutch resources added to HeidelTime
HeidelTime knows a new language! The new version contains resources
for Dutch, which were developed and kindly provided by Matje van de
Camp (Tilburg University).
Note that you also need to download and install the Dutch resources
for the TreeTagger
(http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/).
Since the UIMA preprocessing components of the UIMA HeidelTime kit do
not handle Dutch yet, you can currently use the Dutch resources only
with the standalone version. We are working on extending the UIMA
preprocessing components to process Dutch as well.
If you want to try the Dutch resources, they are also integrated in
the online version:
http://dbs-projects.ifi.uni-heidelberg.de/heideltime/
If you have developed resources for an additional language and want to
share them, please contact us. We are happy to help make them
available.
2) Bug fix in HeidelTime standalone version (handling of different encodings)
A couple of users pointed us to some encoding problems when using
HeidelTime's standalone version. We have now (hopefully) solved this
issue in the following way:
- HeidelTime standalone requires the Java Virtual Machine to use UTF-8
as its encoding. If the default on your machine is not UTF-8, you
have to set the encoding to UTF-8 using the "-Dfile.encoding" option:
java -Dfile.encoding=UTF-8 -jar de.unihd.dbs.heideltime.standalone.jar
<file> <options>
- If you want to process documents not encoded in UTF-8, you can set
the encoding using the new "-e" parameter:
"java -Dfile.encoding=UTF-8 -jar
de.unihd.dbs.heideltime.standalone.jar <file> -e <enc> <further
options>"
with <enc> being, e.g., "UTF-8" or "ISO-8859-1"
- Note, however, that the output is always encoded in UTF-8.
Further details on the encoding are given in the Manual.
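To illustrate the distinction between the input encoding (set with "-e")
and the fixed UTF-8 output, here is a minimal Python sketch. It is not
HeidelTime's implementation; the function name transcode_to_utf8 and the
sample text are assumptions made for illustration only. It reads a file
in a declared encoding and writes the same text back as UTF-8:

```python
import os
import tempfile


def transcode_to_utf8(src_path, dst_path, src_encoding):
    """Read src_path using src_encoding and write the same text as UTF-8.

    Hypothetical helper for illustration; not part of HeidelTime.
    """
    # newline="" keeps line endings exactly as they are in the input.
    with open(src_path, "r", encoding=src_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    src = os.path.join(workdir, "input-latin1.txt")
    dst = os.path.join(workdir, "output-utf8.txt")

    # Create a small ISO-8859-1 encoded input file (assumed sample text).
    with open(src, "wb") as f:
        f.write("Im M\u00e4rz 2001".encode("iso-8859-1"))

    transcode_to_utf8(src, dst, "ISO-8859-1")

    # The output bytes are UTF-8, regardless of the input encoding.
    with open(dst, "rb") as f:
        print(f.read())
```

A file converted like this could also be passed to HeidelTime standalone
without the "-e" parameter, since it is then already UTF-8.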
3) Modifier information for temporal expressions now available
In the new version (standalone and UIMA HeidelTime kit), we have also
fixed a bug that prevented MOD information from being shown for any
temporal expression. The modifier information is now available: for
example, expressions such as "the beginning of 2001" have the
mod attribute "START" in addition to their value attribute "2001".
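As a rough illustration of how the modifier could be read from annotated
output, the following Python sketch parses a TIMEX3 tag and extracts its
value and mod attributes. The sample sentence and the tid and type
attribute values are assumptions made for this example, not actual
HeidelTime output, and the function name extract_timex is hypothetical:

```python
import xml.etree.ElementTree as ET


def extract_timex(annotated_text):
    """Return text, value and mod for each TIMEX3 tag in annotated_text.

    Assumes the annotated text is well-formed XML once wrapped in a
    single root element (e.g. no unescaped '&' or '<' characters).
    """
    root = ET.fromstring("<doc>" + annotated_text + "</doc>")
    results = []
    for timex in root.iter("TIMEX3"):
        results.append({
            "text": "".join(timex.itertext()),
            "value": timex.get("value"),
            # mod is None for expressions without a modifier.
            "mod": timex.get("mod"),
        })
    return results


if __name__ == "__main__":
    # Assumed sample; tid and type values are made up for illustration.
    sample = (
        'The project started at <TIMEX3 tid="t1" type="DATE" '
        'value="2001" mod="START">the beginning of 2001</TIMEX3>.'
    )
    for entry in extract_timex(sample):
        print(entry)
```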
4) Some small fixes in HeidelTime kit
In addition, we have fixed a couple of smaller bugs in the (UIMA)
HeidelTime-kit, mainly in the normalization resources.
5) Information for Windows users (HeidelTime standalone)
If you are using HeidelTime standalone on Windows, you additionally
have to download and install the following TreeTagger scripts and
resources:
- utf8-tokenize.perl
- german-abbreviations-utf8
- dutch-abbreviations
For this, download and extract the TreeTagger tagging scripts
(ftp://ftp.ims.uni-stuttgart.de/pub/corpora/tagger-scripts.tar.gz; note
that this archive should be extracted into an empty folder). Then,
copy the following files to your TreeTagger folders:
copy cmd/utf8-tokenize.perl to TREETAGGER_HOME/cmd/
copy lib/german-abbreviations-utf8 to TREETAGGER_HOME/lib/
copy lib/dutch-abbreviations to TREETAGGER_HOME/lib/
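If you prefer to script the three copy steps, they can be sketched in
Python as follows. This is only a convenience sketch: the function name
copy_tagger_scripts and the example paths are assumptions, and you have
to substitute the folder where you extracted the tagging scripts and your
own TREETAGGER_HOME:

```python
import shutil
from pathlib import Path

# The three files named above, relative to the extracted scripts folder.
REQUIRED_FILES = [
    "cmd/utf8-tokenize.perl",
    "lib/german-abbreviations-utf8",
    "lib/dutch-abbreviations",
]


def copy_tagger_scripts(extracted_dir, treetagger_home):
    """Copy the required files into the matching TreeTagger subfolders.

    Hypothetical helper for illustration; returns the destination paths.
    """
    copied = []
    for rel in REQUIRED_FILES:
        src = Path(extracted_dir) / rel
        dst = Path(treetagger_home) / rel
        if not src.is_file():
            raise FileNotFoundError("Missing file in extracted archive: %s" % src)
        # Create cmd/ or lib/ below TREETAGGER_HOME if it does not exist yet.
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
        copied.append(dst)
    return copied


if __name__ == "__main__":
    import sys

    # Usage (paths are placeholders):
    #   python copy_scripts.py <extracted folder> <TREETAGGER_HOME>
    if len(sys.argv) == 3:
        for path in copy_tagger_scripts(sys.argv[1], sys.argv[2]):
            print("copied", path)
```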
6) Download links
The new versions are now available at:
http://wega.ifi.uni-heidelberg.de/temporal_tagging/heideltime-standalone.tar.gz
http://wega.ifi.uni-heidelberg.de/temporal_tagging/heideltime-standalone.zip
http://wega.ifi.uni-heidelberg.de/temporal_tagging/heideltime-kit.tar.gz
http://wega.ifi.uni-heidelberg.de/temporal_tagging/heideltime-kit.zip
Alternatively, you may download the new version following the download
links and registration form on:
http://dbs.ifi.uni-heidelberg.de/heideltime
If you run into any problems using HeidelTime or receive
"strange results", please let us know and we will try to fix them.
Furthermore, any kind of feedback is highly appreciated.
**Special thanks to Matje van de Camp, Lars Döhling, and Leon
Dercynski for their help in fixing the problems.
Thank you very much and kind regards,
Jannik