Richard Eckart de Castilho
Dec 2, 2014, 12:51:32 PM
to Tatjana Scheffler, webann...@googlegroups.com, Prof. Dr. Chris Biemann
On 02.12.2014, at 18:26, Tatjana Scheffler <tsche...@gmail.com> wrote:
> In particular, the plain text import for documents seems to do some kind of automatic segmentation. After each period, the sentence is split into a new line. On the other hand, line breaks in the text file are ignored if they don't end with a period. I would like to preserve the new lines from the original document. Is this possible?
WebAnno displays text according to sentence boundaries, not according to line breaks. Lines are wrapped automatically in the web interface.
When you import plain text, a basic heuristic is used to segment the text into sentences, and line breaks are largely (if not completely) ignored by this heuristic.
If you want WebAnno to render your text based on line breaks, you need to explicitly mark the line breaks as sentence breaks. This means you have to import the text in a different data format, e.g. TCF or TSV.
Consider the text
Sie sind so sehr vermessen /
Weil sie des Tods verges=
sen .
To make WebAnno respect the line breaks, you would have to render it like this in the TSV format:
1-1 Sie
1-2 sind
1-3 so
1-4 sehr
1-5 vermessen
1-6 /
2-1 Weil
2-2 sie
2-3 des
2-4 Tods
2-5 verges=
3-1 sen
3-2 .
The first column is <lineId>-<tokenId> (<lineId> is actually <sentenceId>!).
The second column is the token text.
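If you have many documents, the two-column layout shown above could be generated with a small script. The following Python is only a minimal sketch under some assumptions: the function name lines_to_tsv is made up for illustration, tokens are assumed to be already separated by whitespace (as in the example text), and a tab is assumed as the column separator. It produces only the two columns described here; any header lines or additional columns that a particular WebAnno version may expect are not covered.

```python
def lines_to_tsv(text, sep="\t"):
    """Turn line-broken text into '<lineId>-<tokenId><sep><token>' rows.

    Hypothetical helper: each non-empty input line becomes one "sentence",
    and tokens are obtained by splitting the line on whitespace.
    """
    rows = []
    line_id = 0
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            # Skip empty lines so that sentence IDs stay consecutive.
            continue
        line_id += 1
        for token_id, token in enumerate(tokens, start=1):
            rows.append(f"{line_id}-{token_id}{sep}{token}")
    return "\n".join(rows)


poem = "Sie sind so sehr vermessen /\nWeil sie des Tods verges=\nsen ."
print(lines_to_tsv(poem))
```

For the example text, this prints one row per token, starting with 1-1 Sie and ending with 3-2 . (with the assumed tab between the two columns).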
Cheerio,
-- Richard