Problem loading texts

Javier Arróspide

May 28, 2008, 3:06:52 PM
to WordSmi...@googlegroups.com
Hi there,

I am having a problem loading texts from my corpus for the concordance
search. My corpus comprises 33 texts, but WST loads only 2 of them
correctly. For the rest, it recognizes fewer than 10 words (although the
texts are considerably larger).

I have noticed that, as a matter of fact, the two texts that have been
correctly loaded are in Unicode, whereas the rest are marked with A. I
don't know if this has anything to do with the problem I am experiencing.

I would really appreciate it if anyone could help me with this problem.

Thanks in advance,
Javi

mi...@lexically.net

May 30, 2008, 3:50:08 AM
to WordSmith Tools
Javi, hi

Not sure which version of WS you're using. WS4 (and early WS5) does
not have good procedures for detecting UTF8, which may well be the
cause of your problem.
UTF8 texts have most characters using 1 byte but some characters use
more than 1 byte.
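
To see the byte counts concretely, here is a minimal sketch in Python (Python is only used for illustration, it is not part of WordSmith, and the helper name `encoded_sizes` is made up for this example). It reports how many bytes one character takes in UTF8 and in UTF16:

```python
def encoded_sizes(ch):
    """Return (utf8_bytes, utf16_bytes) for a single character."""
    # "utf-16-le" is used so that no byte order mark is counted.
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))

for ch in ["a", "ñ", "€"]:
    print(repr(ch), encoded_sizes(ch))
```

A plain "a" takes 1 byte in UTF8, while "ñ" takes 2 and "€" takes 3; all three take 2 bytes in UTF16.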
First suggestion would be to convert all your text to Unicode (UTF16)
using the Text Converter or even using MS Word.
Alternatively you might try WS5's Corpus Corruption finder, to see if
there is anything odd about your corpus, but I doubt that's necessary
as you have only 33 texts.
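
If you would rather script the conversion than use the Text Converter or Word, something along these lines should work. It is only a sketch in Python, under the assumption that the problem files really are UTF8 and sit in one folder as .txt files; the function names and folder names are hypothetical.

```python
from pathlib import Path

def convert_to_utf16(src, dst, source_encoding="utf-8-sig"):
    """Re-save one text file as UTF16 with a byte order mark."""
    # "utf-8-sig" reads UTF8 with or without a leading BOM.
    text = Path(src).read_text(encoding=source_encoding)
    # Python's "utf-16" codec writes a BOM followed by the text.
    Path(dst).write_text(text, encoding="utf-16")

def convert_folder(src_dir, dst_dir, source_encoding="utf-8-sig"):
    """Convert every .txt file in src_dir; return the converted file names."""
    out = Path(dst_dir)
    out.mkdir(parents=True, exist_ok=True)
    converted = []
    for path in sorted(Path(src_dir).glob("*.txt")):
        convert_to_utf16(path, out / path.name, source_encoding)
        converted.append(path.name)
    return converted

# Example call (folder names are placeholders):
# convert_folder("corpus_utf8", "corpus_utf16")
```

The originals are left untouched and the converted copies go to a separate folder. If a file is not valid UTF8, the read step raises a UnicodeDecodeError; in that case the file may be in a Windows ANSI code page, and passing source_encoding="cp1252" is worth a try.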
Cheers -- Mike

Javi

May 31, 2008, 4:49:38 AM
to WordSmith Tools
Hi Mike,

Thanks a lot, I've just converted the texts to Unicode and everything
works perfectly now.

Best,
Javi