Correct character encoding (German) for LexisNexis export (Mac user)


Janine Wolf-Schindler

Jan 29, 2015, 4:35:50 AM
to ant...@googlegroups.com
Dear all,

I am desperate - I am trying to upload my corpus from LexisNexis.

I started by converting the .trf file to .txt, which I now know is the wrong thing to do. What looks nice in .trf looks horrible in the Mac preview, so it doesn't have the correct encoding for AntConc.

I switched to HTML. I got, at last, the "correct" UTF-8 encoding for the Umlaute (ä, ö, ü): religiöse
This is better, yet it is not correct.
I changed the settings to ISO 8859-1 for both the program and the html-file. But it is still not working correctly. Doesn't change anything.

Now I don't know what to do (I also showed it to my husband, who is a programmer; he tried everything again, without success).

I don't even dare to try the French file.

Can anyone help?
thanks :-)



   

Laurence Anthony

Jan 29, 2015, 5:47:15 AM
to ant...@googlegroups.com
Hi Janine,

AntConc works perfectly with LexisNexis data in German, French, Japanese, Chinese, and any other language.

It seems that you are getting all your encodings mixed up. HTML, TXT, and XML have *nothing* to do with encodings. They are all text-based file formats that can be saved in any of a number of encodings.

The simple rule for AntConc is *set the encoding of your files in the AntConc global settings*. If you know what your files are encoded in, this is straightforward. AntConc defaults to UTF-8 (the international standard), so if you save all your files in UTF-8 from the outset, then you never have to think about encodings again.

You wrote that you saved your files in UTF-8 and then changed the settings to ISO-8859-1, which just doesn't make sense. Stay in UTF-8. (And, you cannot just "change the setting" of an HTML file, anyway. You can convert an HTML file from one encoding to another, but it is not a 'setting'.)
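To make the point about encodings concrete, here is a minimal Python sketch (not anything AntConc itself runs). It shows that the same word becomes different bytes in UTF-8 and ISO-8859-1, and that reading UTF-8 bytes as ISO-8859-1 yields exactly the kind of garbling quoted in the question. The helper name `convert_bytes` is made up for this illustration.

```python
# The same text, stored as bytes in two different encodings.
text = "religiöse"
utf8_bytes = text.encode("utf-8")         # "ö" becomes two bytes: C3 B6
latin1_bytes = text.encode("iso-8859-1")  # "ö" becomes one byte: F6

print(utf8_bytes)    # b'religi\xc3\xb6se'
print(latin1_bytes)  # b'religi\xf6se'

# Reading UTF-8 bytes with the wrong setting (ISO-8859-1) garbles the umlaut:
# each of the two bytes is shown as its own character.
garbled = utf8_bytes.decode("iso-8859-1")
print(garbled == "religiöse")  # True

# Converting is decode-then-encode; it changes the bytes, not a "setting".
def convert_bytes(data, source_encoding, target_encoding="utf-8"):
    """Re-encode bytes from source_encoding to target_encoding (hypothetical helper)."""
    return data.decode(source_encoding).encode(target_encoding)

print(convert_bytes(latin1_bytes, "iso-8859-1") == utf8_bytes)  # True
```

The file extension never enters into it: an .html file and a .txt file holding this word would contain the same bytes for a given encoding.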

To help matters, I have created a program called EncodeAnt that auto-detects file encodings (so you can then set that in the AntConc global settings) and also does batch auto-conversion to UTF-8, so you can just use the converted files as is. There is a good chance that you have mixed up all your encodings and have some in ASCII, some in ISO-8859-1, some in UTF-8, and others in some other encoding. EncodeAnt can handle this and convert them all to a single UTF-8 standard.
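For readers who cannot run a Windows program, the batch-conversion idea can be roughly sketched in Python. This is not how EncodeAnt works internally; it is a simple fallback strategy under stated assumptions: every file is either UTF-8 (which also covers plain ASCII) or ISO-8859-1, and the files end in `.txt`. The names `decode_with_fallback` and `convert_folder` are hypothetical. Files in another encoding (for example Windows-1252 or Mac Roman) would be decoded without an error but with wrong characters, so check the results by eye.

```python
from pathlib import Path

# Assumption: only these two encodings occur. UTF-8 is tried first because
# it fails loudly on bytes that are not valid UTF-8, whereas ISO-8859-1
# accepts any byte sequence and therefore must come last.
CANDIDATES = ("utf-8", "iso-8859-1")

def decode_with_fallback(data):
    """Return (text, encoding_used) for the first candidate that decodes."""
    for encoding in CANDIDATES:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings matched")

def convert_folder(source_dir, target_dir):
    """Write a UTF-8 copy of every .txt file in source_dir into target_dir."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    report = {}
    for path in sorted(Path(source_dir).glob("*.txt")):
        text, used = decode_with_fallback(path.read_bytes())
        # Write bytes directly so line endings are left untouched.
        (target / path.name).write_bytes(text.encode("utf-8"))
        report[path.name] = used
    return report
```

Calling `convert_folder("corpus", "corpus_utf8")` (both folder names are placeholders) returns a dictionary that records which encoding was assumed for each file, and the originals are left unmodified.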

Here is the link (note that the current version only works on Windows though):

I hope that helps.

Laurence.




###############################################################
Laurence ANTHONY, Ph.D.
Professor
Center for English Language Education in Science and Engineering (CELESE)
Faculty of Science and Engineering
Waseda University
3-4-1 Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
E-mail: antho...@gmail.com
WWW: http://www.laurenceanthony.net/
###############################################################
