Foreign text?

JP Racine

Mar 3, 2011, 12:40:26 AM
to AntConc-discussion
Hi all,

It doesn't look like I can use AntConc with Japanese text files. Does
anyone know of similar software for Japanese corpora?

Thanks,

John

Laurence Anthony

Mar 3, 2011, 1:00:01 AM
to ant...@googlegroups.com
Hi John,

AntConc works perfectly with Japanese text files. You will need to know the language encoding of the files. The default for Japanese is Shift-JIS, but your files might be in Unicode UTF-8 or another Unicode encoding.

Go to the global settings of AntConc and set the language encoding there to match your files. Then, everything will work as expected.
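
If you are not sure which encoding your files use, one rough way to find out is to try decoding the raw bytes with a few likely candidates. The following is a minimal Python sketch and is not part of AntConc; the function name `guess_encoding` and the candidate list are assumptions for illustration. A successful decode is only a hint, since some byte sequences are valid in more than one encoding.

```python
import codecs

# Candidate encodings to try, in order (an assumed list for Japanese text).
CANDIDATES = ("utf-8", "shift_jis", "euc_jp")


def guess_encoding(data: bytes, candidates=CANDIDATES):
    """Return the first encoding that decodes `data` without error, or None."""
    # A byte-order mark is a strong hint for a Unicode encoding.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    for name in candidates:
        try:
            data.decode(name)  # strict decoding raises on invalid bytes
        except UnicodeDecodeError:
            continue
        return name
    return None
```

To use it, read the file in binary mode (`open(path, "rb").read()`) and pass the bytes to `guess_encoding`, then pick the matching encoding in the global settings.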

I hope that helps.
Laurence.

###############################################################
Laurence Anthony, Ph.D.
Professor, Director of CELESE
Center for English Language Education in Science and Engineering (CELESE)
Faculty of Science and Engineering
Waseda University
3-4-1 Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
E-mail: antho...@gmail.com
WWW: http://www.antlab.sci.waseda.ac.jp/
###############################################################




--
You received this message because you are subscribed to the Google Groups "AntConc-discussion" group.
To post to this group, send email to ant...@googlegroups.com.
To unsubscribe from this group, send email to antconc+u...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/antconc?hl=en.


John Racine

Mar 3, 2011, 8:07:34 PM
to ant...@googlegroups.com
Hi Laurence,

My files were saved as .txt and then encoded as Japanese text for Mac. I'm not sure which encoding that corresponds to in Global Settings.

More troubling: no matter how I open the files (in Word, TextEdit or Script Editor), they now appear as gibberish.

Any ideas?

John

Laurence Anthony

Mar 3, 2011, 8:47:33 PM
to ant...@googlegroups.com
Hi John,

To solve your problem, you need to open your files in a proper text editor, like TextPad or Notepad++. There, you can resave your files with Windows line endings (which are different from those for Mac) and also change the language encoding. The standard is to use UTF-8.

I suggest you obtain the raw files again, make a backup, and then process them as necessary.
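
If you would rather script this than use an editor, the same steps (back up, decode, fix line endings, resave as UTF-8) can be sketched in Python. This is only an illustration under assumptions: the function name `convert_to_utf8_crlf` is made up, and the default source encoding of Shift-JIS is a guess that may not match files saved as Japanese text on a Mac.

```python
import shutil
from pathlib import Path


def convert_to_utf8_crlf(path, source_encoding="shift_jis"):
    """Back up a text file, then rewrite it as UTF-8 with Windows line endings.

    `source_encoding` is an assumption; set it to the file's real encoding.
    Returns the path of the backup copy.
    """
    path = Path(path)
    backup = path.with_name(path.name + ".bak")
    shutil.copy2(path, backup)  # keep the original bytes untouched

    # Strict decoding raises UnicodeDecodeError if the encoding is wrong,
    # in which case the original file has not been modified.
    text = path.read_bytes().decode(source_encoding)

    # Normalize CRLF and bare CR (classic Mac) to LF, then convert to CRLF.
    text = text.replace("\r\n", "\n").replace("\r", "\n").replace("\n", "\r\n")

    path.write_bytes(text.encode("utf-8"))
    return backup
```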


I hope that helps.
Laurence.

(Why not upload one of the files here so that we can see what they look like?)





Rebecca Bourke

Oct 23, 2012, 11:07:25 AM
to ant...@googlegroups.com
I have changed the language setting in AntConc to Japanese (Shift-JIS), but when I try to type a word into the search bar, Roman characters appear even though my Mac is set to type in Japanese. Can anyone offer help?

Laurence Anthony

Oct 23, 2012, 11:18:46 AM
to ant...@googlegroups.com
Hi John,

Which version of AntConc are you using? Also, can you upload a file
from your corpus (or even the file with all but one line deleted) so
we can check the character encoding?

I'm sure we'll be able to resolve the problems with your files. This
is quite a common difficulty when first using files with a
concordancer.

Regards,
Laurence.

Jessie

Nov 7, 2012, 8:19:17 AM
to ant...@googlegroups.com
Hi Laurence,
I'm also having trouble with Japanese characters. Do you have any step-by-step instructions on encoding? I can't get Notepad++ to read the Japanese characters correctly.
Or I could send you a short extract of what I'm trying to do?
Thanks so much!
Jessie

Laurence Anthony

Nov 7, 2012, 8:32:28 AM
to ant...@googlegroups.com
Hi Jessie,

AntConc works fine with Japanese. Can you send me a short extract of your file? I will check it for you.

Laurence.







Jessie

Nov 7, 2012, 2:59:27 PM
to ant...@googlegroups.com
Thanks Laurence, here is a short extract which I am trying to save as a text file to import into AntConc.
Jessie
Text - for testing.docx

Laurence Anthony

Nov 11, 2012, 12:43:36 PM
to ant...@googlegroups.com
Dear Jessie,

Here is the file you sent converted to text (UTF-8 encoded). It works fine in AntConc (3.3.5) with the default settings.

You cannot use Word documents in AntConc (or Notepad++). They need to be saved as plain text. When saving a Word document (with non-ASCII characters inside) as text, Word will show you an option to choose the character encoding. Here, you should choose UTF-8, which is an international standard and the default setting for AntConc 3.3.5.
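
For anyone with many .docx files, the conversion can also be scripted instead of using Word's save dialog. The sketch below is not how the attached file was produced; it is a minimal illustration that relies on a .docx file being a zip archive whose main text lives in `word/document.xml`. The function names `docx_to_text` and `save_as_utf8` are hypothetical, and the sketch ignores headers, footnotes, and other parts of the document.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespace used by WordprocessingML elements such as <w:p> and <w:t>.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(docx_path):
    """Return the body text of a .docx file, one paragraph per line."""
    with zipfile.ZipFile(docx_path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # A paragraph's text is split across one or more <w:t> runs.
        paragraphs.append("".join(t.text or "" for t in para.iter(W_NS + "t")))
    return "\n".join(paragraphs)


def save_as_utf8(docx_path, txt_path):
    """Write the extracted text to a plain-text file encoded as UTF-8."""
    with open(txt_path, "w", encoding="utf-8", newline="\n") as out:
        out.write(docx_to_text(docx_path))
```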

I hope that helps.

Laurence.


Text - for testing.txt

Jess Geldart

Nov 12, 2012, 6:55:49 AM
to ant...@googlegroups.com
Hi Laurence,
Thank you so much! That is really helpful. I never would have worked that out on my own.
Cheers,
Jess