TagAnt issue


Alexander Kautzsch

May 6, 2016, 6:37:36 AM
to AntConc-discussion
Dear Laurence,

I'm trying to use TagAnt on the written component of ICE-US (converted to UTF-8), and it seems that some files are not processed: TagAnt does not actually crash, but it keeps tagging a particular file indefinitely and never creates a tagged file. The files are about 2000 words in size, so that shouldn't be a problem, right? Any idea?

Best,
Alex

Laurence Anthony

May 6, 2016, 11:22:01 PM
to ant...@googlegroups.com
Hi Alex,

Have you checked to make sure the file encoding is correct (UTF-8)? Perhaps use my EncodeAnt tool to check it.

Regards,

Laurence.


###############################################################
Laurence ANTHONY, Ph.D.
Professor
Center for English Language Education in Science and Engineering (CELESE)
Faculty of Science and Engineering
Waseda University
3-4-1 Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
E-mail: antho...@gmail.com
WWW: http://www.laurenceanthony.net/
###############################################################

Alexander Kautzsch

May 7, 2016, 5:44:22 AM
to AntConc-discussion
Hi Laurence,

thanks for your swift reply. This is actually what I did before tagging. What I have not done is check the outcome of the conversion to UTF-8. When I run "detect encoding" on the files in the utf8 folder, it seems that most files retain their original encoding (mostly ASCII, some TIS-620, some ISO-8859-2, some others). Do I need to change the settings somewhere?

Thanks again and best regards,
Alex

Laurence Anthony

May 7, 2016, 6:08:09 AM
to ant...@googlegroups.com
Hi, 

ASCII should be fine as it's a subset of UTF-8. TIS-620 and ISO-8859-2 are almost certainly EncodeAnt guesses that are incorrect, but they do indicate that the files are not saved in UTF-8 from the outset. I suggest opening the problem files in Notepad++ and recoding them as UTF-8.
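
As a rough sketch of the check-and-recode step for anyone who prefers a script to an editor, assuming Python 3 is available: the code below tests whether a file decodes as strict UTF-8 and, if it does not, rewrites it from a fallback encoding. The folder name `corpus`, the helper names, and the `cp1252` fallback are assumptions for illustration only, not anything TagAnt or EncodeAnt prescribes.

```python
from pathlib import Path


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode as strict UTF-8 (pure ASCII always does)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def recode_to_utf8(path: Path, fallback: str = "cp1252") -> bool:
    """Rewrite the file as UTF-8 if it is not already valid; return True if rewritten.

    The fallback encoding is only a guess (an assumption), so check the
    recoded file by eye afterwards.
    """
    data = path.read_bytes()
    if is_valid_utf8(data):
        return False
    # Bytes the fallback encoding cannot map become U+FFFD instead of raising.
    text = data.decode(fallback, errors="replace")
    path.write_bytes(text.encode("utf-8"))
    return True


if __name__ == "__main__":
    # "corpus" is a hypothetical folder name.
    for txt in sorted(Path("corpus").glob("*.txt")):
        if recode_to_utf8(txt):
            print(f"recoded: {txt}")
```

Files that are plain ASCII pass the check untouched, which matches the point above that ASCII is a subset of UTF-8.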

Laurence.




Alexander Kautzsch

May 7, 2016, 9:01:54 AM
to AntConc-discussion
Hi,
thanks for the suggestions. None of this worked, but I found the solution.
TagAnt had problems only with two files that were originally encoded in ASCII; after converting them to UTF-8 the problem was still there.
I browsed through the files in Notepad++, found one "sub" (substitution character, right?) in each file, deleted the "sub" and then TagAnt worked.
Sorry to have bothered you and thanks for your help and time, and for your amazing tools, of course! Could have said that earlier ;)
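
For anyone hitting the same symptom, here is a hedged sketch of how such a stray character could be located without scrolling through each file. It assumes that what Notepad++ displays as "SUB" is the ASCII control character U+001A; the function name `find_sub` and the folder name `corpus` are hypothetical.

```python
from pathlib import Path

SUB = "\x1a"  # ASCII 26; assumed to be what Notepad++ displays as "SUB"


def find_sub(text: str) -> list[tuple[int, int]]:
    """Return 1-based (line, column) positions of every SUB character."""
    hits = []
    # Split on "\n" only, so other control characters never count as line breaks.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col_no, ch in enumerate(line, start=1):
            if ch == SUB:
                hits.append((line_no, col_no))
    return hits


if __name__ == "__main__":
    # "corpus" is a hypothetical folder name.
    for path in sorted(Path("corpus").glob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for line_no, col_no in find_sub(text):
            print(f"{path}: SUB at line {line_no}, column {col_no}")
```
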
Best,
Alex

Laurence Anthony

May 7, 2016, 9:37:51 AM
to ant...@googlegroups.com
Hi again,

Thanks for the nice comments.

What do you mean by "sub"? 

Alexander Kautzsch

May 9, 2016, 6:29:18 AM
to AntConc-discussion
Hi Laurence,
what I mean by "sub" is shown in the screenshot below (taken from Notepad++ ). As soon as I delete the "sub", TagAnt works perfectly.
Best,
Alex

[Inline image: Notepad++ screenshot showing the "SUB" character]

Laurence Anthony

May 9, 2016, 6:53:57 AM
to ant...@googlegroups.com
Hi,

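Ah, the SUB is the ASCII "substitute" control character (0x1A), most likely introduced when the file was saved or converted by some other program. Because it is a valid ASCII character, converting the file to UTF-8 leaves it in place.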


Yes, you need to delete it. With a program like Notepad++, you can simply search and delete all the SUBs in all the files in an instant.
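
The same bulk clean-up can also be scripted. This is a minimal sketch, assuming the files are ASCII or UTF-8 text and that SUB is the single byte 0x1A; the folder names and function names are hypothetical. It writes cleaned copies to a separate folder rather than overwriting the originals.

```python
from pathlib import Path

SUB_BYTE = b"\x1a"  # ASCII 26 (SUB)


def strip_sub(data: bytes) -> bytes:
    """Remove every SUB byte.

    Working on raw bytes is safe for UTF-8 input, because every byte of a
    multi-byte UTF-8 sequence is 0x80 or higher, so 0x1A is never part of one.
    """
    return data.replace(SUB_BYTE, b"")


def clean_folder(src: Path, dst: Path) -> int:
    """Copy each .txt file from src to dst without SUB bytes.

    Returns the number of files that actually contained a SUB.
    """
    dst.mkdir(parents=True, exist_ok=True)
    changed = 0
    for path in sorted(src.glob("*.txt")):
        data = path.read_bytes()
        cleaned = strip_sub(data)
        (dst / path.name).write_bytes(cleaned)
        if cleaned != data:
            changed += 1
    return changed


if __name__ == "__main__":
    # "corpus" and "corpus_clean" are hypothetical folder names.
    source = Path("corpus")
    if source.is_dir():
        count = clean_folder(source, Path("corpus_clean"))
        print(f"{count} file(s) contained SUB characters")
```

The returned count gives a quick way to confirm how many files were affected before running the tagger on the cleaned folder.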

Laurence.

