error

Ramon Masià

Feb 4, 2014, 6:56:32 AM

to antword...@googlegroups.com

Hi,

When I process my files with AntWordProfiler, I get these errors:

Finished loading level lists

Error: [ lletra ] appears in levels [ 1 ] with a frequency of [ 2 ]

Error: [ nombre ] appears in levels [ 1 ] with a frequency of [ 20 ]

Loading user files...

Finished loading user files

best regards,

Ramon

Laurence Anthony

Feb 4, 2014, 7:05:18 AM

to antword...@googlegroups.com

Hi Ramon,

The error is not with AntWordProfiler but with your files. It seems that you have the same word appearing twice or more in the level list. Have you searched for the word to identify the problem?

Laurence.

###############################################################
Laurence ANTHONY, Ph.D.
Professor
Center for English Language Education in Science and Engineering (CELESE)
Faculty of Science and Engineering
Waseda University
3-4-1 Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
E-mail: antho...@gmail.com
WWW: http://www.antlab.sci.waseda.ac.jp/
###############################################################

Thomas zapounidis

Feb 4, 2014, 7:58:13 AM

to antword...@googlegroups.com

Hi Ramon,

Having used both AntConc and AntWordProfiler, I would agree that multiple entries of the same word in your list are the source of the problem.

An easy and very fast way to fix this is to copy the list of words into an Excel file and use the Remove Duplicates option under the Data tab (this immediately keeps only the unique words, or types, as they are called). If you would rather do it the hard way, you could open the list file, search for each word, and delete the second lletra entry (there are 2 in your file) and the other 19 nombre occurrences (there are 20 in the file).
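
If you would rather script the check, the same idea can be sketched in Python. This is only a minimal sketch: it assumes the level list has been read as one entry per line, and `find_duplicates` and the sample entries are made up for illustration, not part of AntWordProfiler.

```python
from collections import Counter


def find_duplicates(words):
    """Return {entry: count} for every entry that occurs more than once."""
    # Ignore blank lines and surrounding whitespace before counting.
    counts = Counter(w.strip() for w in words if w.strip())
    return {w: n for w, n in counts.items() if n > 1}


# Hypothetical level-list contents, one entry per line.
level_list = ["lletra", "nombre", "lletra", "punt"]
print(find_duplicates(level_list))  # {'lletra': 2}
```

An empty result means the list has no repeated entries as written, so the cause would have to be somewhere else, for example in how the entries are tokenized.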

Regards,

Thomas Zapounidis

Laurence Anthony

Feb 4, 2014, 8:53:54 AM

to antword...@googlegroups.com

Hi Ramon,

I'm confused. Are you saying that the lists built into AntWordProfiler have double entries?

Which lists are you referring to?

By the way, I can code so I can find double entries in a much better way than using Excel or raw searching. In fact, I have a dedicated script to find duplicates in any number of files.

Laurence.

Laurence Anthony

Feb 4, 2014, 8:59:12 AM

to antword...@googlegroups.com

On Tuesday, 4 February 2014 22:53:54 UTC+9, Laurence Anthony wrote:

Hi Ramon,

I'm confused. Are you saying that the lists built into AntWordProfiler have double entries?

Oops. Sorry about that last post I made. I was thinking the post was made by Ramon, but after sending it I noticed that it was sent by Thomas to Ramon.

Thanks Thomas! Everything you say is correct. And your suggested way to find the double entries is a good one, too. I'll endeavor not to hit the send button so quickly next time!

Laurence.

Ramon Masià

Feb 7, 2014, 3:51:42 AM

to antword...@googlegroups.com

I think the lemma file is correct. I suspect it is an encoding issue in the file (I use the Ancient Greek alphabet), because the list has no repeated words. I'm going to look into it.

thank you

Ramon

Ramon Masià

Feb 7, 2014, 5:54:04 AM

to antword...@googlegroups.com

Hi, again,

I finally realized that the issue is an Ancient Greek character that AntWordProfiler doesn't recognize: the apex that marks thousands, "͵", which looks like a comma. For example, the number

1234

is, in Greek,

͵αβγδ

This character does not seem to be recognized by AntWordProfiler.

thanks

Ramon

Laurence Anthony

Feb 7, 2014, 6:25:59 AM

to antword...@googlegroups.com

Is it part of the Unicode standard? If it is, both AntConc and AntWordProfiler should cope with it as well as "a", "b", or "c".

Is it really a separate character, or is it one of those strange combination characters? Very interesting.

Laurence.

Ramon Masià Fornos

Feb 7, 2014, 7:10:40 AM

to antword...@googlegroups.com

Yes, it's part of the Unicode standard, as far as I know. In fact, I have to add this character to the token definition characters of AntConc and AntWordProfiler, and then the software recognizes it without any problem. It's a separate character, called Greek Lower Numeral Sign, Unicode U+0375, as you can see in this document, which lists all the Ancient Greek alphabet symbols for Unicode and Beta Code.

codisTLG.pdf

Laurence Anthony

Feb 7, 2014, 7:19:26 AM

to antword...@googlegroups.com

Ahh! I just realized what the problem is. Your character is *not* a letter character. It's a number (or maybe punctuation)! The default token definition in both AntConc and AntWordProfiler is characters from the *letters* class of Unicode.

So, the obvious workaround is to add this *non-letter* character to the token definition.
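
One way to check which class a character falls into is Python's standard `unicodedata` module. The lines below are only a minimal sketch: they assume the character in question is U+0375, and `is_letter` is a made-up helper that treats any general category starting with "L" as a letter.

```python
import unicodedata

APEX = "\u0375"  # assumed code point of the Greek apex discussed here


def is_letter(ch):
    """True if the Unicode general category of ch is one of the letter classes (L*)."""
    return unicodedata.category(ch).startswith("L")


for ch in (APEX, "α", "a"):
    # Print the code point, the official character name, the category,
    # and whether a letters-only token definition would accept it.
    print(f"U+{ord(ch):04X}", unicodedata.name(ch),
          unicodedata.category(ch), is_letter(ch))
```

Any character for which `is_letter` returns False would be left out of a token definition built only from the letters class.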

Do you understand the problem? It's obvious if you think about it.

Laurence.

Ramon Masià Fornos

Feb 7, 2014, 7:57:31 AM

to antword...@googlegroups.com

Yes, of course, you are right, this is the problem (but I don't think
it is a punctuation mark). I will change the token definition.

By the way: in AntConc I can add more characters to the token
definition, but in AntWordProfiler I need to change the token
definition itself. Perhaps it would be better to add the possibility
in future releases of both to define the alphabet from a text file.

Laurence Anthony

Feb 7, 2014, 8:02:54 AM

to antword...@googlegroups.com

Ramon, the name gives it away: "Greek Lower **Numeral Sign**"

It's a numerical symbol. Clearly not a letter.

Yes, it would be nice to allow tokens to be appended to the token definition in AWP. But, basically, in your case, it is quite a simple edit:

\p{L} => [\p{L}͵]
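
To see what that edit changes, here is a rough Python sketch of a tokenizer that keeps runs of letters, with an optional set of extra characters. It is not how AntWordProfiler tokenizes internally: `tokenize` and `extra_chars` are hypothetical names, and the Unicode general categories starting with "L" stand in for \p{L}.

```python
import unicodedata


def tokenize(text, extra_chars=""):
    """Split text into runs of letters, also accepting any character in extra_chars."""
    tokens, current = [], []
    for ch in text:
        if unicodedata.category(ch).startswith("L") or ch in extra_chars:
            current.append(ch)
        elif current:
            # A non-token character closes the token collected so far.
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens


numeral = "\u0375αβγδ"  # the Greek numeral 1234 from earlier in the thread

print(tokenize(numeral))                        # apex dropped: only the four letters remain
print(tokenize(numeral, extra_chars="\u0375"))  # apex kept as part of the token
```

With letters only, two entries that differ just by the apex would come out as the same token, which may be why they were reported as duplicates; that is an inference, not something confirmed in this thread.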

I hope that helps.

Laurence.

Laurence Anthony

Feb 7, 2014, 8:14:17 AM

to antword...@googlegroups.com

> Perhaps it would be better to add the possibility in future releases of both to define the alphabet from a text file.

What sort of text file? Do you just mean a whole lot of text? Or do you mean an actual list of defining tokens?

Laurence.

Ramon Masià Fornos

Feb 7, 2014, 8:15:25 AM

to antword...@googlegroups.com

Thank you again. I've done what you said, but the configuration file is not UTF-8, so I can't insert the apex as a literal Unicode character. I've changed this line to:

token_def => "(?<![\\p{N}\\p{L}])\\p{L}+[\\p{L}\\p{N}\x{375}]*",

and it seems to work fine

Laurence Anthony

Feb 7, 2014, 8:18:50 AM

to antword...@googlegroups.com

Is that the correct regex? It looks wrong to me. You should test it carefully.
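
One way to test it is to translate the pattern into Python. The standard `re` module has no \p{L}, so the sketch below assumes `[^\W\d_]` as a rough stand-in for \p{L} and `[^\W_]` for [\p{L}\p{N}]; the regex engine AntWordProfiler uses may differ at the edges. The names `token_def` and `adjusted` are only for illustration.

```python
import re

LETTER = r"[^\W\d_]"  # approximation of \p{L}
ALNUM = r"[^\W_]"     # approximation of [\p{L}\p{N}]
APEX = "\u0375"       # Greek Lower Numeral Sign

# Translation of the edited line: a token must START with a letter,
# and the apex is only allowed after that first run of letters.
token_def = re.compile(rf"(?<!{ALNUM}){LETTER}+(?:{ALNUM}|{APEX})*")

# One possible adjustment (an assumption, not from the thread):
# also accept the apex in the first position.
adjusted = re.compile(rf"(?<!{ALNUM})(?:{LETTER}|{APEX})+(?:{ALNUM}|{APEX})*")

numeral = APEX + "αβγδ"                  # 1234, apex in front
print(token_def.findall(numeral))        # the leading apex is not part of the token
print(token_def.findall("αβ" + APEX + "γδ"))  # an apex inside the word is kept
print(adjusted.findall(numeral))         # the leading apex is kept
```

Under these assumptions the edited pattern keeps an apex inside a word but drops a leading one, because the token must begin with a letter. Since the thousands apex is written in front of the numeral, that is worth checking against real data.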

Laurence.

Ramon Masià Fornos

Feb 7, 2014, 8:25:33 AM

to antword...@googlegroups.com

An actual list of defining tokens. With AntConc 3.2.4 I used a list of defining tokens in the token definition box and it worked fine. In fact, until AntConc 3.4.1 I only worked with 3.2.4, because the later releases gave me some problems. In those later releases I think, but I'm not sure, this box disappeared. Now, with the latest version, I add some tokens (some apexes and diacritical marks that AntConc doesn't recognize automatically) to the token definition and it works fine.

The problem with the token definition box is that it is difficult to see what I type into it, because it's so tiny. Furthermore, I can't type UTF-8 characters into any box in AntConc (either the token definition box or the search box); I have to write whatever I need in some editor and then copy and paste it into the AntConc boxes. For that reason, perhaps it would be better to offer the possibility of reading the token definition from a file.

Ramon Masià Fornos

Feb 7, 2014, 8:27:31 AM

to antword...@googlegroups.com

I'm not sure, I will test again.

thanks

Laurence Anthony

Feb 7, 2014, 9:14:45 AM

to antword...@googlegroups.com

Are you using a display with magnified text (e.g. 125%)? I think you'll find that this setting caused the token definition window to drop off the bottom of the screen, making it seem that it wasn't there in 3.3.5 and earlier. Somebody reported this, so I redesigned the interface of 3.4.1 to make it smaller and also resizable. That's why you can now see it.

I see why a file import might be useful. Actually, the function is already there but you just don't know it. Have you ever opened the user settings file in a text editor? You'll see a place to type in the token definition. It is read into AntConc as UTF-8, so if you replaced the characters there it would always work.

In AntConc 4.0, we won't have the problem any more. You'll be able to type any character directly into the interface without problems.

Laurence.

Ramon Masià Fornos

Feb 7, 2014, 10:33:07 AM

to antword...@googlegroups.com

Ah! I must use the settings file, of course. I didn't realize. Thank you again.

Laurence Anthony

Feb 7, 2014, 12:05:07 PM

to antword...@googlegroups.com

You don't have to use the settings file, but it would be one way to solve the problem of inputting strange characters directly into the interface.

Laurence.

Ramon Masià Fornos

Feb 7, 2014, 12:14:33 PM

to antword...@googlegroups.com

Yes, of course. I've said 'I must', but I wanted to say 'I could'.
