Statistics-Tab for AntConc


Carlos Fiyero

Feb 6, 2012, 12:03:03 PM
to AntConc-discussion
It would be nice if AntConc had a statistics tab (similar to
WordSmith) that has no search or other options, but simply shows some
statistics about the corpus.

E.g.:
- number of characters, number of words, number of sentences, number
of paragraphs.
- min length, max length and average length of words (in characters)
and sentences (in words).
- number of 1-letter-words, number of 2-letter-words, and so on until
the max length.
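
As a rough illustration of the statistics listed above, here is a minimal Python sketch. It is not AntConc or WordSmith code. The function name `corpus_stats` and the tokenization rules are assumptions made for the example: a word is a run of `\w` characters, a sentence ends at `.`, `!` or `?` followed by whitespace, and paragraphs are separated by blank lines.

```python
import re
from collections import Counter


def corpus_stats(text):
    """Return basic corpus statistics for a plain-text string (hypothetical helper)."""
    # Assumption: a word is a run of letters, digits or underscores.
    words = re.findall(r"\w+", text)
    # Assumption: a sentence ends at ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Assumption: paragraphs are separated by one or more blank lines.
    paragraphs = [p for p in re.split(r"\n\s*\n", text.strip()) if p.strip()]

    word_lengths = [len(w) for w in words]                            # in characters
    sentence_lengths = [len(re.findall(r"\w+", s)) for s in sentences]  # in words

    def summary(values):
        # min, max and mean of a list of lengths; zeros for an empty corpus
        if not values:
            return {"min": 0, "max": 0, "mean": 0.0}
        return {"min": min(values), "max": max(values),
                "mean": sum(values) / len(values)}

    return {
        "characters": len(text),
        "words": len(words),
        "sentences": len(sentences),
        "paragraphs": len(paragraphs),
        "word_length": summary(word_lengths),
        "sentence_length": summary(sentence_lengths),
        # number of 1-letter words, 2-letter words, ... up to the max length
        "words_by_length": dict(sorted(Counter(word_lengths).items())),
    }
```

Calling `corpus_stats(open("corpus.txt", encoding="utf-8").read())` on a file of your own would return one dictionary with all of these counts; different tokenization rules will of course give different numbers.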

That would be very nice and sometimes these things are very
interesting.

And maybe it could be linked to the file tab and concordance tab, so
that, for example, you see that your corpus has x 10-letter words and
can then jump between them in the file view and concordance view.

Laurence Anthony

Feb 7, 2012, 8:37:52 PM
to ant...@googlegroups.com
Dear Carlos,

Thank you for this and the other suggestions. Let me respond to these later.

In the meantime, I would be interested to hear what others think of these suggestions.

Laurence.

Dr CK Jung

Feb 7, 2012, 9:07:09 PM
to ant...@googlegroups.com
Dear Laurence

You know that I am a big fan of AntConc, but I think you should listen
to Carlos' suggestions this time. This is mainly because I use
WordSmith Tools for getting this kind of statistical information
(obviously I then have to use WordSmith Tools for the rest of the
analysis in order to maintain consistency). I know it is not going to
be easy, but I really hope to see the statistics tab in AntConc.

Best wishes
CK

> --
> You received this message because you are subscribed to the Google Groups
> "AntConc-discussion" group.
> To post to this group, send email to ant...@googlegroups.com.
> To unsubscribe from this group, send email to
> antconc+u...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/antconc?hl=en.

Laurence Anthony

Feb 7, 2012, 9:14:41 PM
to ant...@googlegroups.com
Thank you CK,

I certainly listen to all feedback on AntConc. That's why this discussion group is so useful.

Actually, I'm going to be visiting Stefan Th. Gries in the US in a few weeks to discuss exactly this topic. My plan is to add a complete set of statistical tools. A new stats tab will be a necessity once this work is finished.

Laurence.

Pascal Fischer

Feb 8, 2012, 5:09:13 AM
to AntConc-discussion
I'd like that idea too.

Gökhan Sever

Nov 3, 2014, 3:12:13 PM
to ant...@googlegroups.com, carlos...@googlemail.com
Hello,

Is there any progress on this request? In addition to this feature, I am also interested in the ability to sort words by their character length, particularly in the Word List tab.

Thanks.

Laurence Anthony

Nov 3, 2014, 4:35:53 PM
to ant...@googlegroups.com
On 3 November 2014 20:12, Gökhan Sever <gokha...@gmail.com> wrote:
> Hello,
>
> Is there any progress on this request? In addition to this feature, I am also interested in the ability to sort words by their character length, particularly in the Word List tab.


Unfortunately, not at the moment. It is planned for AntConc 4.0, though. Sorting by word length in the Word List is also an interesting idea, and very easy to implement. I'll add it as a request.
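
To illustrate what sorting a word list by word length could look like, here is a minimal Python sketch. It is not AntConc's implementation; the function name `sort_by_length` and the tie-breaking rule (higher frequency first, then alphabetical) are assumptions made for the example.

```python
from collections import Counter


def sort_by_length(freqs, longest_first=True):
    """Sort (word, frequency) pairs by word length (hypothetical helper).

    Ties are broken by descending frequency, then alphabetically.
    """
    sign = -1 if longest_first else 1
    return sorted(
        freqs.items(),
        key=lambda item: (sign * len(item[0]), -item[1], item[0]),
    )


# Example word list with made-up frequencies.
word_list = Counter({"the": 5, "concordance": 2, "a": 7, "corpus": 2, "tokens": 3})
for word, freq in sort_by_length(word_list):
    print(len(word), freq, word)
```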

Laurence.
 

Gökhan Sever

Nov 3, 2014, 6:08:43 PM
to ant...@googlegroups.com
That's great that these will be implemented in a future version. In the meantime, I use this little Python script (attached) to get basic statistics from a text file. Original credit goes to https://github.com/davidthewatson/word_count_histogram

This basic functionality would be a nice addition to the program as a separate tab.

My other feature request is to list the words that are unique to each individual file and the words shared by all files, much like a set operation, so that I can see which words appear in one file but not in the others and which words are common to all files.
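
The set operation described here could be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not part of AntConc: the function name `word_sets` is hypothetical, words are lowercased, and a word is taken to be any run of `\w` characters.

```python
import re


def word_sets(texts):
    """Compare vocabularies across files (hypothetical helper).

    texts: dict mapping a file name to its text.
    Returns (common, unique), where common is the set of words found in
    every file and unique maps each file name to the words found only there.
    """
    # Assumption: lowercase everything and treat \w+ runs as words.
    vocab = {name: set(re.findall(r"\w+", text.lower()))
             for name, text in texts.items()}

    # Words shared by all files: intersection of every vocabulary.
    common = set.intersection(*vocab.values()) if vocab else set()

    # Words unique to one file: its vocabulary minus the union of the others.
    unique = {}
    for name, words in vocab.items():
        others = set().union(*(w for n, w in vocab.items() if n != name))
        unique[name] = words - others
    return common, unique


common, unique = word_sets({"a.txt": "The cat sat", "b.txt": "The dog sat"})
print(sorted(common))
print({name: sorted(words) for name, words in unique.items()})
```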

Thanks again.
 
word_stats.py

Laurence Anthony

Nov 3, 2014, 6:24:50 PM
to ant...@googlegroups.com
Thank you for this. Actually, the new version of AntConc will be written in Python.

The ability to output results per file (in addition to per corpus) is something I will definitely be adding.

Laurence.


###############################################################
Laurence ANTHONY, Ph.D.
Professor
Center for English Language Education in Science and Engineering (CELESE)
Faculty of Science and Engineering
Waseda University
3-4-1 Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
E-mail: antho...@gmail.com
WWW: http://www.laurenceanthony.net/
###############################################################


Gökhan Sever

Nov 3, 2014, 6:48:42 PM
to ant...@googlegroups.com
If Python is involved, my curiosity increases a bit more. Actually, your program is already quite feature-rich. There are just a few small statistical things I am interested in for comparing a few documents, followed by creating a word cloud, word distributions, and basic Venn diagrams. I was also planning to learn more about NLTK to improve my understanding. Are you going to use constructs from NLTK in the next version?

Well, if the development is done as open source, a code portal like Google Code or GitHub might come in handy, particularly for issue tracking. Those interfaces make it much easier to list issues: what is being worked on, what has been requested, what will be implemented in the next release, etc.

Anastasiya Andrusenko

Nov 4, 2014, 3:58:08 AM
to ant...@googlegroups.com, carlos...@googlemail.com
It would be nice to have this feature. It is exactly what I am looking for at the moment: I need to get statistics for my corpus, and I realised that AntConc unfortunately doesn't have this feature. I hope it will be available in the future.

Thanks.

Anastasia 

Gökhan Sever

May 3, 2015, 10:25:16 PM
to ant...@googlegroups.com
Hello, I am pinging once again to see if there is any update on the sort-by-word-length feature. Another useful option would be filtering by min/max word length, just like the min/max freq options. This would help filter out some noise words easily. Thanks.
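
For anyone who wants this outside AntConc in the meantime, a min/max word-length filter can be sketched in a few lines of Python. This is only an illustration; the function name `filter_by_length` and its parameters are hypothetical, and the word list is assumed to be a plain dictionary of word frequencies.

```python
def filter_by_length(freqs, min_len=1, max_len=None):
    """Keep only words whose length lies within [min_len, max_len].

    freqs: dict mapping word -> frequency (assumed input format).
    max_len=None means no upper limit.
    """
    return {
        word: count
        for word, count in freqs.items()
        if len(word) >= min_len and (max_len is None or len(word) <= max_len)
    }


# Example with made-up frequencies: drop very short and very long tokens.
word_list = {"a": 7, "of": 4, "corpus": 2, "concordance": 2, "xxxxxxxxxxxxxxxx": 1}
print(filter_by_length(word_list, min_len=3, max_len=12))
```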