Concordance - Sentence View


sysvol32

Mar 7, 2012, 11:20:51 AM
to AntConc-discussion
Dear Laurence,

I am about to finish the database for my M.A. thesis. What I need is
sentence concordances for two different node words. I have nearly
1,000 different texts in my corpus. Is it possible to generate
sentence concordances in AntConc? If so, how can I do that?
If it is not possible for now, would you consider adding this function to
the next release? I have a Perl regular expression for sentence
splitting; the difficulty is applying that expression within the
application.

Thanks in advance
Umut

Laurence Anthony

May 18, 2012, 2:17:37 AM
to ant...@googlegroups.com
Hi Umut,

Checking my emails today, I noticed I missed a few AntConc questions. Here's the answer to your first one. (I hope it's not too late).

At the moment, AntConc does not have a sentence concordance option. I think the only way to really achieve it (and it's not a good way) is to pre-divide your corpus so one file contains one sentence and then set the AntConc search window size to be big enough so that the whole sentence will be shown. I suppose you could also just put some delimiter at the beginning and end of each sentence (e.g. a tab) and then after generating concordance lines in the regular way (with a wide search window size), you could copy the results to a spreadsheet program (e.g. Excel) so that one column would nicely show the sentence concordance line.
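As a rough illustration of the delimiter workaround, here is a minimal Python sketch. The splitting rule is a naive assumption for illustration (a sentence ends at ".", "!" or "?" followed by whitespace); it is not the Perl expression mentioned earlier in the thread, and the function names are hypothetical.

```python
import re

# Naive rule (an assumption): a sentence ends at ., ! or ? followed by whitespace.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text):
    """Return the non-empty sentences found in text."""
    return [s.strip() for s in SENTENCE_END.split(text) if s.strip()]


def delimit_sentences(text, delimiter="\t"):
    """Wrap every sentence in the delimiter and put one sentence per line."""
    return "\n".join(f"{delimiter}{s}{delimiter}" for s in split_sentences(text))


sample = "The city was a giant bird. It was patient! Was it determined?"
print(delimit_sentences(sample))
```

After pasting concordance lines produced from such a file into a spreadsheet, the tabs should separate the sentence containing the node word into its own column. A real corpus would need a more careful splitter (abbreviations, quotations and so on).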

I really should add this feature to AntConc. It has been on my list of things to do for a very long time.

Sorry to not be of more help!

Laurence.


sysvol32

Jul 24, 2012, 3:25:52 AM
to ant...@googlegroups.com
Dear Laurence,
I was wondering whether AntConc can generate concordance lines and clusters within sentence tags. I have nearly 800,000 sentences in my corpus, and all of them are sentence-tagged. What I need is to generate concordance lines and cluster lists restricted to the text between these tags.

Examples:
<s>Kent, her bir yanı beyaza kesmiş, kendini dağ gibi kabartmış dev bir kuştu.</s>
<s>Kanatları altında neyi barındırdığı, neyi sakladığı belli olmayan, sabırlı, kinci, kararlı bir kuş.</s>
<s>Dikkat edilecek olursa, bütünüyle dışsal olana ait bir son sestir bu.</s>

Besides, I tried to generate collocate lists for two-word phrases like "ya da"; however, the Collocates tool only generates collocates for a single word (like "ya"). Do you think this is a bug? By the way, the latest release of AntConc has problems with very large corpora, so I am using version 3.2.4w. I know that you are aware of this problem.

Best regards
Umut

Laurence Anthony

Jul 31, 2012, 12:18:54 AM
to ant...@googlegroups.com
Hi Umut,

Sorry for the delay. I was away at a conference.

> I was wondering whether AntConc can generate concordance lines and
> clusters within sentence tags. I have nearly 800,000 sentences in my
> corpus, and all of them are sentence-tagged. What I need is to generate
> concordance lines and cluster lists restricted to the text between these tags.

Ahh, you have spotted another feature that I have wanted to add for a
very long time. AntConc can ignore text between certain tags (via the
Global Settings->Tag Settings option). But, it cannot do the reverse
and find text between a particular set of tags.
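Until such a feature exists, the idea of searching only inside sentence tags can be sketched outside AntConc. The Python snippet below is a minimal sketch, not AntConc's behavior: it assumes sentences are wrapped in `<s>...</s>` as in the examples above, matches the node as a whole word, and keeps the context inside the enclosing sentence. The function name `sentence_concordance` is hypothetical.

```python
import re

# Assumes every sentence is wrapped as <s>...</s>, as in the examples in this thread.
SENTENCE = re.compile(r"<s>(.*?)</s>", re.DOTALL)


def sentence_concordance(text, node):
    """Return (left, node, right) for each whole-word hit.

    The left and right context never crosses the enclosing <s>...</s> pair.
    """
    pattern = re.compile(r"\b" + re.escape(node) + r"\b")
    hits = []
    for sentence in SENTENCE.findall(text):
        for m in pattern.finditer(sentence):
            hits.append((sentence[:m.start()], m.group(), sentence[m.end():]))
    return hits


corpus = (
    "<s>Kent, her bir yanı beyaza kesmiş, kendini dağ gibi kabartmış dev bir kuştu.</s>\n"
    "<s>Kanatları altında neyi barındırdığı, neyi sakladığı belli olmayan, "
    "sabırlı, kinci, kararlı bir kuş.</s>\n"
    "<s>Dikkat edilecek olursa, bütünüyle dışsal olana ait bir son sestir bu.</s>\n"
)

for left, node, right in sentence_concordance(corpus, "kuş"):
    print(f"{left}[{node}]{right}")
```

The match is case-sensitive on purpose: case folding for Turkish (dotted and dotless i) needs locale-aware handling that this sketch does not attempt.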

I'm adding this now as a first priority for the next release along
with a few other new features mentioned recently. I'll also be
improving the speed of the KWIC Concordance Tool at the same time.

> Besides, I tried to generate collocate lists for two-word phrases like
> "ya da"; however, the Collocates tool only generates collocates for a
> single word (like "ya").

I'm not sure exactly what you mean here. The table of results in the
Collocates tool is always based on individual words. So, if you search for
collocates of "ya" you will get "da" and vice versa. Are you expecting
to see "ya da" in the list of collocates?

> By the way, the latest release of AntConc has problems with very large
> corpora, so I am using version 3.2.4w. I know that you are aware of this
> problem.

Hmm.. I have always felt that AntConc 3.2.4 also works problematically
with massive corpora. I think I discovered the program starting to
crash at around file sizes of 14 MB. I haven't looked into why this
occurs (it shouldn't), but I didn't realize that 3.3.x is having
bigger problems. What size of file starts to cause problems? Perhaps
you could send me a problem file (or corpus) privately, so that I can
test it.

Laurence.

sysvol32

Jul 31, 2012, 1:42:54 AM
to ant...@googlegroups.com
Dear Laurence,


> I'm adding this now as a first priority for the next release along
> with a few other new features mentioned recently. I'll also be
> improving the speed of the KWIC Concordance Tool at the same time.

I'll be glad to see this function in the Concordance and Collocates tools in AntConc's next release.


> I'm not sure exactly what you mean here. The table of results in the
> Collocates tool is always based on individual words. So, if you search for
> collocates of "ya" you will get "da" and vice versa. Are you expecting
> to see "ya da" in the list of collocates?

For example, I would like to generate the collocates table of "black and". If I type "black and" into the search term box in the Collocates tool, I want to see "white" appear as a collocate of "black and". The Turkish example works the same way. For now, the tool only generates collocates of a single word even when I type two words into the box. I should also say that the number of words in the search term can increase (3 or 4 words).

> Hmm.. I have always felt that AntConc 3.2.4 also works problematically
> with massive corpora. I think I discovered the program starting to
> crash at around file sizes of 14 MB. I haven't looked into why this
> occurs (it shouldn't), but I didn't realize that 3.3.x is having
> bigger problems. What size of file starts to cause problems? Perhaps
> you could send me a problem file (or corpus) privately, so that I can
> test it.

My corpus is nearly 76 megabytes for now. Version 3.2.4w works fine; however, the latest release freezes when I try to generate the collocates table for frequently used words ("and" for English, "ve" for Turkish).

Best regards
Umut

On Tuesday, 31 July 2012 at 07:18:54 UTC+3, Laurence Anthony wrote:

Laurence Anthony

Jul 31, 2012, 11:37:06 PM
to ant...@googlegroups.com
Hi Umut,


>> I'm adding this now as a first priority for the next release along
>> with a few other new features mentioned recently. I'll also be
>> improving the speed of the KWIC Concordance Tool at the same time.
>
> I'll be glad to see this function in the Concordance and Collocates tools
> in AntConc's next release.

Great.

>
>> I'm not sure exactly what you mean here. The table of results in the
>> Collocates tool is always based on individual words. So, if you search for
>> collocates of "ya" you will get "da" and vice versa. Are you expecting
>> to see "ya da" in the list of collocates?
>
> For example, I would like to generate the collocates table of "black and".
> If I type "black and" into the search term box in the Collocates tool, I
> want to see "white" appear as a collocate of "black and". The Turkish
> example works the same way. For now, the tool only generates collocates of
> a single word even when I type two words into the box. I should also say
> that the number of words in the search term can increase (3 or 4 words).

If you type "black and" and set the window span to something
reasonable, e.g. 5L to 5R, you will find *words* that collocate with
"black and". For example, if you set the window span from 2R to 2R,
you effectively generate all the 3-word clusters including "black
and". Is this what you want?
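
To make the idea of phrase collocates concrete, here is a small Python sketch that counts the words falling inside a window around every occurrence of a multi-word search term. It is only an illustration under stated assumptions: the window is counted from the edges of the phrase, the counts are raw frequencies with no statistical measure applied, and `phrase_collocates` is a hypothetical name. How AntConc itself counts span positions for a multi-word term may differ.

```python
from collections import Counter


def phrase_collocates(tokens, phrase, left=5, right=5):
    """Count words within `left`/`right` tokens of each occurrence of `phrase`.

    The window is measured from the edges of the phrase (an assumption);
    words of the phrase occurrence itself are not counted.
    """
    n = len(phrase)
    counts = Counter()
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == phrase:
            counts.update(tokens[max(0, i - left):i])    # left context
            counts.update(tokens[i + n:i + n + right])   # right context
    return counts


text = ("the black and white cat sat near a black and blue door "
        "and a black and white dog")
tokens = text.split()

# One token to the right of the phrase: effectively the 3-word clusters
# that start with "black and".
print(phrase_collocates(tokens, ["black", "and"], left=0, right=1).most_common())
```

Widening the window (for example `left=5, right=5`) gives the broader list of words that occur near the phrase; ranking them meaningfully would still require a statistical measure on top of these counts.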

Incidentally, the correct statistical measure for collocates of a
phrase like "black and" is an interesting research problem, and the
current measure used by AntConc might not actually be the best one.
But if you look at the results, they at least look reasonable.

>
>> Hmm.. I have always felt that AntConc 3.2.4 also works problematically
>> with massive corpora. I think I discovered the program starting to
>> crash at around file sizes of 14 MB. I haven't looked into why this
>> occurs (it shouldn't), but I didn't realize that 3.3.x is having
>> bigger problems. What size of file starts to cause problems? Perhaps
>> you could send me a problem file (or corpus) privately, so that I can
>> test it.
>
> My corpus is nearly 76 megabytes for now. Version 3.2.4w works fine;
> however, the latest release freezes when I try to generate the collocates
> table for frequently used words ("and" for English, "ve" for Turkish).

Wow. I'm surprised that version 3.2.4 can work with huge 76 MB files.
My tests showed it suffered problems with smaller files. Is the corpus
publicly available? I'd like to test it myself.

Laurence.

sysvol32

Aug 17, 2012, 2:42:45 AM
to ant...@googlegroups.com
Dear Laurence,
Sorry for the delay. I was away.

>If you type "black and" and set the window span to something
>reasonable, e.g. 5L to 5R, you will find *words* that collocate with
>"black and". For example, if you set the window span from 2R to 2R,
>you effectively generate all the 3-word clusters including "black
>and". Is this what you want?

Yes, this is what I need for generating the collocates of phrases. If I select 2R or 2L, I am able to generate the list. Thanks for that.


>Wow. I'm surprised that version 3.2.4 can work with huge 76 MB files.
>My tests showed it suffered problems with smaller files. Is the corpus
>publicly available? I'd like to test it myself.

Unfortunately, the corpus is not publicly available. It will serve as the resource for my M.A. thesis.

Hope to see the next release soon...

Best regards,
Umut

On Wednesday, 1 August 2012 at 06:37:06 UTC+3, Laurence Anthony wrote: