Collocate horizons: any setting for sentence boundaries (stopping at periods)?


TJ E

May 23, 2016, 11:50:24 AM
to AntConc-discussion
Greetings. I did my PhD pilot study using AntConc and am now considering using it for my dissertation itself. I'm really happy with most functions. However, I'm wondering whether there is a setting that would stop AntConc from searching for collocates across (beyond) sentence boundaries.

Just so my question is clear, here is an example. Let's say I want to look for all instances of "thank you" within a text using a 4-4 span (in keeping with traditional horizons and to use the same setting as for other collocations studied). Within the advanced collocate setting, I make my search term (node) "thank". I tick the "use context words and horizons" box and add "you" as a context word, applying 4L, 4R for the context horizon. Now, let's say I have a text containing:

I look forward to hearing from you. Thank you for your consideration.

AntConc will consider the node or search word "Thank" to have two potential collocates, both "you" on the right and also "you" on the left, even though the occurrence on the left is in a different sentence.
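The effect can be reproduced outside AntConc with a short Python sketch. This is an independent illustration, not AntConc's implementation: the function name `collocates`, its parameters, and the simple letters-and-apostrophes token pattern are all assumptions made for the example. It counts the words inside a 4L/4R window around the node, either across the whole text or separately for each line.

```python
import re
from collections import Counter


def collocates(text, node, left=4, right=4, stop_at_lines=True):
    """Count words within `left`/`right` tokens of each occurrence of `node`.

    Hypothetical helper for illustration only. If `stop_at_lines` is True,
    each line is treated as its own unit, so a window never crosses a
    line break.
    """
    node = node.lower()
    units = text.splitlines() if stop_at_lines else [text]
    counts = Counter()
    for unit in units:
        # Assumed token definition: runs of letters and apostrophes.
        tokens = re.findall(r"[A-Za-z']+", unit.lower())
        for i, tok in enumerate(tokens):
            if tok != node:
                continue
            window = tokens[max(0, i - left):i] + tokens[i + 1:i + 1 + right]
            counts.update(window)
    return counts


# The example from the post, with one sentence per line.
sample = ("I look forward to hearing from you.\n"
          "Thank you for your consideration.")

print(collocates(sample, "thank", stop_at_lines=False)["you"])  # 2
print(collocates(sample, "thank", stop_at_lines=True)["you"])   # 1
```

With the window allowed to run across the whole text, "you" is counted twice (once on each side of "Thank"); when each line is handled separately, only the "you" in the same sentence is counted.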

I have looked for this functionality (stopping at sentence boundaries) within AntConc 3.4.4, read the latest user support file (3.4.3), and searched for discussion questions but found nothing (yet). I hope I'm just missing something obvious!

Thanks in advance for any answers (I am using Windows 7 and sometimes Windows 10, working with learner and native English speaker texts).

Best,
Terri Everest

Laurence Anthony

May 23, 2016, 9:24:11 PM
to ant...@googlegroups.com
Dear Terri,

I'm sorry to say that AntConc does not include an embedded sentence splitter for English (or other languages), so it knows nothing about the ends of sentences.

However, a possible workaround is to explicitly add line breaks (or any other character) after each sentence (using someone else's sentence splitter tool). Then, in AntConc, you can search in the normal way and the collocates won't cross sentence boundaries.

You'll need to experiment a little with the sentence break character and the AntConc token definition to make sure this works properly, but it certainly should be possible.
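A minimal sketch of that preprocessing step in Python, assuming a deliberately naive rule instead of a trained sentence splitter: a sentence is taken to end at ".", "!" or "?" followed by whitespace and an uppercase letter (optionally preceded by an opening quote or bracket). The function name `one_sentence_per_line` and the pattern are illustrative assumptions; the rule will mis-split abbreviations such as "U.S. Army" and will miss sentences that begin in lowercase, which is common in learner texts.

```python
import re

# Assumed boundary rule: sentence-final punctuation, then whitespace,
# then an optional opening quote/bracket and an uppercase letter.
_BOUNDARY = re.compile(r'(?<=[.!?])\s+(?=["\'(]?[A-Z])')


def one_sentence_per_line(text):
    """Return `text` with a line break after each detected sentence.

    Hypothetical helper for illustration; whitespace inside each
    sentence (including hard-wrapped lines) is collapsed to single spaces.
    """
    sentences = _BOUNDARY.split(text.strip())
    cleaned = (" ".join(s.split()) for s in sentences)
    return "\n".join(s for s in cleaned if s)


print(one_sentence_per_line(
    "I look forward to hearing from you. Thank you for your consideration."
))
```

The output file can then be loaded into AntConc in place of the original text. Whether the line break is actually respected as a boundary still has to be checked against the token definition, as described above.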

I suggest you start with a simple file like the following...

>the cat sat. on the mat
>the cat sat on the mat

...and try to find "on" as a collocate of sat. If that works, then try "mat" as a collocate of sat.

I hope that helps.

Laurence.


###############################################################
Laurence ANTHONY, Ph.D.
Professor
Center for English Language Education in Science and Engineering (CELESE)
Faculty of Science and Engineering
Waseda University
3-4-1 Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
E-mail: antho...@gmail.com
WWW: http://www.laurenceanthony.net/
###############################################################


TJ E

May 26, 2016, 2:33:43 PM
to AntConc-discussion
Thank you, Laurence, for the speedy reply. I'll experiment a bit. If anyone knows of a good embedded sentence splitter, please share! Best, Terri

Reed Darsey

May 26, 2016, 8:50:05 PM
to AntConc-discussion
On 26 May 2016 at 11:33, TJ E wrote:

> If anyone knows of a good embedded sentence
> splitter, please share! Best, Terri

splitta
statistical sentence boundary detection
Improved Sentence Boundary Detection
Dan Gillick

______________________________________
Reed Darsey / Grand Bay, Alabama, USA


Laurence Anthony

May 26, 2016, 10:42:12 PM
to ant...@googlegroups.com
Hi Reed,

Do you have a link for the splitta tool? A quick Google search did not give me an obvious link.

Laurence.



Reed Darsey

May 27, 2016, 7:51:34 PM
to ant...@googlegroups.com
On 27 May 2016 at 11:41, Laurence Anthony wrote:

> Do you have a link for the splitta tool? A quick Google
> search did not give me an obvious link.

Looking back in my notes, as of 12-29-13, I obtained it at:
https://code.google.com/p/splitta/

But I see now that it is not there.

A January 2014 e-mail from him had the address:
Daniel Gillick <dgil...@gmail.com>


My 2013 notes on alternatives have:
"LingPipe Sentence Extractor ... looks too complicated."

"MorphAdorner ... for older, not modern, English."

At the time, splitta tested OK for my usage.

Reed Darsey

May 27, 2016, 7:56:19 PM
to ant...@googlegroups.com
On 27 May 2016 at 18:51, I wrote:

> My 2013 notes on alternatives have:

[...]

One more item I had from Aug 2014:
Libunibreak
http://wyw.dcweb.cn

JFlorian

May 27, 2016, 8:46:33 PM
to ant...@googlegroups.com
This one--


Description:

statistical sentence boundary detection

Includes proper tokenization and models for very high accuracy sentence boundary detection (English only for now). The models are trained from Wall Street Journal news combined with the Brown Corpus which is intended to be widely representative of written English. Error rates on test news data are near 0.25%.

This is the source code for the paper "Sentence Boundary Detection and the Problem with the U.S." appearing at NAACL 2009.

Code written in Python.

Dan Gillick

https://code.google.com/archive/p/splitta/downloads
