Python NLTK equivalent,


Sean Charles

Nov 21, 2017, 3:28:41 AM
to SWI-Prolog
I've recently been using NLTK to analyse and sort text for a project.

Does anybody know of any good libraries, existing work, or general Prolog code that I might use with SWI-Prolog?

Thank you,

Sean.

Jan Wielemaker

Nov 21, 2017, 7:08:30 AM
to Sean Charles, SWI-Prolog
On 11/21/2017 09:28 AM, Sean Charles wrote:
> I've recently been using NLTK to analyse and sort text for a project.
>
> Does anybody know of any good libraries, existing work, or general
> Prolog code that I might use with SWI-Prolog?

It depends a lot on what you want to do. There are some links on
http://www.swi-prolog.org/Links.html (very poorly maintained page;
anyone willing to review and update it?)

SWI-Prolog itself doesn't offer much. It does a few things, such as
Unicode transformations (diacritics removal, case conversion,
normalization), stemming (Snowball), metaphone and a distance
function (isub) that works pretty well for names/identifiers.

Somewhere on my filesystem there is also a Stanford NLP toolkit
interface that manages one or more NLP instances. You send it
sentences and get back a Prolog representation of the parser
output. It is a bit rusty, so I don't know how much still works.
If anyone is interested I'll see whether I can find it and put
it on GitHub.

You can also use the real/Rserve R interfaces to hook up R's NLP
packages.

You could probably do something similar to NLTK. Today I had a look at
pyswip, but it only seems to embed Prolog into Python, not yet the other
way around. Still, web/pipe-based interaction shouldn't be that hard to
implement and most NLP routines are slow enough to not worry about the
network latency.

Cheers --- Jan


> --
> You received this message because you are subscribed to the Google
> Groups "SWI-Prolog" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to swi-prolog+...@googlegroups.com
> <mailto:swi-prolog+...@googlegroups.com>.
> Visit this group at https://groups.google.com/group/swi-prolog.
> For more options, visit https://groups.google.com/d/optout.

e.patsa...@imperial.ac.uk

Nov 21, 2017, 7:24:17 AM
to swi-p...@googlegroups.com
I've been working on a statistical NLP library for Prolog for a while now, on and off. So far the only model it can train is an n-gram model, though without smoothing. There are a few predicates to gather statistics, such as the k most frequent n-grams, the k most frequent words and sentences, and so on.

I was planning to put it online but I never got round to it. If you're interested, I could put the code on GitHub in a couple of days.
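
For readers who have not met the term, here is a minimal Python sketch of what an unsmoothed n-gram model and a "k most frequent n-grams" query amount to. It is not the Prolog library described above, whose code is not shown in this thread; the function names and the toy corpus are made up for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams (as tuples) of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def train_ngram_counts(sentences, n):
    """Count n-grams and their (n-1)-token contexts over tokenised sentences."""
    ngram_counts, context_counts = Counter(), Counter()
    for tokens in sentences:
        for gram in ngrams(tokens, n):
            ngram_counts[gram] += 1
            context_counts[gram[:-1]] += 1
    return ngram_counts, context_counts


def mle_probability(gram, ngram_counts, context_counts):
    """Unsmoothed estimate of P(last token | context); 0.0 for anything unseen."""
    context_total = context_counts[gram[:-1]]
    return ngram_counts[gram] / context_total if context_total else 0.0


# Hypothetical toy corpus, already split into tokens.
corpus = [
    ["the", "cat", "sat"],
    ["the", "cat", "ran"],
    ["the", "dog", "sat"],
]
bigram_counts, context_counts = train_ngram_counts(corpus, 2)
top_bigrams = bigram_counts.most_common(1)  # the k most frequent n-grams, with k = 1
```

Without smoothing, any n-gram that never occurred in the training data gets probability zero, which is the limitation the post refers to.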

Carlo Capelli

Nov 21, 2017, 8:24:12 AM
to Sean Charles, SWI-Prolog
Hi Sean

I've uploaded a usable (I hope!) WN3 to GitHub. I think it could be practical, especially if you know the NLTK counterpart.

On an unrelated note: Jan W. didn't suggest it, but tokenize_atom is great if you need to handle basic text (it has a small 'contact surface' with NLP).

Ciao



Jan Wielemaker

Nov 21, 2017, 8:35:50 AM
to Carlo Capelli, Sean Charles, SWI-Prolog
On 11/21/2017 02:24 PM, Carlo Capelli wrote:
> Hi Sean
>
> I've uploaded a usable (I hope!) WN3 to GitHub
> <https://github.com/CapelliC/wn-swipl>. I think it could be practical,
> especially if you know the NLTK counterpart.

Indeed, using Wordnet from Prolog should be really easy, especially
since the introduction of multi-argument indexing made it pretty fast.

I wonder whether we should load Wordnet into the public SWISH. Might
give some people some inspiration :)

> On an unrelated note: Jan W. didn't suggest it, but
> tokenize_atom
> <http://www.swi-prolog.org/pldoc/doc_for?object=tokenize_atom/2> is
> great if you need to handle basic text (it has a small 'contact surface'
> with NLP).

I left it out because it is a rather poor man's tokenizer.
tokenize_atom/2 simply breaks over spaces and punctuation. Proper NLP
tokenization is not easy (and is language-specific).
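
To make the limitation concrete, here is a small Python sketch of the general "break over spaces and punctuation" approach. It is not how tokenize_atom/2 is implemented and its output will differ in detail; the regular expression and the sample sentence are assumptions chosen for illustration.

```python
import re

# One token per run of word characters, or per single punctuation character.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def naive_tokenize(text):
    """Split text on whitespace and punctuation, keeping punctuation as tokens."""
    return TOKEN_PATTERN.findall(text)


tokens = naive_tokenize("Dr. Smith doesn't pay $3.50, does he?")
```

The abbreviation "Dr.", the contraction "doesn't" and the amount "$3.50" are all split into pieces, which is the kind of case a proper, language-aware tokenizer has to handle.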

Cheers --- Jan

Torbjörn Lager

Nov 21, 2017, 8:59:45 AM
to Jan Wielemaker, Carlo Capelli, Sean Charles, SWI-Prolog

Jan Wielemaker wrote:

> I wonder whether we should load Wordnet into the public SWISH. Might
> give some people some inspiration :)

Great idea! If you do, I’ll be able to run some Wordnet exercises with an NLP student group in just two weeks from now. Haven’t thought about that before... :-)

- Torbjörn

emacstheviking

Nov 21, 2017, 9:03:31 AM
to Jan Wielemaker, SWI-Prolog
Hi Jan,

Thanks for that.

I have decided to use Flask to wrap the NLTK stuff so I can just use the HTTP client libraries. Since I know it's for Prolog consumption, I will have my Flask wrapper send back the output in a friendlier format. Hell, if it's useful I can put it on GitHub, can't I?

A lightweight wrapper won't add much overhead at all to processing times. Not bothered anyway!

Sean.




Jan Wielemaker

Nov 21, 2017, 9:12:00 AM
to emacstheviking, SWI-Prolog
On 11/21/2017 03:02 PM, emacstheviking wrote:
> Hi Jan,
>
> Thanks for that.
>
> I have decided to use Flask to wrap the NLTK stuff so I can just use the
> HTTP client libraries. Since I know it's for Prolog consumption, I will
> have my Flask wrapper send back the output in a friendlier format.
> Hell, if it's useful I can put it on GitHub, can't I?

Consider using JSON. That should make transferring the data easy
and transparent. Prolog's JSON I/O should at some point be moved
entirely to C to get maximum performance, but it is probably good
enough as it is.

Then you simply POST a JSON object with the request and
you get the result back as a Prolog dict.
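
As a rough illustration of the Python side of such an exchange, here is a sketch that uses only the standard library rather than Flask, so that it stays self-contained. The "text" and "tokens" field names, the analyse/1 placeholder and the serve/1 helper are all assumptions; a real wrapper would call NLTK where the placeholder splits on whitespace.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def analyse(text):
    """Placeholder for the real NLP call (e.g. an NLTK tokenizer or tagger)."""
    return {"tokens": text.split()}


def handle_json(raw_body):
    """Decode a JSON request body, run the analysis, return a JSON reply as bytes."""
    request = json.loads(raw_body)
    return json.dumps(analyse(request["text"])).encode("utf-8")


class AnalyseHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly the number of bytes the client announced.
        length = int(self.headers.get("Content-Length", 0))
        body = handle_json(self.rfile.read(length))
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port):
    """Run the service until interrupted (not called automatically)."""
    HTTPServer(("127.0.0.1", port), AnalyseHandler).serve_forever()
```

Calling serve(8000) starts the service. A client then POSTs something like {"text": "some sentence"} and receives a JSON object in return, which SWI-Prolog's JSON support can read as a dict.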

Cheers --- Jan

Jan Wielemaker

Nov 21, 2017, 9:14:23 AM
to Torbjörn Lager, Carlo Capelli, Sean Charles, SWI-Prolog
Could someone write PlDoc comments for the Wordnet API? Then we can
make a nice module and the SWISH editor will help you a little (at
least understanding these hard-to-read Wordnet predicate names).

Cheers --- Jan


Samer Abdallah

Nov 21, 2017, 2:49:17 PM
to Sean Charles, SWI-Prolog
Hi Sean,
I’ve got a few NLP tools including some lexical databases derived
from Moby, Wordnet and OALD, and also an interface to NLTK
via Python. They’re not necessarily very user-friendly but you’re 
welcome to have a look and see if there’s anything you’d like me
to polish up.


I also have a more general interface to Python here:
I should probably update plnltk to use this but I haven’t got round to it.

Samer.


Samer Abdallah

Nov 21, 2017, 3:09:56 PM
to Jan Wielemaker, Torbjörn Lager, Carlo Capelli, Sean Charles, SWI-Prolog
There are some PlDocs in here (slightly modified from Sarah Witzig’s original):
https://code.soundsoftware.ac.uk/projects/plex/repository/entry/wn.pl

I see Carlo’s version has some other features and includes the original
Wordnet files, whereas mine points to an external directory for the originals.
We should probably put together some sort of definitive Wordnet pack.

Samer


Jan Wielemaker

Nov 21, 2017, 3:50:10 PM
to Samer Abdallah, Torbjörn Lager, Carlo Capelli, Sean Charles, SWI-Prolog
On 11/21/2017 09:09 PM, Samer Abdallah wrote:
>> Could someone write PlDoc comments for the Wordnet API? Then we can make a nice module and the SWISH editor will help you a little (at
>> least understanding these hard-to-read Wordnet predicate names).
> There are some PlDocs in here (slightly modified from Sarah Witzig’s original):
> https://code.soundsoftware.ac.uk/projects/plex/repository/entry/wn.pl

This seems to be an extended version of something I once put on the wiki back
in the days when we were not even self-hosted :) I think we should have a
pack that includes the driver and a make install step that downloads the
sources from Princeton.

> I see Carlo’s version has some other features and includes the original
> Wordnet files, whereas mine points to an external directory for the originals.
> We should probably put together some sort of definitive Wordnet pack.

Does anyone fancy sorting out a unified version?

Cheers --- Jan

Jan Burse

Nov 21, 2017, 6:40:33 PM
to SWI-Prolog
BTW, it seems that finally an AI stack exchange has emerged again:

https://ai.stackexchange.com/

Jan Wielemaker

Nov 26, 2017, 10:52:40 AM
to Samer Abdallah, Torbjörn Lager, Carlo Capelli, Sean Charles, SWI-Prolog
On 11/21/2017 09:09 PM, Samer Abdallah wrote:
> There are some PlDocs in here (slightly modified from Sarah Witzig’s original):
> https://code.soundsoftware.ac.uk/projects/plex/repository/entry/wn.pl
>
> I see Carlo’s version has some other features and includes the original
> Wordnet files, whereas mine points to an external directory for the originals.
> We should probably put together some sort of definitive Wordnet pack.

I picked the version from above, completed the comments and refactored a
bit. It is uploaded to http://www.swi-prolog.org/pack/list?p=wordnet
The source repo is https://github.com/JanWielemaker/wordnet

It is all a bit rough, but at least it has a place now. Note that the
lazy loading makes the first query fail. I pushed a patch for that to
swipl-devel.git.

I'll make this available from SWISH soon. SWISH provides a nice
opportunity to share ideas about higher-level relations.

Cheers --- Jan

Jan Wielemaker

Nov 27, 2017, 4:45:10 AM
to Samer Abdallah, Torbjörn Lager, Carlo Capelli, Sean Charles, SWI-Prolog
On 11/26/2017 04:51 PM, Jan Wielemaker wrote:

> I'll make this available from SWISH soon. SWISH provides a nice
> opportunity to share ideas about higher-level relations.

Enjoy at https://swish.swi-prolog.org/p/OOOuPYcp.swinb

I'm not a linguist :) Please create some nice realistic notebooks
that can be added to the default set of examples.

Cheers --- Jan