Practical Prolog - tokenizing, mapping, and transforming text

Terrence Brannon

Feb 8, 2010, 4:42:09 AM
Hello,

I am aiming to make a complete Prolog version of a Perl module that
generates latin-looking text:

http://search.cpan.org/~adeola/Text-Lorem-0.3/lib/Text/Lorem.pm

My first goal is to take a string of latin-looking text and

1. split the string into a list of strings by considering whitespace
as a delimiter.

2. take the list of strings and make each string lowercase

3. remove any character from any string which is not a POSIX "word"
character, the POSIX word characters being: A-Za-z0-9 and underscore.

Practically, let's say we have the initial string of latin-looking
text like this:

lorem_text("ipsum? lorem! lingua romana perligata.").

To satisfy goal #1 above, we need a predicate
tokenize(String, ListOfStrings).

To satisfy goal #2, we need some mapping predicate which applies a
predicate that lowercases text.

Goal #3 requires a mapping predicate which applies a predicate that
performs regular expression substitution (or something similar) on a
string.
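For illustration, the three goals above could be sketched as follows.
This is only a rough sketch, and it assumes SWI-Prolog 7 or later, whose
string built-ins (split_string/4, string_lower/2, string_codes/2) are
not part of ISO Prolog. The helper names keep_word_chars/2 and
word_code/1 are made up for this example, and filtering by character
code stands in for the regular expression substitution.

```prolog
% Sketch only: assumes SWI-Prolog 7+ (string built-ins are not ISO).
:- use_module(library(apply)).   % maplist/3, include/3, exclude/3

lorem_text("ipsum? lorem! lingua romana perligata.").

% generate_wordlist(+Text, -Words)
% Words is the list of lowercase, word-character-only strings in Text.
generate_wordlist(Text, Words) :-
    split_string(Text, " \t\n", " \t\n", Tokens),   % goal 1: split on whitespace
    maplist(string_lower, Tokens, Lower),           % goal 2: lowercase each token
    maplist(keep_word_chars, Lower, Cleaned),       % goal 3: drop non-word characters
    exclude(==(""), Cleaned, Words).                % discard tokens that ended up empty

% keep_word_chars(+String, -Clean): hypothetical helper for goal 3.
keep_word_chars(String, Clean) :-
    string_codes(String, Codes),
    include(word_code, Codes, Kept),
    string_codes(Clean, Kept).

% word_code(+Code): Code is one of A-Z, a-z, 0-9 or underscore.
word_code(0'_).
word_code(C) :- C >= 0'a, C =< 0'z.
word_code(C) :- C >= 0'A, C =< 0'Z.
word_code(C) :- C >= 0'0, C =< 0'9.
```

With this loaded, the query

?- lorem_text(T), generate_wordlist(T, Ws).

should bind Ws to ["ipsum", "lorem", "lingua", "romana", "perligata"].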

Concretely, the generate_wordlist Perl function accomplishes all three
goals. Its source code is here:

http://gitorious.org/text-lorem/text-lorem/blobs/master/lib/Text/Lorem.pm#line17


The full latin-looking text, in Prolog, is here:
http://gitorious.org/text-lorem/text-lorem/blobs/master/prolog/lorem.prolog

I request help on the Prolog version of generate_wordlist, per the
spec above. You are welcome to post here or join the gitorious project
as a contributor.

YauHsienHuang

Feb 8, 2010, 2:16:42 PM
On Feb 8, 5:42 pm, Terrence Brannon <metap...@gmail.com> wrote:
> My first goal is to take a string of latin-looking text and
>
> 1. split the string into a list of strings by considering whitespace
> as a delimiter.
>
> 2. take the list of strings and make each string lowercase
>
> 3. remove any character from any string which is not a POSIX "word"
> character, the POSIX word characters being: A-Za-z0-9 and underscore.
>

Goal #1 is similar to the solution of Problem #9 of P-99,
https://prof.ti.bfh.ch/hew1/informatik3/prolog/p-99/ .

Goals #2 and #3 can be written as simple recursive predicates.
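As a rough sketch of what such recursive predicates might look like,
assuming each word is represented as a list of character codes and only
ASCII letters need lowercasing (the names lower_codes/2, strip_nonword/2,
is_word_code/1 and clean_words/2 are invented for this example):

```prolog
% Sketch only: words are lists of character codes; ASCII is assumed.

% lower_codes(+Codes, -Lower): goal 2, map A-Z to a-z, keep the rest.
lower_codes([], []).
lower_codes([C|Cs], [L|Ls]) :-
    (   C >= 0'A, C =< 0'Z
    ->  L is C + 32              % 'a' - 'A' = 32 in ASCII
    ;   L = C
    ),
    lower_codes(Cs, Ls).

% strip_nonword(+Codes, -Kept): goal 3, keep only word characters.
strip_nonword([], []).
strip_nonword([C|Cs], Kept) :-
    (   is_word_code(C)
    ->  Kept = [C|Rest]
    ;   Kept = Rest
    ),
    strip_nonword(Cs, Rest).

% is_word_code(+Code): Code is one of A-Z, a-z, 0-9 or underscore.
is_word_code(0'_).
is_word_code(C) :- C >= 0'a, C =< 0'z.
is_word_code(C) :- C >= 0'A, C =< 0'Z.
is_word_code(C) :- C >= 0'0, C =< 0'9.

% clean_words(+Words, -Cleaned): apply both steps to every word.
clean_words([], []).
clean_words([W|Ws], [C|Cs]) :-
    lower_codes(W, L),
    strip_nonword(L, C),
    clean_words(Ws, Cs).
```

For example, after atom_codes('Ipsum?', Cs), the goals
lower_codes(Cs, Ls), strip_nonword(Ls, Ks), atom_codes(A, Ks)
should give A = ipsum.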

The Quiet Center

Feb 10, 2010, 2:12:54 PM
On Feb 8, 2:16 pm, YauHsienHuang <g9414002.pccu.edu...@gmail.com>
wrote:

> > 1. split the string into a list of strings by considering whitespace
> > as a delimiter.
>
> > 2. take the list of strings and make each string lowercase
>
> > 3. remove any character from any string which is not a POSIX "word"
> > character, the POSIX word characters being: A-Za-z0-9 and underscore.
>
> The goal #1 is similar with the solution of Problem #9 of P-99,
> https://prof.ti.bfh.ch/hew1/informatik3/prolog/p-99/ .

It took me a long, long time, but I did it:
http://gitorious.org/text-lorem/text-lorem/blobs/master/prolog/split_on_char.prolog

Now, I'd like to make it a module that I can import into my main program.
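One way this could look, assuming the SWI-Prolog module system (module
systems are not standardized across Prolog implementations). The linked
file is not reproduced here, so the exported name split_on_char/3, its
argument order, and the body below are all hypothetical stand-ins:

```prolog
% File: split_on_char.pl (hypothetical name and interface).
% The module/2 directive must be the first term in the file.
:- module(split_on_char, [split_on_char/3]).

% split_on_char(+Sep, +Codes, -Parts)
% Parts is the list of code lists obtained by cutting Codes at every
% occurrence of the character code Sep. Empty pieces are kept.
split_on_char(Sep, Codes, [Part|Parts]) :-
    take_until(Codes, Sep, Part, Rest),
    (   Rest == []                     % no separator left: last piece
    ->  Parts = []
    ;   Rest = [Sep|Tail],
        split_on_char(Sep, Tail, Parts)
    ).

% take_until(+Codes, +Sep, -Part, -Rest): not exported, so private.
% Part is the prefix before the first Sep; Rest starts at that Sep,
% or is [] when Sep does not occur.
take_until([], _, [], []).
take_until([C|Cs], Sep, Part, Rest) :-
    (   C == Sep
    ->  Part = [],
        Rest = [C|Cs]
    ;   Part = [C|Part1],
        take_until(Cs, Sep, Part1, Rest)
    ).
```

The main program would then import it with a directive such as
:- use_module(split_on_char). placed near the top of the file, after
which split_on_char/3 can be called as if it were defined locally.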

>
> The goal #2 and #3 are simple recursive predicates.

Easy for you to say :)
