gethash substring

Frode Vatvedt Fjeld

unread,

Aug 21, 2002, 9:39:01 AM8/21/02

to

I often find that I need to look up a sub-string in a hash-table. By
"sub-string" I mean a string that starts and ends at given indexes of
another string. It goes something like this:

(gethash (subseq <original-string> <start> <end>)
<hash-table>)

The problem is that I can't afford to cons up a new string (as subseq
does) just for performing such look-ups.

The only solution I see to this problem, is to implement my own
hash-table. Is there a better way?

--
Frode Vatvedt Fjeld

Johannes Grødem

unread,

Aug 21, 2002, 9:47:10 AM8/21/02

to

* Frode Vatvedt Fjeld <fro...@acm.org>:

> The only solution I see to this problem, is to implement my own
> hash-table. Is there a better way?

Couldn't you just allocate an array with a fill-pointer and copy into
this and use it as the key-parameter to gethash? Or am I missing
something here?

--
johs

Frode Vatvedt Fjeld

unread,

Aug 21, 2002, 11:40:06 AM8/21/02

to

"Johannes Grødem" <s...@kopkillah.com> writes:

> Couldn't you just allocate an array with a fill-pointer and copy
> into this and use it as the key-parameter to gethash?

Yes, I could.

> Or am I missing something here?

This function wouldn't be reentrant, which might or might not be a
problem, I suppose. I'm not a functional (programming) fanatic, but
this sort of thing feels slightly non-Common-Lispian to a Patriot such
as myself.

--
Frode Vatvedt Fjeld

Barry Margolin

unread,

Aug 21, 2002, 11:25:48 AM8/21/02

to

In article <2hbs7we...@vserver.cs.uit.no>,

Frode Vatvedt Fjeld <fro...@acm.org> wrote:
>The problem is that I can't afford to cons up a new string (as subseq
>does) just for performing such look-ups.

Why not? If the Lisp implementation has an ephemeral GC (as many do), the
cost of short-lived objects like this is pretty small.

But the other responder's suggestion to reuse the same string is a good
one, too. You might also search around for "resources", a way to maintain
a collection of reusable objects.

--
Barry Margolin, bar...@genuity.net
Genuity, Woburn, MA
*** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
Please DON'T copy followups to me -- I'll assume it wasn't posted to the group.

Pierpaolo BERNARDI

unread,

Aug 21, 2002, 11:53:50 AM8/21/02

to

"Frode Vatvedt Fjeld" <fro...@acm.org> ha scritto nel messaggio news:2hbs7we...@vserver.cs.uit.no...

You can create _one_ displaced string and reuse
it multiple times with ADJUST-ARRAY.

(let ((substring (make-array 0 :element-type 'character
:displaced-to "" :displaced-index-offset 0)))
...

(gethash
(adjust-array substring (- <end> <start>)
:displaced-to <original-string>
:displaced-index-offset <start>)
<hash-table>))

P.

Pierpaolo BERNARDI

unread,

Aug 21, 2002, 12:01:40 PM8/21/02

to

"Frode Vatvedt Fjeld" <fro...@acm.org> ha scritto nel messaggio news:2hwuqkc...@vserver.cs.uit.no...

> > Or am I missing something here?
>
> This function wouldn't be reentrant, which might or might not be a
> problem,

Yes, in a MT CL you'll have to put the copy or displacing and
table access in a critical section, unless mutual exclusion is
guaranted by some other protocol.

> I suppose. I'm not a functional (programming) fanatic, but
> this sort of thing feels slightly non-Common-Lispian to a Patriot such
> as myself.

If you are using hash-tables you are not doing functional
programming anyway.

P.

Marco Antoniotti

unread,

Aug 21, 2002, 2:25:27 PM8/21/02

to

"Pierpaolo BERNARDI" <pierpaolo...@hotmail.com> writes:

This is a very good suggestion. It should go in the CL Cookbook.

Cheers

--
Marco Antoniotti ========================================================
NYU Courant Bioinformatics Group tel. +1 - 212 - 998 3488
715 Broadway 10th Floor fax +1 - 212 - 995 4122
New York, NY 10003, USA http://bioinformatics.cat.nyu.edu
"Hello New York! We'll do what we can!"
Bill Murray in `Ghostbusters'.

Kaz Kylheku

unread,

Aug 21, 2002, 6:10:59 PM8/21/02

to

Frode Vatvedt Fjeld <fro...@acm.org> wrote in message news:<2hbs7we...@vserver.cs.uit.no>...

> I often find that I need to look up a sub-string in a hash-table. By
> "sub-string" I mean a string that starts and ends at given indexes of
> another string. It goes something like this:
>
> (gethash (subseq <original-string> <start> <end>)
> <hash-table>)
>
> The problem is that I can't afford to cons up a new string (as subseq
> does) just for performing such look-ups.

You can create a string object which shares the actual character
data with the original string, though some space is still allocated
obviously to keep track of where the substring beings and ends.
Look at the :DISPLACED-TO keyword parameter of MAKE-ARRAY.
Can you spare that much space?

(defparameter *string1* "abcdefg")
(defparameter *string2* (make-array '(3)
:element-type 'character
:displaced-to *string1*
:displaced-index-offset 1))
*string2* ==> "bcd"

Espen Vestre

unread,

Aug 22, 2002, 3:46:08 AM8/22/02

to

"Pierpaolo BERNARDI" <pierpaolo...@hotmail.com> writes:

> You can create _one_ displaced string and reuse
> it multiple times with ADJUST-ARRAY.

Good idea!

Even if you don't reuse it, using a displaced array for the substring
_sometimes_ pays off (If both the whole string and the substring
are fairly large, you may save some memory. You really need to do
tests to see how much, if anything, you gain with your implementation
of choice).
--
(espen)

Ed L Cashin

unread,

Aug 24, 2002, 10:49:18 PM8/24/02

to

"Pierpaolo BERNARDI" <pierpaolo...@hotmail.com> writes:

Why not? Could you elaborate? If functions can return numbers then
why not hash tables?

--
--Ed L Cashin | PGP public key:
eca...@uga.edu | http://noserose.net/e/pgp/

Erik Naggum

unread,

Aug 24, 2002, 11:04:12 PM8/24/02

to

* Pierpaolo BERNARDI

| If you are using hash-tables you are not doing functional programming
| anyway.

* Ed L Cashin

| Why not? Could you elaborate? If functions can return numbers then
| why not hash tables?

The hash table operations do not return a new hashtable with the specified
changes made and retain the old value. Quite contrary, the whole point is
to modify the state of an existing object. This is a side effect and is not
thus sufficiently functional. (When functional programming languages
support I/O, I see little point in arguing about such things, as they have
already proved that at least the side effect of changing the state of some
I/O stream is allowable, but for some reason, the functional programming
paradigm has not been shot down as fundamentally silly in its excesses.)

--
Erik Naggum, Oslo, Norway

Act from reason, and failure makes you rethink and study harder.
Act from faith, and failure makes you blame someone and push harder.

Stephen J. Bevan

unread,

Aug 25, 2002, 1:35:17 AM8/25/02

to

Erik Naggum <er...@naggum.no> writes:
> thus sufficiently functional. (When functional programming languages
> support I/O, I see little point in arguing about such things, as they have
> already proved that at least the side effect of changing the state of some
> I/O stream is allowable, but for some reason, the functional programming
> paradigm has not been shot down as fundamentally silly in its excesses.)

Leaving aside whether it is fundamentally silly, Haskell and Clean
have I/O but they do not work by allowing a change of state of an I/O
stream. Instead the type system ensures that the stream is only used
in a single threaded manner (either via monads in Haskell or unique
types in Clean) so that the "side-effect" isn't a side-effect at all,
it is a direct effect and is clear from the type of any I/O function.

Kalle Olavi Niemitalo

unread,

Aug 25, 2002, 5:32:58 AM8/25/02

to

ste...@dino.dnsalias.com (Stephen J. Bevan) writes:

> Instead the type system ensures that the stream is only used
> in a single threaded manner (either via monads in Haskell or unique
> types in Clean) so that the "side-effect" isn't a side-effect at all,
> it is a direct effect and is clear from the type of any I/O function.

Does this mean any function that writes to a log file must take
the original file as a parameter and return the modified file?
And the same with anything that calls it.

Do they have an equivalent to the Unix98 pread function, which
does not change the file offset?

Stephen J. Bevan

unread,

Aug 25, 2002, 11:29:35 AM8/25/02

to

Kalle Olavi Niemitalo <k...@iki.fi> writes:
> Does this mean any function that writes to a log file must take
> the original file as a parameter and return the modified file?
> And the same with anything that calls it.

We might have different ideas about what it means to "take the
original file as a parameter" so I'll answer by showing the types of
the functions in Haskell that would be used to open a file and write a
String to the file :-

openFile :: FilePath -> IOMode -> IO Handle
hPutStrLn :: Handle -> String -> IO ()

Here the IO type is the important one since this ensures that I/O is
single threaded and represented in the types so that it is not
possible to hide I/O inside a function which doesn't have IO in its
type signature (note I'm not arguing whether this is good or bad, just
noting that I/O is not a side-effect). In Clean instead of using the
(monadic) IO type, unique types are used to ensure that functions
which take the "world" as an argument use that argument in a linear
manner thereby ensuring that I/O is not a side effect but an a direct
effect.

> Do they have an equivalent to the Unix98 pread function, which
> does not change the file offset?

Haskell does not. You can look at the I/O library yourself at
http://www.haskell.org/onlinelibrary/io.html. The Clean approach
using unique types is briefly described in
http://www.cs.kun.nl/~clean/CleanExtra/report20/chapter2/s23.html#s231b
An overview of the I/O system (including some screen shots of games produced
using it) are available at :-
http://www.cs.kun.nl/~clean/About_Clean/page_modified/page_modified.html
and the full gory details of the Clean I/O library are available at
ftp://ftp.cs.kun.nl/pub/Clean/supported/ObjectIO.1.2/doc/tutorial.ps.gz

Kaz Kylheku

unread,

Aug 25, 2002, 1:44:32 PM8/25/02

to

In article <m3vg5zg...@dino.dnsalias.com>, Stephen J. Bevan wrote:
> We might have different ideas about what it means to "take the
> original file as a parameter" so I'll answer by showing the types of
> the functions in Haskell that would be used to open a file and write a
> String to the file :-
>
> openFile :: FilePath -> IOMode -> IO Handle
> hPutStrLn :: Handle -> String -> IO ()
>
> Here the IO type is the important one since this ensures that I/O is
> single threaded and represented in the types so that it is not
> possible to hide I/O inside a function which doesn't have IO in its
> type signature (note I'm not arguing whether this is good or bad, just
> noting that I/O is not a side-effect).

If output to a file is not a side-effect, then it must necessarily work by
consing up a fresh computer that is a copy of the previous one, except that
some bits are changed on the hard drive. That way any parts of the system
that have a reference to the old computer don't notice anything. :)

Stephen J. Bevan

unread,

Aug 25, 2002, 11:25:47 PM8/25/02

to

Kaz Kylheku <k...@ashi.footprints.net> writes:
> If output to a file is not a side-effect, then it must necessarily work by
> consing up a fresh computer that is a copy of the previous one, except that
> some bits are changed on the hard drive. That way any parts of the system
> that have a reference to the old computer don't notice anything. :)

You are thinking too small, the Clean argument that ensures
single-threaded behaviour isn't typically called "world" for nothing :-)

Kent M Pitman

unread,

Aug 26, 2002, 12:14:48 PM8/26/02

to

Kalle Olavi Niemitalo <k...@iki.fi> writes:

No, you get back a whole new file system that has the output done.

This should have been obvious. Think about delete-file.

;)

Kaz Kylheku

unread,

Aug 27, 2002, 3:02:30 AM8/27/02

to

Maybe quantum computing will rescue these pure programmers, assuming that the
parallel-worlds interpretation is valid.

When you change the state of the computer at all, you cons up a new reality
that proceeds in parallel with the old one.

Garbage collection can be done by collapsing wave functions.