# Collecting like-labelled sublists of a list

44 views

### Cortez

Jul 29, 2008, 1:29:38 PM7/29/08
to
I need to traverse a list of lists, where each sublist is labelled by
a number, and collect together the contents of all sublists sharing
the same label. So if I have the list -

((0 a b) (1 c d) (2 e f) (3 g h) (1 i j) (2 k l) (4 m n) (2 o p) (4 q
r) (5 s t))

where the first element of each sublist is the label, I need to
produce -

((a b) (c d i j) (e f k l o p) (g h) (m n q r) (s t))

I do this with the following -

(defun test (list)
(loop for j in list
for index = (first j)
for k = (rest j)
with indices = nil
if (not (member index indices))
do (pushnew index indices)
and collect k into res
else
do (nconc (nth index res) k)
finally (return res)))

I suspect that there is a more efficient and elegant way of doing
this, however. Any suggestions welcome.

Brief background: this is part of a program I've written for reading
data from SDIF files, a binary format which stores sound description
data. The labelled lists represent partials in spectral analysis data
(partial-index, time, frequency).

### Cortez

Jul 29, 2008, 1:44:41 PM7/29/08
to

Actually the function uses ASSOC instead of NTH, because the labels
themselves will not necessarily be integers -

(defun test (list)
(loop for j in list
for index = (first j)
for k = (rest j)
with indices = nil
if (not (member index indices))
do (pushnew index indices)

and collect j into res
else
do (nconc (assoc index res) k) ; ASSOC instead of NTH
finally (return res)))

### Cortez

Jul 29, 2008, 2:24:02 PM7/29/08
to

To be more precise (if that helps), I'm wondering if there's a way of
doing this without having to build up a list of the indices (labels)
and using membership/non-membership of this list as the test for
whether we have encountered a new index or not.

Thanks for any ideas.

### Alberto Riva

Jul 29, 2008, 2:31:30 PM7/29/08
to

If the number of labels can be large, I would use a hash table:

(defun test (list)
(let ((ht (make-hash-table :test #'equal)))
(dolist (j list)
(dolist (k (rest j))
(push k (gethash (car j) ht))))
(loop
for v being the hash-value of ht
collect v)))

Of course you get the sublists in random order, but it wasn't clear from
your message whether that's an issue or not...

Alberto

### Alberto Riva

Jul 29, 2008, 2:33:43 PM7/29/08
to

I posted my reply before seeing this, but my solution takes care of this
too: GETHASH automatically creates a new entry in the hash table for
each new key, so you don't have to worry about keeping a list of

Alberto

### Cortez

Jul 29, 2008, 3:42:27 PM7/29/08
to

Thanks Alberto. I had considered using a hash. Unfortunately the order
is important, though. To clarify still further, the final list of
lists is passed as a sequence of coordinate pairs (X1 Y1 X2 Y2... Xn
Yn) to a drawing function in McClim, which outputs them to a graphical
display pane. The main problem I have is that it is not known prior to
actually reading in all the data from an SDIF file how many partials
it contains. This information is not contained in the file header. So
I can't, for example, create an array with a row for each partial and
then just read in all the data. So I want some way of building up a
list (or vector) for each partial as I'm reading through the data,
which is what the function I originally suggested does.

### Volkan YAZICI

Jul 29, 2008, 4:59:51 PM7/29/08
to

CL-USER> (let ((table (make-hash-table)))
(mapcar
(lambda (key) (gethash key table))
(mapcar
(lambda (item &aux (key (first item)))
(setf (gethash key table)
(nconc (gethash key table) (rest item)))
key)
'((0 a b) (1 c d) (2 e f) (3 g h) (1 i j) (2 k
l)
(4 m n) (2 o p) (4 q r) (5 s t)))))
((A B) (C D I J) (E F K L O P) (G H) (C D I J) (E F K L O P) (M N Q
R)
(E F K L O P) (M N Q R) (S T))

Regards.

### Alberto Riva

Jul 29, 2008, 5:28:11 PM7/29/08
to

Sorry, do you mean the order of the labels, or the order of the elements
inside each sublist? I though it was the former (and if this is the
case, see the function below[*]), but reading what you say about
interpreting your result as a sequence of coordinates, I'm starting to
think it's the latter. In this case you just need to REVERSE each
sublist, since the elements are pushed into them.

[*] If you want to preserve the order of the labels, you could still use
a hash table to store the values and keep a separate list of labels in
order of appearance:

(defun test (list)
(let ((ht (make-hash-table :test #'equal))

(labels nil)
(result nil))
(dolist (j list)
(pushnew (car j) labels :test #'equal)

(dolist (k (rest j))
(push k (gethash (car j) ht))))

(dolist (j labels)
(push (gethash j ht) result))
result))

Alberto

### Cortez

Jul 29, 2008, 6:36:28 PM7/29/08
to

Of course, yes - sorry, got confused by the data I was testing it
with. :)
The ordering of the final collected lists is not important, since
they're
just passed to the drawing function separately. I might want to
preserve the order
eventually, once I've done a bit more testing. Thank you very much for
your
suggestions.

My main concern is efficiency, however, since the files I'm dealing
with contain a
huge amount of data (a 1MB SDIF file might contain thousands of
partials). Timing the
various (compiled) versions of TEST with realistic data in ACL/Slime
gives -

My original version -

; cpu time (non-gc) 0 msec user, 0 msec system
; cpu time (gc) 0 msec user, 0 msec system
; cpu time (total) 0 msec user, 0 msec system
; real time 0 msec
; space allocation:
; 76 cons cells, 0 other bytes, 0 static bytes

Alberto -

; cpu time (non-gc) 16 msec user, 0 msec system
; cpu time (gc) 0 msec user, 0 msec system
; cpu time (total) 16 msec user, 0 msec system
; real time 15 msec
; space allocation:
; 691 cons cells, 944 other bytes, 0 static bytes

Volkan -

; cpu time (non-gc) 0 msec user, 0 msec system
; cpu time (gc) 0 msec user, 0 msec system
; cpu time (total) 0 msec user, 0 msec system
; real time 0 msec
; space allocation:
; 327 cons cells, 944 other bytes, 0 static bytes

My 'naive' assumption (based on only limited experience profiling and
optimizing code)
is that my version would therefore be more efficient. True?

Jul 29, 2008, 6:56:33 PM7/29/08
to
På Wed, 30 Jul 2008 00:36:28 +0200, skrev Cortez
<relati...@hotmail.co.uk>:

This doesn't really tell you anything.
You need to test it on a reasonable size dataset.
Hashing doesn't really come into effect before you have a large amount of
elements so it scores unreasonably low here compared to what it would on a
large dataset.

--------------

### Cortez

Jul 29, 2008, 6:59:01 PM7/29/08
to
On Jul 29, 11:56 pm, "John Thingstad" <jpth...@online.no> wrote:
> På Wed, 30 Jul 2008 00:36:28 +0200, skrev Cortez
> <relativef...@hotmail.co.uk>:

Thanks John, I suspected as much.

### Thomas A. Russ

Jul 29, 2008, 10:11:00 PM7/29/08
to
Cortez <relati...@hotmail.co.uk> writes:

> I need to traverse a list of lists, where each sublist is labelled by
> a number, and collect together the contents of all sublists sharing
> the same label. So if I have the list -
>
> ((0 a b) (1 c d) (2 e f) (3 g h) (1 i j) (2 k l) (4 m n) (2 o p) (4 q
> r) (5 s t))
>
> where the first element of each sublist is the label, I need to
> produce -
>
> ((a b) (c d i j) (e f k l o p) (g h) (m n q r) (s t))
>
> I do this with the following -
>
> (defun test (list)
> (loop for j in list
> for index = (first j)
> for k = (rest j)
> with indices = nil
> if (not (member index indices))
> do (pushnew index indices)
> and collect k into res
> else
> do (nconc (nth index res) k)
> finally (return res)))

The one comment that I have about this particular code is that you end
up destructively modifying the original list structure that you start
with. Depending on the precise application that may or may not be a
good idea.

As long as you don't need any of the original structure, you will be
fine. But if the input structure needs to be preserved, you are in
trouble.

To see the effect, try the following:

(defvar *input* (copy-tree '((0 a b)

(1 c d)
(2 e f)
(3 g h)
(1 i j)
(2 k l)
(4 m n)
(2 o p)
(4 q r)

(5 s t))))

;; I use copy-tree to avoid problems with destructively modifying
;; constant list structure.

(test *input*)
=> ((A B) (C D I J) (E F K L O P) (G H) (M N Q R) (S T))

*input*
=> ((0 A B) (1 C D I J) (2 E F K L O P) (3 G H) (1 I J)
(2 K L O P) (4 M N Q R) (2 O P) (4 Q R) (5 S T))

Note in particular the changes to the first occurences of the lists
headed by 1, 2 and 4.

--
Thomas A. Russ, USC/Information Sciences Institute

### Cortez

Jul 30, 2008, 12:45:07 AM7/30/08
to
On Jul 30, 3:11 am, t...@sevak.isi.edu (Thomas A. Russ) wrote:

Yes, thanks for pointing that out - I should have mentioned that the
destructive modification is not, in fact, an issue.

### Volkan YAZICI

Jul 30, 2008, 1:45:41 AM7/30/08
to
On Jul 30, 1:36 am, Cortez <relativef...@hotmail.co.uk> wrote:
> My original version -
>
> ; cpu time (non-gc) 0 msec user, 0 msec system
> ; cpu time (gc) 0 msec user, 0 msec system
> ; cpu time (total) 0 msec user, 0 msec system
> ; real time 0 msec
> ; space allocation:
> ; 76 cons cells, 0 other bytes, 0 static bytes
>
> Alberto -
>
> ; cpu time (non-gc) 16 msec user, 0 msec system
> ; cpu time (gc) 0 msec user, 0 msec system
> ; cpu time (total) 16 msec user, 0 msec system
> ; real time 15 msec
> ; space allocation:
> ; 691 cons cells, 944 other bytes, 0 static bytes
>
> Volkan -
>
> ; cpu time (non-gc) 0 msec user, 0 msec system
> ; cpu time (gc) 0 msec user, 0 msec system
> ; cpu time (total) 0 msec user, 0 msec system
> ; real time 0 msec
> ; space allocation:
> ; 327 cons cells, 944 other bytes, 0 static bytes

I don't think your original version will keep its speed stable --
because of (member index indices) searches -- as input grows. Maybe
you should also tell us more about the average # of labels.

Regards.

Jul 30, 2008, 10:39:00 AM7/30/08
to
* Cortez <712195e4-2e38-4b2f...@b1g2000hsg.googlegroups.com> :
Wrote on Tue, 29 Jul 2008 11:24:02 -0700 (PDT):

|> (defun test (list)
|> (loop for j in list
|> for index = (first j)
|> for k = (rest j)
|> with indices = nil
|> if (not (member index indices))
|> do (pushnew index indices)
|> and collect j into res
|> else
|> do (nconc (assoc index res) k) ; ASSOC instead of NTH
|> finally (return res)))
|
| To be more precise (if that helps), I'm wondering if there's a way of
| doing this without having to build up a list of the indices (labels)
| and using membership/non-membership of this list as the test for
| whether we have encountered a new index or not.

You can get by without building indices and just using ASSOC (which you
cannot avoid):

(defun cortez-group (list) ; Destroys LIST!
(let (result)
(dolist (el list)
(let ((entry (assoc (car el) result)))
(if entry
(rplacd entry (nconc (cdr entry) (cdr el)))
(push el result))))
(nreverse (mapcar #'cdr result))))

* (setq \$a '((0 a b) (1 c d) (2 e f) (3 g h) (1 i j)
(2 k l) (4 m n) (2 o p) (4 q r) (5 s t)))
* (cortez-group \$a)

=> ((A B) (C D I J) (E F K L O P) (G H) (M N Q R) (S T))

If you want the value to be sorted by label, consider sorting RESULT
:KEY #'CAR.

--

### Cortez

Jul 30, 2008, 2:25:52 PM7/30/08
to

Hi Volkan,

Well a 1MB SDIF file might contain about 3000 partials, so that's 3000
labels. That's about the size of file I typically work with. But the
situation is complicated by the fact that within the file itself these
partials are distributed over a set of time-ordered matrices, which
have to be traversed in order to extract the data. Each matrix
contains information on every partial located at that point (each
matrix also has a header). The information itself (frequency,
amplitude, phase, etc - although I'm only interested in frequency at
the moment) is typically in the form of single or double floats. For
example, here are three matrices (which I model as arrays) containing
data extracted from a single clarinet note. The label is in the left-
most column, followed by frequency, amplitude and other data which I
won't elaborate on here (relating to time-offset and noise content.
The time of the matrix itself is contained in the matrix header). So
you can see how the partials (0 to 6 in this example) are distributed
across the matrices -

#2A((0.0d0 1207.8944079486291d0 3.0572778826667964d-5
0.8507826862315859d0
0.9633105769790011d0 8.735621320690784d-4)
(1.0d0 2799.767798804896d0 2.9758011552006322d-5
3.45542871985445d0
0.9999134899562377d0 7.948914515372436d-4)
(2.0d0 556.677039133396d0 1.7342238369471644d-5
3.5441699661680883d0 0.0d0
0.0013469001214222072d0))
#2A((0.0d0 1203.4944446791235d0 2.7734811951076312d-5
3.7002160311387104d0
0.8199124994997562d0 4.790926227113963d-4)
(1.0d0 2660.5866387929855d0 2.5368686049752616d-5
3.5907080654740575d0
0.9996524472597946d0 3.0160728537549295d-4)
(2.0d0 606.6191777974843d0 6.739337893175857d-6
2.3996345665170553d0 0.0d0
0.0011529762653688502d0)
(3.0d0 2324.4893423205413d0 4.0769660018709746d-5
3.41151701297919d0
0.9974054018288645d0 3.8066262505557277d-4)
(4.0d0 5335.909767981719d0 3.5703443068263075d-5
3.111386950381825d0
0.9988603457041186d0 4.2724354268293147d-4))
#2A((0.0d0 1201.692599868909d0 3.140013470961943d-5
2.6591665451516757d0
0.6600595944811739d0 7.04389350218825d-4)
(3.0d0 2345.9740698897076d0 2.501488942774825d-5
3.3084674908933707d0
0.9955756776543997d0 3.5830861408908156d-4)
(4.0d0 5335.896889017415d0 3.63802335700116d-5 6.002182744750073d0
0.9999136282420076d0 1.5051639945999097d-4)
(5.0d0 1963.8199112992056d0 2.1833696069379764d-5
3.411731878197788d0
0.9837995802145886d0 0.0012953992561072873d0)
(6.0d0 3466.815966572441d0 4.645427928279146d-5
0.4173643804422096d0
0.9949535597520733d0 5.296405453024079d-4))...

The project I'm working on is a personal Lisp library for reading and
writing SDIF files (it's also a general exercise for me in writing an
application). I already have a working version, but I want to make it
much more efficient. It's not that slow (and in SBCL is actually very
fast), but now I'm trying to write a simple graphical interface, in
which the data is visualized. I'm experimenting with McClim and ACL.
Both get quite slow for large files. At the moment I'm just reading in
all the data into a slot in one of the graphical objects, and then
passing it to the various drawing functions I use. What I want,
ideally, is to do the drawing as I'm parsing the file.

I model the internal structure of an SDIF file through various
classes, instances of which are used to hold the data (SDIF is a
binary format). My current method of extracting partials is to read
all of the matrices into arrays (see above) then iterate over these
for each partial, starting from 0, until I've found them all. This is
horribly inefficient, of course. So the original function I posted was
a model (admittedly simplistic) for the kind of thing I want to do
instead, which is to extract the partials as I'm actually reading in
the bytes.

BTW, more info on SDIF can be found at: http://recherche.ircam.fr/equipes/analyse-synthese/sdif/

### Steve Allan

Jul 30, 2008, 3:49:44 PM7/30/08
to
Cortez <relati...@hotmail.co.uk> writes:

I'm a junior lisper, so I tried a different approach mostly for my own
education.

(let ((mylist '((0 a b) (1 c d) (2 e f) (3 g h) (1 i j) (2 k l)
(4 m n) (2 o p) (4 q) (5 s t)))
(ar (make-array 10 :adjustable t)))
(mapcar (lambda (l)
(setf (aref ar (first l))
(append (aref ar (first l)) (rest l)))) mylist))

My choice of 10 for the initial size of the array was arbitrary. I
chose the non-destructive append over nconc, but nconc might be ok.

--
-- Steve

### Kaz Kylheku

Jul 30, 2008, 6:14:01 PM7/30/08
to
On 2008-07-29, Cortez <relati...@hotmail.co.uk> wrote:
> I need to traverse a list of lists, where each sublist is labelled by
> a number, and collect together the contents of all sublists sharing
> the same label. So if I have the list -
>
> ((0 a b) (1 c d) (2 e f) (3 g h) (1 i j) (2 k l) (4 m n) (2 o p) (4 q
> r) (5 s t))

Note that this is basically like join of a database onto itself,
where the number is the join key.

> where the first element of each sublist is the label, I need to
> produce -
>
> ((a b) (c d i j) (e f k l o p) (g h) (m n q r) (s t))
>
> I do this with the following -
>
> (defun test (list)
> (loop for j in list
> for index = (first j)
> for k = (rest j)
> with indices = nil
> if (not (member index indices))
> do (pushnew index indices)
> and collect k into res
> else
> do (nconc (nth index res) k)
> finally (return res)))
>
> I suspect that there is a more efficient and elegant way of doing
> this, however. Any suggestions welcome.

If you want efficiency for situations when such lists are large,
you probably want to use hashing to implement the join.

If the labels are within a small numeric range, like 0 to 99, you could use an
array instead of hashing:

(defun join (list)
(let ((array (make-array '(100))))
(dolist (sublist list (coerce (remove nil array) 'list))
(destructuring-bind (numeric-label &rest items) sublist
(setf (aref array numeric-label)
(append (aref array numeric-label) items))))))

Change APPEND to NCONC for the destructive version.

> Brief background: this is part of a program I've written for reading
> data from SDIF files, a binary format which stores sound description
> data. The labelled lists represent partials in spectral analysis data
> (partial-index, time, frequency).

So the labels are bounded, since they represent partials, and there are only so
many partials in the spectral analysis.

Message has been deleted

Jul 30, 2008, 7:21:11 PM7/30/08
to
På Wed, 30 Jul 2008 04:11:00 +0200, skrev Thomas A. Russ
<t...@sevak.isi.edu>:

>
> To see the effect, try the following:
>
> (defvar *input* (copy-tree '((0 a b)
> (1 c d)
> (2 e f)
> (3 g h)
> (1 i j)
> (2 k l)
> (4 m n)
> (2 o p)
> (4 q r)
> (5 s t))))
>
> ;; I use copy-tree to avoid problems with destructively modifying
> ;; constant list structure.
>
> (test *input*)
> => ((A B) (C D I J) (E F K L O P) (G H) (M N Q R) (S T))
>
> *input*
> => ((0 A B) (1 C D I J) (2 E F K L O P) (3 G H) (1 I J)
> (2 K L O P) (4 M N Q R) (2 O P) (4 Q R) (5 S T))
>
> Note in particular the changes to the first occurences of the lists
> headed by 1, 2 and 4.
>

If your files are as big as you mentioned 25 Mb or more a buffered
approach sounds better unless you have a pressing need to keep it all in
memory.

--------------

### Steve Allan

Jul 30, 2008, 7:28:42 PM7/30/08
to

> * Steve Allan <uljzj2...@attachmate.com> :
> Wrote on Wed, 30 Jul 2008 12:49:44 -0700:

> | I'm a junior lisper, so I tried a different approach mostly for my own
> | education.
> |
> | (let ((mylist '((0 a b) (1 c d) (2 e f) (3 g h) (1 i j) (2 k l)
> | (4 m n) (2 o p) (4 q) (5 s t)))
> | (ar (make-array 10 :adjustable t)))
> | (mapcar (lambda (l)
> | (setf (aref ar (first l))
> | (append (aref ar (first l)) (rest l)))) mylist))
> |
> |
> | My choice of 10 for the initial size of the array was arbitrary. I
> | chose the non-destructive append over nconc, but nconc might be ok.
>

> This is a correct approach. Note it is the almost identical to what
> [the experienced] Kaz suggests in this thread! Like he says, it will
> work well "If the labels are within a small numeric range, like 0 to 99"
>
> [Even if the label bound is not known but is known to be small, as you
> have declared AR adjustable, if the label I is greater than (LENGTH AR)
> you can resize AR before setting the Ith element.]
>

I realize now that I don't understand how adjustable arrays work. I
assumed the expanding would happen automagically, but after a quick
test I see that it doesn't. So my approach doesn't work the way I had
intended. I'll have to read up on adjustable arrays a bit.

> However in this case the label may not be useful to index the array, as

> elsewhere in the thread Cortez has said:
>
>> Actually the function uses ASSOC instead of NTH, because the labels
>> themselves will not necessarily be integers

Ah, that does change things.

--
-- Steve

### Thomas A. Russ

Jul 30, 2008, 7:10:24 PM7/30/08
to

...

Well, it seems that one of the key questions is whether there are any
other constraints on the label that you are using. All of the values
are doubles, but it seems from the small sample given above that the
label values are all integer values.

Is this a correct assumption?

If so, and if the number of labels is reasonably bounded, then you can
take a two-pass buffering approach that should be reasonably efficient.

Is it reasonable to assume that the number of items you will need to
collect for each label is relatively small? If so, then a simple
solution to accumulating the values will work. Otherwise, something
more complicated will be needed.

Is it also the case that the labels will be in ascending order without
any missing values?

> So the original function I posted was
> a model (admittedly simplistic) for the kind of thing I want to do
> instead, which is to extract the partials as I'm actually reading in
> the bytes.

Based on some guesses to the questions above, namely that the labels are
non-negative integers, not many total values per label, a bounded number
of labels and consecutive labels, I created the following. Actually,
consecutive is not a strict requirement, but you might end up with some
empty entries that way.

You set up a vector of maximum size, and then just assign the incoming
values into the correct bucket. In some ways this is a short-cut of a
hash-table where we assume that the label value itself constitutes the
hash-key into our collision-free vector. [It is related to the O(n)

(defconstant maximum-label-value 1024) ;; Or whatever is reasonable.

(defun collate-data (data)
(let ((results (make-array (list maximum-label-value)))
(max-label -1))
(dolist (datum data)
(let ((label (truncate (first datum)))
(values (rest datum)))
(setf (aref results label)
(nconc (aref results label) values))
(setf max-label (max max-label label))))
(values results max-label)))

This will return a vector with the data in place, and an indication of
how many elements are present.

CL-USER> (setq *input* (copy-tree '((0 a b) (1 c d) (2 e f) (3 g h)

(1 i j) (2 k l) (4 m n) (2 o p)

(4 q r) (5 s t))))
((0 A B) (1 C D) (2 E F) (3 G H) (1 I J) (2 K L) (4 M N) (2 O P) (4 Q R) (5 S T))

CL-USER> (time (collate-data *input*))

; cpu time (non-gc) 0 msec user, 0 msec system
; cpu time (gc) 0 msec user, 0 msec system
; cpu time (total) 0 msec user, 0 msec system
; real time 0 msec
; space allocation:

; 1 cons cell, 4,112 other bytes, 0 static bytes

#((A B) (C D I J) (E F K L O P) (G H) (M N Q R) (S T) NIL NIL NIL NIL ...)
5

> BTW, more info on SDIF can be found at:
> http://recherche.ircam.fr/equipes/analyse-synthese/sdif/

Wow. Looks complicated....

### Cortez

Jul 30, 2008, 10:21:35 PM7/30/08
to
On Jul 31, 12:10 am, t...@sevak.isi.edu (Thomas A. Russ) wrote:

The data within each matrix has to be of the same binary type -
they're generally either 32- or 64-bit floats.

> If so, and if the number of labels is reasonably bounded, then you can
> take a two-pass buffering approach that should be reasonably efficient.
>
> Is it reasonable to assume that the number of items you will need to
> collect for each label is relatively small? If so, then a simple
> solution to accumulating the values will work. Otherwise, something
> more complicated will be needed.
>
> Is it also the case that the labels will be in ascending order without
> any missing values?

Yes, I just want to extract frequency data, contained in the second
column. And yes again, they're in ascending order. There might be
occasional small gaps, though, when a partial disappears from the
stream of matrices only to reappear a few matrices later.

Thank you Thomas, that's a neat suggestion.

> > BTW, more info on SDIF can be found at:
> >http://recherche.ircam.fr/equipes/analyse-synthese/sdif/
>
> Wow. Looks complicated....

I model the various internal SDIF structures using just a few classes
and binary types. I'm only really interested in certain types of SDIF
files, namely 1TRC and RBEP types, which contain sinusoidal track
data. As it stands my library can handle them quite well. It's mainly
because I want to extract time and frequency values and convert them
into xy coordinates (for graphical display) that I'm interested in
parsing the files in a new (and more efficient) way.

### Cortez

Aug 7, 2008, 1:07:16 PM8/7/08
to
On Jul 31, 3:21 am, Cortez <relativef...@hotmail.co.uk> wrote:
> On Jul 31, 12:10 am, t...@sevak.isi.edu (Thomas A. Russ) wrote:
>
>
>
> > Cortez <relativef...@hotmail.co.uk> writes:
>
> > > Hi Volkan,
>
> > > Well a 1MBSDIFfile might contain about 3000 partials, so that's 3000
> > > BTW, more info onSDIFcan be found at:> I model the various internalSDIFstructures using just a few classes

> and binary types. I'm only really interested in certain types ofSDIF
> files, namely 1TRC and RBEP types, which contain sinusoidal track
> data. As it stands my library can handle them quite well. It's mainly
> because I want to extract time and frequency values and convert them
> into xy coordinates (for graphical display) that I'm interested in
> parsing the files in a new (and more efficient) way.

I am happy to say that I have now solved this problem. I am using a
hash table stored in a global variable, into which I read the partials
as I'm reading the matrices from the file. This has reduced the
loading time for a large (c.1MB) SDIF file from more than 3 minutes
(yes!) to less than 7 seconds. This is for loading in to the graphical
interface in ACL - the non-GUI version is even faster. Once I've
optimized and done the appropriate type-declarations I'm sure I can
get the time down even further.

Many thanks for all the very helpful suggestions and advice!

### xah...@gmail.com

Aug 7, 2008, 2:15:26 PM8/7/08
to

For some excursion, here's how one'd do it in Mathematica.

define the list:

mylist={0[a,b],1[c,d],2[e,f],3[g,f],1[i,j],2[k,l],4[m,n],2[o,p],4[q,r],
5[s,t]}

then do this:

Sort@mylist //. {f___,x_[a__],x_[b__],l___} -> {f,x[a,b],l}

output is:

{0[a, b], 1[c, d, i, j], 2[e, f, k, l, o, p], 3[g, f], 4[m, n, q, r],
5[s, t]}

if you want the result cleaned up so that the integer labels are
removed, do like this

result /. _Integer[b___] -> {b}

-----------------------------------

Explanation:

The sort@mylist is syntactically equivalent to Sort[mylist]. It just
sorts it.
The result is:

{0[a, b], 1[c, d], 1[i, j], 2[e, f], 2[k, l], 2[o, p], 3[g, f], 4[m,
n],
4[q, r], 5[s, t]}

The “//. {f___,x_[a__],x_[b__],l___} -> {f,x[a,b],l}”

means use a pattern matching so that if ajacent elements has the same
head, merge them into one.

the shortcut syntax for structural transformation used above is this:

myExpr //. myPattern -> myNewForm

The “myExpr” is any expression. The “myPattern” is any structural
pattern (i.e. like regex except it work on list structures and
datatypes). The “myNewForm” is like the regex's replacement string.

The syntax
“myExpr //. myPattern -> myNewForm”
can also be written in purely nested form, like this:

ReplaceRepeated[myExpr, Rule[myPattern, myNewForm]]

Now, here's some explanation on the the shortcut syntax used to match
patterns:

“_” means any single symbol.
“__” means any 1 or more symbols.
“___” means any 0 or more symbols.

“x_” means a pattern to be named “x”, so later you can refer it as
“x”.
So, “f___” means a pattern that's 0 or more symbols, and named f.
Similar for “l___”.
The “x_[a__]” means basically a expression with 1 or more elements.
The head we named “x”, and its elements we named “a”. Similar for
“x_[b__]”.

So, all together, the “{f___,x_[a__],x_[b__],l___}” just matches a
list or 0 or more length, and capture neighbors that has the same

The “{f,x[a,b],l}” just means the new form we want. Note the f,x,a,b,l
are just names we used for the captured pattern.

So now, “Sort@mylist //. {f___,x_[a__],x_[b__],l___} -> {f,x[a,b],l}”
gives us the desired result.

Now to clean up the head that function as integer labels, we do:

result /. _Integer[b___] -> {b}

which is another expression transformation with pattern. The
“_Integer[b___]” just means any list who's head is of Integer type,
and its element we name “b”. Any expression matching it is replaced by
a list of its elements, expressed as “{b}”.

----------------------

Now, all the above may seem like weired syntax. But actually it has a
purely nested form.

For example, this whole expression:

Sort@mylist /. {f___, x_[a__], x_[b__], l___} -> {f, x[a, b], l}

is syntactically equivalent to this:

ReplaceRepeated[Sort[mylist],
Rule[List[Pattern[f, BlankNullSequence[]],
Pattern[x, Blank[]][Pattern[a, BlankSequence[]]],
Pattern[x, Blank[]][Pattern[b, BlankSequence[]]],
Pattern[l, BlankNullSequence[]]], List[f, x[a, b], l]]]

In a lisp form, it'd be:

(ReplaceRepeated (Sort mylist)
(Rule
(List
(Pattern f (BlankNullSequence))
((Pattern x (Blank)) (Pattern a BlankSequence))
((Pattern x (Blank)) (Pattern b BlankSequence))
(Pattern l (BlankNullSequence)))
(List f (x a b) l)))

That's some power of fully nested, _regular_, syntax.

For some detailed reading regarding the syntax, see:
“The Concepts and Confusions of Prefix, Infix, Postfix and Fully
Functional Notations”
http://xahlee.org/UnixResource_dir/writ/notations.html

-------------------------

Now, Qi support pattern matching in common lisp. I wonder if the above
can be translated to Qi. A functional lang such as Haskell or OCaml
would be interesting.

### Jon Harrop

Aug 7, 2008, 9:33:38 PM8/7/08
to
xah...@gmail.com wrote:
> For example, this whole expression:
>
> Sort@mylist /. {f___, x_[a__], x_[b__], l___} -> {f, x[a, b], l}
> ...

In OCaml/F#, the original list would be a list of tuples of integers and
lists and the solution is then simply:

List.sort (fun (a, _) (b, _) -> compare a b) mylist

--
Dr Jon D Harrop, Flying Frog Consultancy
http://www.ffconsultancy.com/products/?u

### William James

Sep 16, 2008, 9:29:32 PM9/16/08
to
On Jul 29, 12:29 pm, Cortez <relativef...@hotmail.co.uk> wrote:
> I need to traverse a list of lists, where each sublist is labelled by
> a number, and collect together the contents of all sublists sharing
> the same label. So if I have the list -
>
> ((0 a b) (1 c d) (2 e f) (3 g h) (1 i j) (2 k l) (4 m n) (2 o p) (4 q
> r) (5 s t))
>
> where the first element of each sublist is the label, I need to
> produce -
>
> ((a b) (c d i j) (e f k l o p) (g h) (m n q r) (s t))
>
> I do this with the following -
>
> (defun test (list)
> (loop for j in list
> for index = (first j)
> for k = (rest j)
> with indices = nil
> if (not (member index indices))
> do (pushnew index indices)
> and collect k into res
> else
> do (nconc (nth index res) k)
> finally (return res)))

Ruby:

[[0,"a","b"],[1,"c","d"],[2,"e","f"],[3,"g","h"],[1,"i","j"],
[2,"k","l"],[4,"m","n"],[2,"o","p"],[4,"q","r"],[5,"s","t"]].
sort.inject([]){|a,b|
(a==[] or b[0] != a[-1][0]) ?
a << b : ( a[-1] += b[1..-1] ; a ) }.
map{|a| a[1..-1] }

==>[["a", "b"], ["c", "d", "i", "j"], ["e", "f", "k", "l", "o",
"p"],
["g","h"], ["m", "n", "q", "r"], ["s", "t"]]

### André Thieme

Sep 19, 2008, 2:44:47 AM9/19/08
to
William James schrieb:

Although your Ruby solution looks shorter, because the function names
only consist of a few characters your solution is longer in fact, when
looking at the program complexity, which is what really counts in my
opinion. We don’t want programs to have more complexity elements.

I did not count the data structure itself, because in that case Ruby
would easily have lost, because of the so many strings.
While '((1 a b) (2 c d)) are only 11 points we have
[[1, "a", "b"], [2, "c", "d"]] already with 18 points.

Both, the Lisp code and your Ruby code (without counting the complexity
points of your data structure) have 46 points. But the Lisp code wraps
this into a function.
So, in your Ruby code does less. If we add def test(list) and end
we already have a more complex Ruby version.

André
--