[INFO] New array base: list.c

Leopold Toetsch

unread,

Oct 7, 2002, 5:25:31 PM10/7/02

to P6I

The current implementation of array.pmc is rather inefficient and
limited - and really slow.

So I rewrote the base routines almost from scratch and have currently a
file named list.c, which I will commit (yep cvs ci, thanks to Dan, Steve
and others, who made this possible) in the next future - if people agree.

Some features:

- list.c is based roughly on principles of intlist.c
(chunk based, fast index set/get, push/pop/shift/unshift)
- can handle <null> arrays, no chunk allocation first
- can store packed char, short, int, num, PMC* (void*), ...
- different growing policies:
- growing (min alloc 4 items/chunk -> max 1024 items/chunk
(both are defines but above are good limits in tests)
- or fixed 1024 items/chunk for arrays first accessed
like "set P0[500], I0"
- same speed as intlist, ~double speed for big # (10^6) items
- can handle sparse arrays, saving many MBs for very sparse arrays

Currently I'm testing it as an intlist substitute (via wrapper functions
built in), it runs all intlist tests + 2 more WRT sparse access.

So, if people are ok with this, i'll commit it as list.c in parrot's
root (+ some more t/pmc/intlist.t). Transition of array.pmc may then
follow later.

It's currently not intended to replace intlist.c, I don't know the
planned usage patterns for intlist (rx stack?). Some optimizations in
list.c might also improve intlist.c so that this would be a special and
therefore faster array with limited capability.
If conserving memory is an issue, list.c is much more efficient for very
small and very big arrays. We will see.

I know, that Peter is working on a similar array substitute, so I
announce the status of my list.c - and when he has a better array, we
can always change to it. Or we take the best things out of the two and
improve it even more.

Comments welcome,
leo

Leon Brocard

unread,

Oct 8, 2002, 5:06:36 AM10/8/02

to perl6-i...@perl.org

Leopold Toetsch sent the following bits through the ether:

> So I rewrote the base routines almost from scratch and have currently a
> file named list.c

I for one am confused as to the number of array-like classes in
Parrot. What is the difference between list and array? Or intlist? Or
multiarray? Perhaps replacing one of these would be better?

Leon
--
Leon Brocard.............................http://www.astray.com/
scribot.................................http://www.scribot.com/

.... Useless invention no. 404: Breathable space suit

Leopold Toetsch

unread,

Oct 8, 2002, 6:44:37 AM10/8/02

to Leon Brocard, perl6-i...@perl.org

Leon Brocard wrote:

> Leopold Toetsch sent the following bits through the ether:
>
>
>>So I rewrote the base routines almost from scratch and have currently a
>>file named list.c
>>
>
> I for one am confused as to the number of array-like classes in
> Parrot. What is the difference between list and array? Or intlist? Or
> multiarray? Perhaps replacing one of these would be better?

List is by itself not a class, but the working engine, managing all the
array related functions (indexed access, push/pop/shift/unshift ...)

On top of this there will be the *array classes, providing only a
wrapper for setting the wanted data_type and calling the appropriate
function in list.c.

- array.pmc does strict checkin of array dimensions, PMC* data
- perarray.pmc does autogrow arrays, PMC *data
- multiarray will just recalc the array index and call list's assign/get
- intlist.pmc stores packed int's

Currently array, multiarray and intlist have all there own buffer
management. This will all be replaced (for intlist.c s. remarks at the
first announcement) by functions in list.c.

When we come to packed arrays, it would be simpler, to extend the "new
..array" parameter to take a type, then to provide byte_array, int_array
and so on. The "new .Array, size" syntax may not be enough, e.g. to
discern an array from taking a (ptr*) from one taking an int.

> Leon

leo

Steve Fink

unread,

Oct 8, 2002, 12:13:57 PM10/8/02

to Leopold Toetsch, P6I

On Mon, Oct 07, 2002 at 11:25:31PM +0200, Leopold Toetsch wrote:
> So, if people are ok with this, i'll commit it as list.c in parrot's
> root (+ some more t/pmc/intlist.t). Transition of array.pmc may then
> follow later.
>
> It's currently not intended to replace intlist.c, I don't know the
> planned usage patterns for intlist (rx stack?). Some optimizations in
> list.c might also improve intlist.c so that this would be a special and
> therefore faster array with limited capability.
> If conserving memory is an issue, list.c is much more efficient for very
> small and very big arrays. We will see.

Sounds good to me. I originally wrote, and am currently using, intlist
for regular expressions. (The stuff checked into languages/regex
generates code that uses it, although I haven't committed the hooks to
allow languages/perl6 to use languages/regex instead of its native
code.) The access pattern for regexes is very simple: it only uses it
as a stack. It *might* be useful to be able to pop off multiple
entries at once (or equivalently, set the size of the array/list), and
it might be useful to be able to append another array onto the end.
But that's it. So I know of no reason for intlist.c to be kept
separate; I'm perfectly happy having it replaced by list.c.

Leopold Toetsch

unread,

Oct 8, 2002, 2:24:31 PM10/8/02

to Steve Fink, P6I

Steve Fink wrote:

> On Mon, Oct 07, 2002 at 11:25:31PM +0200, Leopold Toetsch wrote:

>>If conserving memory is an issue, list.c is much more efficient for very
>>small and very big arrays. We will see.
>>
>
> Sounds good to me. I originally wrote, and am currently using, intlist
> for regular expressions. (The stuff checked into languages/regex
> generates code that uses it, although I haven't committed the hooks to
> allow languages/perl6 to use languages/regex instead of its native
> code.) The access pattern for regexes is very simple: it only uses it
> as a stack.

.... where depth depends on the backtracking behaviour of the rule and
isn't know in advance. So the small items would be a win for probably
90% of the usage.

> ... It *might* be useful to be able to pop off multiple

> entries at once (or equivalently, set the size of the array/list),

list_delete(interp, list, idx, n_to_delete)

would do this, if (length-1-idx) == n_to_delete. But just adjusting the
array length is probably simpler.

list_delete isn't ready yet, but the infrastructure is almost finished.

> ... and

> it might be useful to be able to append another array onto the end.

These are just list_assigns in a loop.

> But that's it. So I know of no reason for intlist.c to be kept
> separate; I'm perfectly happy having it replaced by list.c.

Fine, $code_duplication-- - soon.

leo

Leopold Toetsch

unread,

Oct 10, 2002, 9:58:58 AM10/10/02

to P6I

Follow up to my announce + some additional notes.

list.c checked in + intlist fix + 3 more tests.

Leopold Toetsch wrote:

> The current implementation of array.pmc is rather inefficient and
> limited - and really slow.
>
> So I rewrote the base routines almost from scratch and have currently a
> file named list.c, which I will commit (yep cvs ci, thanks to Dan, Steve
> and others, who made this possible) in the next future - if people agree.
>
> Some features:
>
> - list.c is based roughly on principles of intlist.c
> (chunk based, fast index set/get, push/pop/shift/unshift)
> - can handle <null> arrays, no chunk allocation first
> - can store packed char, short, int, num, PMC* (void*), ...
> - different growing policies:
> - growing (min alloc 4 items/chunk -> max 1024 items/chunk
> (both are defines but above are good limits in tests)
> - or fixed 1024 items/chunk for arrays first accessed
> like "set P0[500], I0"
> - same speed as intlist, ~double speed for big # (10^6) items
> - can handle sparse arrays, saving many MBs for very sparse arrays

- list_delete(..., idx, n_items)
- list_insert(..., idx, n_items) ... make room for n_items at idx

These should it make fairly simple to implement splice().

As mentioned in the file, list.c can emulate intlist.c, so intlist.c
users - please - give it a try: just compile/link list instead of intlist.

Status:
- Runs all intlist tests
- insert/delete is only barely tested, this needs definitely more,
especially combined and mixed with other operations.

If people agree, we could start to base other array classes on list.c,
though I'd rather do classes cleanup and keyed access first.

Comments welcome,
leo

Josef Hook

unread,

Oct 11, 2002, 7:41:32 AM10/11/02

to Leopold Toetsch, Leon Brocard, perl6-i...@perl.org

On Tue, 8 Oct 2002, Leopold Toetsch wrote:

> Leon Brocard wrote:
>
> > Leopold Toetsch sent the following bits through the ether:
> >
> >
> >>So I rewrote the base routines almost from scratch and have currently a
> >>file named list.c
> >>
> >
> > I for one am confused as to the number of array-like classes in
> > Parrot. What is the difference between list and array? Or intlist? Or
> > multiarray? Perhaps replacing one of these would be better?
>
>
> List is by itself not a class, but the working engine, managing all the
> array related functions (indexed access, push/pop/shift/unshift ...)
>
> On top of this there will be the *array classes, providing only a
> wrapper for setting the wanted data_type and calling the appropriate
> function in list.c.
>
> - array.pmc does strict checkin of array dimensions, PMC* data
> - perarray.pmc does autogrow arrays, PMC *data
> - multiarray will just recalc the array index and call list's assign/get

Very good, that was what i was going for but i ran out of time :-)

/Josef

Josef Hook

unread,

Oct 11, 2002, 7:38:54 AM10/11/02

to Leopold Toetsch, P6I

> - can handle sparse arrays, saving many MBs for very sparse arrays

What is the complexity on the algoritm?

/Josef

Leopold Toetsch

unread,

Oct 11, 2002, 8:11:16 AM10/11/02

to Josef Hook, P6I

Josef Hook wrote:

>
>>- can handle sparse arrays, saving many MBs for very sparse arrays
>>
>
> What is the complexity on the algoritm?

Here is a snippet docu, from my recent changes, not checked in yet:

[ get_chunk ... locate List_chunk, calc index in chunk ]
....

* The scheme of operations is:
*
* if all_chunks_are_MAX_ITEMS
* chunk = chunk_list[ idx / MAX_ITEMS ]
* idx = idx % MAX_ITEMS
* done.
*
* chunk = first
* repeat
* if (index < chunk->items)
* done.
*
* if (index >= items_in_chunk_block)
* index -= items_in_chunk_block
* chunk += chunks_in_chunk_block
* continue
*
* calc chunk and index in this block
* done.
*
* One chunk_block consists of chunks of the same type: fixed, growing
* or other. So the time to look up a chunk doesn't depend on the array
* length, but on the complexity of the array. rebuild_chunk_list tries
* to reduce the complexity, but may fail, if you e.g. do a prime sieve
* by actually list_delet-ing the none prime numbers.
*
* The complexity of the array is how many different chunk_blocks are
* there. They come from:
* - initially fixed: 1
* - initially growing: 2
* - first unshift: 1 except for initially fixed arrays
* - insert: 1 - 3
* - delete: 1 - 2
* - sparse hole: 3 (could be 2, code assumes access at either end now)

I hope, that answers the question.
In all tests it's as fast as intlist or much (10x) faster on sparse arrays.

A lookup if 10^6 items did last ~0.3s on my Athlon 800.

> /Josef

leo