Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

[#39063 and #40066] boolean arrays

4 views
Skip to first unread message

Karl Forner

unread,
Oct 4, 2006, 7:49:50 PM10/4/06
to Internals List
Hello,

I need some advices.
I've worked on fixedbooleanarray and resizablebooleanarray.
From #40066 it is said that both need to be rewritten.

So I've cleaned up fixedbooleanarray that should be a lot cleaner, somewhat
faster and more understandable, and I've added some tests. But of course
because resizablebooleanarray extends it,
it breaks its current implementation.

In parallel, I've checked the #39063 bug that states that ResizableBooleanArray
uses 64 bytes per bit of information.
I've found the reason why (just a silly #define bug), but nonetheless the
implementation is deficient.

For example

pmc1 = new ResizableBooleanArray
pmc1 = 1
pmc1 = 2
pmc1 = 3

makes the following allocation calls:

# mem_sys_allocate_zeroed(64) ==> (pmc=1, preallocation of 64 bytes, ok)
# mem_sys_realloc(64) ==> (pmc=2)we have 63 bytes left, and we realloc ??
# mem_sys_realloc(64) ==> (pmc=3) same problem

So in my opinion too this pmc should be rewritten. I'm ready to do it, based
on my fixedbooleanarray implementation,
but before doing it I need some answers :

Whare the requirements/constraints of a ResizableBooleanArray ? e.g are
unshift to be less frequent that shift ?
What's a typical usage test case ?
Should I keep the actual implementation concept : allocating by chunks of 64
bytes ?

Waiting for your input...

Karl Forner

Bernhard Schmalhofer

unread,
Oct 5, 2006, 2:10:56 PM10/5/06
to Karl Forner, Internals List
Karl Forner schrieb:

>
> So in my opinion too this pmc should be rewritten. I'm ready to do it,
> based
> on my fixedbooleanarray implementation,
> but before doing it I need some answers :
Yes, I've always why ResizableBooleanArray extends FixedBooleanArray and why
FixedBooleanArray is not simply a special case of ResizableBooleanArray.

pdd17_basic_types.pod says nothing about the implementation, so whatever
works should
be OK.
> What are the requirements/constraints of a ResizableBooleanArray ? e.g

> are
> unshift to be less frequent that shift ?
> What's a typical usage test case ?
> Should I keep the actual implementation concept : allocating by chunks
> of 64
> bytes ?

Sorry, no answers to that, just more questions.


Is there a real difference between boolean arrays, big integers and
binary data?
Are the above really different from a STRING ?
If not they could become the same thing, or at least share a common
implementation.

Of course it would be nice to have all the bitwise and arithmetic
methods available.
Can the C-code of the Perl5 Module Bit::Vector reused for that?

Another item on my wishlist is the interaction with the outside world.
It would be nice
to be able to pass a BooleanArray|BigInt|Binary PMC directly to
image manipulation, math and encryption libraries. So something like
morphing to and from UnmanagedStruc would be cool.

Just my $0.02,
Bernhard


Leopold Toetsch

unread,
Oct 5, 2006, 3:11:04 PM10/5/06
to perl6-i...@perl.org
Am Donnerstag, 5. Oktober 2006 01:49 schrieb Karl Forner:
>
> Whare the requirements/constraints of a ResizableBooleanArray ? e.g are
> unshift to be less frequent that shift ?

shift and unshift are both more unlikely than push/pop I presume. OTOH if a
user wants a bit queue, you have to deal with both ends ;)

> Should I keep the actual implementation concept : allocating by chunks of
> 64 bytes ?

64 bytes is any arbitray number but probably fine. It ought to be some power
of 2 though and smaller than 16 bytes doesn't make sense.

The basic problem with Resizable*Array is: we'd need 3 fields for
book-keeping:

first_elem 0 unless shift/unshift was done
last_elem_p1 elements = last_elem_p1 - first_elem
allocated amount of allocated memory

The latter is of course just an optimization, but we can't afford resizing on
each size change.

Current PMC layout only allows for 2 integer fields: PMC_int_val and
PMC_int_val2.

ResizableBooleanArray tried to implement this strategy:

1) 'allocated' is always the minimal n*64 bytes (aka n*CHNUK_SIZE)
2) if there's more room than CHUNK_SIZE on either side of the array,
reallocation and moving of bits happens to fullfill again constraint 1.
3) and of course, if more bits are needed, reallocation must be done.

That's it basically,
leo

Karl Forner

unread,
Oct 5, 2006, 6:10:38 PM10/5/06
to Bernhard Schmalhofer, Leopold Toetsch, Internals List
> Yes, I've always why ResizableBooleanArray extends FixedBooleanArray and
> why
> FixedBooleanArray is not simply a special case of ResizableBooleanArray.


Because a FixedBooleanArray is simpler, so that it may use less memory and
be implemented more efficiently I suppose.


Is there a real difference between boolean arrays, big integers and
> binary data?
> Are the above really different from a STRING ?
> If not they could become the same thing, or at least share a common
> implementation.


I don't know for the fixed one, because it's so simple that you do not need
a backend.
But I suppose that String and the Resizable could use a common vector
implementation.

Of course it would be nice to have all the bitwise and arithmetic
> methods available.
> Can the C-code of the Perl5 Module Bit::Vector reused for that?


I suppose it could. It comes with the same Licences as Perl (Artistic, GPL
and LGPL) so I guess there's no problem at this level.
I believe that Bigints already use an external library if available :
libGMP. Is there a particular reason not to include it in the parrot
distribution.
Any Licensing or portability issues ?


The only problem I see is that if we don't add other methods to
ResizableBooleanArray, it would be overkill just using the BitVector library

to store an array, and maybe difficult to debug.
See the list of functions it provides :

Version() Word_Bits() Long_Bits()
new() new_Hex() new_Bin()
new_Dec() new_Enum() Shadow()
Clone() Concat() Concat_List()
Size() Resize() Copy()
Empty() Fill() Flip()
Primes() Reverse() Interval_Empty()
Interval_Fill() Interval_Flip() Interval_Reverse()
Interval_Scan_inc() Interval_Scan_dec() Interval_Copy()
Interval_Substitute() is_empty() is_full()
equal() Lexicompare() Compare()
to_Hex() from_Hex() to_Bin()
from_Bin() to_Dec() from_Dec()
to_Enum() from_Enum() Bit_Off()
Bit_On() bit_flip() bit_test()
Bit_Copy() LSB() MSB()
lsb() msb() rotate_left()
rotate_right() shift_left() shift_right()
Move_Left() Move_Right() Insert()
Delete() increment() decrement()
inc() dec() add()
subtract() Negate() Absolute()
Sign() Multiply() Divide()
GCD() Power() Block_Store()
Block_Read() Word_Size() Word_Store()
Word_Read() Word_List_Store() Word_List_Read()
Word_Insert() Word_Delete() Chunk_Store()
Chunk_Read() Chunk_List_Store() Chunk_List_Read()
Index_List_Remove() Index_List_Store() Index_List_Read()
Union() Intersection() Difference()
ExclusiveOr() Complement() subset()
Norm() Min() Max()
Multiplication() Product() Closure()
Transpose()

Karl Forner

unread,
Oct 5, 2006, 6:19:30 PM10/5/06
to Leopold Toetsch, perl6-i...@perl.org
On 10/5/06, Leopold Toetsch <l...@toetsch.at> wrote:
>
> Am Donnerstag, 5. Oktober 2006 01:49 schrieb Karl Forner:
> >
> > Whare the requirements/constraints of a ResizableBooleanArray ? e.g are
> > unshift to be less frequent that shift ?
>
> shift and unshift are both more unlikely than push/pop I presume. OTOH if
> a
> user wants a bit queue, you have to deal with both ends ;)
>
> > Should I keep the actual implementation concept : allocating by chunks
> of
> > 64 bytes ?
>
> 64 bytes is any arbitray number but probably fine. It ought to be some
> power
> of 2 though and smaller than 16 bytes doesn't make sense.
>
> The basic problem with Resizable*Array is: we'd need 3 fields for
> book-keeping:
>
> first_elem 0 unless shift/unshift was done
> last_elem_p1 elements = last_elem_p1 - first_elem
> allocated amount of allocated memory
>
> The latter is of course just an optimization, but we can't afford resizing
> on
> each size change.
>
> Current PMC layout only allows for 2 integer fields: PMC_int_val and
> PMC_int_val2.


Ok. So first maybe the best way is to use an external lib (GMP or
Bit::Vector, cf previous mail in thread)...

Otherwise, I have imagined two approaches :

1) in the hypothesis that shift/unshifts are more unlikely, it can be done
in a way that uses only the 2 integer fields,
one for the nb of elements, the other for the allocated memory.
If a shift or unshift is needed, then alloc and copy.

2) in all resize ops are to be symmetrical, we can allocate a third int
field : PMC_data could point to a struct holding a buffer and its size.

ResizableBooleanArray tried to implement this strategy:
>
> 1) 'allocated' is always the minimal n*64 bytes (aka n*CHNUK_SIZE)
> 2) if there's more room than CHUNK_SIZE on either side of the array,
> reallocation and moving of bits happens to fullfill again constraint 1.
> 3) and of course, if more bits are needed, reallocation must be done.


yes that's what I had figured out.


Karl

0 new messages