[Boost-users] [multi_array] Why the precondition on assignment?


Peter Barker

Feb 23, 2009, 6:35:58 AM
to boost...@lists.boost.org
Hello,

I wonder if anyone can give me the rationale behind
boost::multi_index's assignment operator having the following
precondition:

std::equal(this->shape(),this->shape()+this->num_dimensions(),x.shape());

as mentioned in this document:
http://www.boost.org/doc/libs/1_38_0/libs/multi_array/doc/reference.html
?

I've got a multi_array as a data member of a class and would like to
avoid having to write an assignment operator for my class to do the
resizing and assignment.

Regards,

Peter Barker
_______________________________________________
Boost-users mailing list
Boost...@lists.boost.org
http://lists.boost.org/mailman/listinfo.cgi/boost-users

Peter Barker

Feb 24, 2009, 8:42:52 AM
to boost...@lists.boost.org
On Mon, Feb 23, 2009 at 11:35 AM, Peter Barker <newb...@gmail.com> wrote:
> Hello,
>
> I wonder if anyone can give me the rationale behind
> boost::multi_index's assignment operator having the following
> precondition:

That meant to say boost::multi_array like the subject does!

I can't see the logic for requiring the assigned-to object to have the
same shape as the RHS object. Surely the old contents are being discarded
and the array is being made a copy of the RHS?

Anyone?

Regards,

Pete

Thomas Klimpel

Feb 24, 2009, 1:46:34 PM
to boost...@lists.boost.org
Peter Barker wrote:
> > I wonder if anyone can give me the rationale behind
> > boost::multi_index's assignment operator having the following
> > precondition:
>
> That meant to say boost::multi_array like the subject does!
>
> I can't see the logic for requiring the assigned-to object to have the
> same shape as the RHS object - surely the array is being discarded and
> being made a copy of RHS?

It had something to do with concepts initially.
It later turned more into a "communication issue". People seem to get very upset about the assignment operator of boost::multi_array and write long emotional texts about why it is completely wrong the way it is. But when Ronald Garcia suggested how he could change boost::multi_array, they didn't even bother to give any feedback at all.

> Anyone?

Perhaps I misremember things and should have searched the archives before answering you, but the "Anyone?" suggests to me that you wanted at least some reaction or feedback to your question.

Regards,
Thomas

alfC

Feb 24, 2009, 10:25:05 PM
to boost...@lists.boost.org
On 24 Feb., 05:42, Peter Barker <newbar...@gmail.com> wrote:
> > I wonder if anyone can give me the rationale behind
> > boost::multi_index's assignment operator having the following
> > precondition:
> That meant to say boost::multi_array like the subject does!

I am not sure, but it seems that the idea was to avoid (surprise)
reallocation at all costs. If the LHS has to change size, that means
reallocation in the first place.

Looking at the development history, it seems that the author avoided
reallocation at all costs in the first versions of the library. He
later added the .resize method (which does reallocate), but the
assignment operator still follows the original prescription; that is
the reason, I believe. I might be wrong, or there might be other
reasons. I would like to know too, if that is the case.

Note that for multidimensional arrays, reallocation and/or copying is
needed even when the array shrinks in some directions (not only when
it grows, as with one-dimensional arrays).
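A small worked example of why shrinking is not free: in the default C (row-major) storage order, element (i, j) of an array with cols columns lives at offset i*cols + j. The sketch below is plain C++ with a hypothetical helper, row_major_offset:

```cpp
#include <cstddef>

// Offset of element (i, j) in a contiguous row-major (C-order) 2D array with
// `cols` columns and zero-based indices; this is the default layout of
// boost::multi_array (c_storage_order).
constexpr std::size_t row_major_offset(std::size_t i, std::size_t j, std::size_t cols) {
    return i * cols + j;
}

// Element (1, 0) sits at offset 4 in a 3x4 array...
static_assert(row_major_offset(1, 0, 4) == 4, "3x4 layout");
// ...but at offset 3 once the array is shrunk to 3x3, so it has to move.
static_assert(row_major_offset(1, 0, 3) == 3, "3x3 layout");
```

Going from 3x4 to 3x3 reduces the element count from 12 to 9, yet every element in rows 1 and 2 changes offset, so preserving the overlapping elements means moving them.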

Regarding whether the current design is the right one, I am not sure. On
one hand, it makes sense for multidimensional arrays to forbid resizing
altogether, because in almost all cases resizing means reallocation and
copying, which costs about as much as copy-constructing a new
multi_array. On the other hand, this restricted assignment doesn't
follow the standard (expected) semantics.

Regards,
Alfredo

Peter Barker

Feb 25, 2009, 4:52:29 AM
to boost...@lists.boost.org
Thomas: You're right - I was hoping to get some feedback on it. It was
bugging me because I thought I missed something obvious.

Alfredo: Thanks for highlighting the conflicting design goals of
avoiding reallocation and expected semantics. My vote would definitely
be for the expected semantics because if the assignment is present in
the program, then the programmer probably knows what they're doing.
More importantly, it would avoid the maintenance burden of
implementing operator= in classes that have multi_array instances as
members.

Thanks both!

Regards,

Pete

alfC

Feb 26, 2009, 3:32:22 AM
to boost...@lists.boost.org
> My vote would definitely
> be for the expected semantics because if the assignment is present in
> the program, then the programmer probably knows what they're doing.
> More importantly, it would avoid the maintenance burden of
> implementing operator= in classes that have multi_array instances as
> members.

I agree, especially because operator= in particular cannot be
overloaded as a non-member function in C++.
I would honor the semantics above all, but not without some conceptual
redesign first (I wouldn't know WHAT to redesign, though :)), which
comes with the unavoidable cost of surprise reallocation and a
change to the current behavior of the library. It also raises some
questions: should base indices be copied too? What if the base indices
are different? What if the storage order is different? Is the storage
order copied?
Or suppose the opposite, that we want to forbid reallocation (and, for
simplicity, even reshaping); then it turns out that multi_arrays of
different shapes are effectively different
"dynamic" types (e.g. a 10x10 array and a 20x20 array are
different beasts although both have the same C++ type). I really don't
know how "effective dynamic types" can be formalized in this context,
except that we could throw an exception like bad_cast or domain_error
or size_mismatch or something when using arrays of different shapes.

I hope someone finds an elegant way around this problem, either
with or without modifying the library; but if so, the design goals
of MultiArray should be stated more clearly in the first place. I
admit that, in the few cases where I had to deal with this, I just
checked whether I had to resize the destination array before assignment,
but I was never happy with it. My next idea was to wrap the
assignment in a function call that creates an object of class
assign_helper, which holds a reference to the multi_array and has an
operator= that does this check and resizes if necessary, so I can
use it like this:

smart_assign(A) = B; // resizes A if needed; used instead of A = B (which may fail)

where both A and B are of type multi_array<double, 2>, for example.

That was my way around not being able to redefine operator=; but I am
not sure yet whether it is a good idea, or whether it is a real
improvement over just something like
'smart_assign(A, B);'.

Regards,
Alfredo

Joel Falcou

Feb 26, 2009, 4:56:34 AM
to boost...@lists.boost.org
alfC wrote:

>
> smart_assign(A)=B; // takes care of eventual resize. Used instead of
> A=B (which may fail), (both A and B are of type multi_array<double, 2>
> for example.)
>
> that was my way arround not being able to redefine operator=; but I am
> not sure if it is a good idea yet and whether it is a real improvement
> over just something like
> 'smart_assign(A, B);';

Tossing in my few cents, as I have dealt with such shenanigans a lot.
In my own multi-array-like class, I faced this dilemma. What I did was
use template parameters (via Boost.Parameter) to specify the allocation
policy in the type and let the user choose between throwing,
static_asserting, silently reallocating, preserving order, etc.

Why not use such policies? I know it involves drastically changing the
multi_array interface, but maybe it's worth the hassle.
Just let the default parameters reproduce the old multi_array semantics.

Peter Barker

Feb 26, 2009, 5:40:51 AM
to boost...@lists.boost.org
I've never tried to write a multi array class so perhaps I'm looking
at the problem too simplistically.

With std::vector (and probably all containers?) operator= will
potentially reallocate, so I don't understand why boost::multi_array
should be special. It's not really a *surprise* reallocation is it?

operator= implies to me that you want to forget what's currently being
held and assume a copy of the data that the other object has. I'd be
*expecting* reallocations and existing indices/iterators to be
invalidated.

I've only just started using multi_array and I know there's a lot more
to it than I'm aware of so apologies if my view on operator= is based
more on ignorance than enlightenment!

Regards,

Pete

alfC

Feb 26, 2009, 3:14:09 PM
to boost...@lists.boost.org
> With std::vector (and probably all containers?) operator= will
> potentially reallocate, so I don't understand why boost::multi_array
> should be special. It's not really a *surprise* reallocation is it?

First of all, linked lists (~std::list), ordered trees (~std::set) and
queues (~std::queue) don't have this problem at all: they only
need to allocate storage for the new elements (something of
order 1 per element), and they are designed with that in mind.

Going back to std::vector: what you say is correct for dynamic
(one-dimensional) arrays, but std::vector implements more complex
machinery to avoid reallocation in most cases, something that
multi_array doesn't do at all.
In general, as a std::vector grows (e.g. on push_back), it allocates
more space than it needs. This is done automatically each time the
vector *has* to grow, for example by doubling the allocated
(reserved) space, or it can be done manually. That is why std::vector
defines the methods

reserve()
capacity()

which are different from
resize()
size()

Because of this trickery, reallocation in std::vector happens much
less often than you may think. The price is extra storage (or,
optionally, more manual control of the reserved space).
Going back to your example: if the assigned and assignee std::vectors
are of the "same order of size", then reallocation is unlikely, or its
cost is paid only once, on the first assignment.
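The effect described here can be measured directly. A small sketch in standard C++ (count_reallocations is a hypothetical helper) that counts capacity changes while pushing back, with and without an up-front reserve(); the exact counts depend on the standard library's growth factor:

```cpp
#include <cstddef>
#include <vector>

// Counts how often push_back has to reallocate while growing a vector to n
// elements, by watching for capacity() changes. With geometric growth this is
// roughly log(n) times, not n times; with enough reserve() it is zero.
inline std::size_t count_reallocations(std::size_t n, std::size_t reserved = 0) {
    std::vector<double> v;
    v.reserve(reserved);
    std::size_t reallocations = 0;
    std::size_t cap = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(0.0);
        if (v.capacity() != cap) {  // capacity only changes on reallocation
            ++reallocations;
            cap = v.capacity();
        }
    }
    return reallocations;
}
```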

> operator= implies to me that you want to forget what's currently being
> held and assume a copy of the data that the other object has.

Don't get me wrong, I agree with you. I am just pointing at the
inconsistency in the design of multi_array but also trying to
understand what is the origin of the problem and think of possible
ways around them.

Now that you mention the example of std::vector, I am wondering
whether such manual control of multi_array's reserved space is THE
solution.
Something like

A.reserve({{shape1,shape2,shape3}}); // or just reserve(shape1*shape2*shape3)

could be the compromise solution for everyone. It could follow the
design of std::vector elegantly, at least partially. Ronald?

(I would say automatic growing is a bad idea for multi_arrays, but
manually reserving space can't hurt.) Even for the solution I gave in
my previous post I would need something like reserve, because resize
actually copies and/or constructs elements, and I don't need that
because the elements will be overwritten by the assignment anyway.
That would be my first application of reserve.

Regards,
Alfredo

alfC

Feb 26, 2009, 3:21:13 PM
to boost...@lists.boost.org
> In my own multi-array like class, I faced this dilemma. What I did was using
> template boost::parameters to specify policy on allocation in the type
> and had the user
> make choice between throwing, static_asserting, silently reallocating,
> preserving order etc

Can you illustrate your design a little? Is the policy part of the
state of the class?
Is your multi_array class written on top of Boost.MultiArray? If not,
did that require a small amount of code, or basically another
gargantuan library?
Didn't you have problems with all the other derived types like
subarrays and array_views, or did you just not need them?

> Why not using such policies ? I know it involves drastically changing
> multi_array interface but maybe it's worth the hassle.

Why the "interface changes"? (Except for a redefined operator=, I
guess.)

Thank you,
Alfredo

Joel Falcou

Feb 26, 2009, 3:43:04 PM
to boost...@lists.boost.org
alfC wrote:

> can you illustrate a little bit your design? Is the policy part of the
> state of the class?
>
Sample code, for example:

// 3D matrix with no-realloc semantic and base index of [1 1 1]
matrix< float, settings(3d_, no_realloc, base_index<1,1,1>)>
m( ofSize(4,4,4) );

The policy is part of the type signature and modifies how the internals
of the matrix work. settings(...) is a shortcut for gathering a large
number of parameters. This type is then introspected internally
using Boost.Parameter.

ofSize(a0,...,an) is a function returning an nD extent object carrying
the sizes of the matrix dimensions.
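An ofSize-style helper is easy to approximate with C++11 variadic templates. This is a hypothetical stand-in (of_size), not Joel's implementation, which the thread does not show:

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for an ofSize(a0, ..., an) helper: collects the
// per-dimension sizes into a fixed-size array whose length is the rank.
template <typename... Sizes>
std::array<std::size_t, sizeof...(Sizes)> of_size(Sizes... sizes) {
    return {{static_cast<std::size_t>(sizes)...}};
}
```

Because boost::multi_array's constructor accepts a collection of extents (the reference documentation uses boost::array for this), something like boost::multi_array<float, 3> m(of_size(4, 4, 4)); should also be accepted, assuming std::array satisfies the library's Collection requirements.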

> Is your multi_array class written on top of Boost.MultiArray. If not
> did that require small amount of code or basically another gargantuan
> sized library ?
>

It is "gargantuan", as it's basically a large collection of tools for
quickly turning Matlab code into C++ with as few changes as possible.
But I think the core of it could be extracted.

> why the "interface changes"? (except for the a redefined operator= I
> guess

Sorry, I was thinking of the changes to the type signature.

--
___________________________________________
Joel Falcou - Assistant Professor
PARALL Team - LRI - Universite Paris Sud XI
Tel : (+33)1 69 15 66 35

Peter Barker

Feb 27, 2009, 8:49:12 AM
to boost...@lists.boost.org
On Thu, Feb 26, 2009 at 8:14 PM, alfC <alfredo...@gmail.com> wrote:
> Now that you mentionad the example of std::vector. I am wondering
> whether such manual control of multi_array reserved space is THE
> solution.
> something like
>
> A.reserve({{shape1,shape2,shape3}}); // or just reserve
> (shape1*shape2*shape3)

Would this still make it necessary to write an operator= in a class
that has a multi_array as a member? That's the main thing I'd like to
avoid as it introduces a maintenance burden to ensure all the other
members are copied.

Because of that I think multi_array::operator= must be the exception
to the strategy of avoiding reallocations. If the existing space can be
used, then all well and good: have that optimisation.

Thanks for explaining a bit more about multi_array and comparison of
it with std::vector.

Regards,

Pete

alfC

Feb 27, 2009, 7:42:12 PM
to boost...@lists.boost.org
> > something like
> > A.reserve({{shape1,shape2,shape3}}); // or just reserve
> > (shape1*shape2*shape3)
>
> Would this still make it necessary to write an operator= in a class
> that has a multi_array as a member?

Yes, you will still need to write such an operator, but at least it
*can* be done efficiently (i.e. without the useless copying that is
performed if you resize the matrix first with A.resize()).

I proposed a "reserve" method, which I had been thinking about for a
long time, for two reasons:

* First, because I think that, with the current design, it is the only
way to implement resize-and-assign efficiently; you can then wrap
this into a new object that has your desired semantics (I suggest
calling it small_multi_array, for example).

* Second, for some numerical libraries I sometimes need to allocate
extra storage beyond the end of the multi_array! And it is not an
esoteric numerical library: it is the famous FFTW3, whose MPI
version needs to allocate somewhat more space than the
multidimensional array itself because of its algorithmic requirements.
http://www.fftw.org/fftw3.3alpha_doc/Simple-MPI-example.html#Simple-MPI-example.
My current approach is very dirty: I have an object with a
std::vector v that is only used to "reserve" enough space, and a
multi_array_ref with the proper dimensionality and shape that "points"
to &v[0]. In some situations (because of FFTW3's needs) the size (or
reserved space) of the vector is larger than what is needed to
reference all the array indexes. This way I have total control over
the allocated memory of the multi_array (which is not a multi_array
anymore but a multi_array_ref). I have to do that just because there
is no "reserve".

As you see, you are not the only one who has to do dirty tricks with
multi_array. At this point you may ask: why use multi_array at all?
Well, I still find other features of the library very convenient, like
straightforward indexing, index bases, strides, subarrays and
array views.

> That's the main thing I'd like to avoid as it introduces a maintenance burden to ensure all the other
> members are copied.

Yeah, sorry, it doesn't solve that problem. But at least with
'reserve' it *can* be solved.

Let's see, let's try to find a solution at a higher level. Programs
that deal with arrays, for example Matlab, don't have this problem on
assignment (op=) because they use copy-on-write for arrays in the
first place. This means that nothing is reallocated or copied when
operator= is called; then, when *at least one* element is modified,
the reallocation and copy happen together (like with the current
resize()!). BTW, does anybody know what the underlying Fortran
strategy is?
Maybe we should stop thinking about "improving" MultiArray,
take it as given, and start thinking about adapting it nicely (with a
small layer of code on top of MultiArray) into a shared/copy-on-write
pointer-like type that resembles multi_array as much as possible. It is
not that I know exactly what to do; I am thinking out loud (and
hoping the smart Boost developers hear).

Going back to your specific problem: if the arrays you are handling
are big, have you considered keeping the multi_array in a shared_ptr
(or a copy-on-write sort of thing) that is a member of your class? If
the arrays are small and you can afford reallocation and spurious
copies, then wrap the multi_array in a small_multi_array class with the
expected semantics.

> Because of that I think multi_array::operator= must be the exception
> to a strategy on avoiding reallocations. If the existing space can be
> used, then all well and good - have that optimisation.

Given the current design: yes, I agree.

But also remember that MultiArray is not only about multi_array: there
are many other classes in the library whose operator= will still work
with the "restricted" semantics. For example, subarrays can still be
assigned, but the sizes have to match:

boost::multi_array<double, 2> A(boost::extents[5][5]);
boost::multi_array<double, 1> B(boost::extents[5]);
boost::multi_array<double, 1> C(boost::extents[4]);

...currently you can do:
A[3]=B; // A[3] is of type subarray

but you can't do:
A[3]=C; // asserts false in the same way as mismatched multi_arrays

Should we complain about that too because it is an operator= call that
fails in some cases?

Alfredo
