'Pure' value types?

Roland Pibinger

Oct 10, 2005, 6:20:30 AM

Is there a name in use for user-defined value types that behave
essentially like built-in value types? I mean value types that meet
the following criteria:

- default constructor, copy constructor and operator= do not throw an
exception, i.e. have exception specification 'throw()';
- assignment is the only way to change (mutate) the object;

'Pure' or 'strong' value types may be an appropriate name. But
probably there exists one already?
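For illustration, here is a minimal sketch of a user-defined type meeting both criteria, assuming C++11 or later, where the `throw()` specification is spelled `noexcept`. The `Color` class, its accessors and `color_demo` are hypothetical names invented for this example. The `static_assert`s check the first criterion at compile time; the absence of setters enforces the second.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical value type: an RGB colour with no setters, so the only way to
// change a Color object is to assign a whole new Color to it.
class Color {
public:
    Color() noexcept : r_(0), g_(0), b_(0) {}
    Color(std::uint8_t r, std::uint8_t g, std::uint8_t b) noexcept
        : r_(r), g_(g), b_(b) {}
    Color(const Color&) noexcept = default;
    Color& operator=(const Color&) noexcept = default;

    std::uint8_t red() const noexcept { return r_; }
    std::uint8_t green() const noexcept { return g_; }
    std::uint8_t blue() const noexcept { return b_; }

private:
    std::uint8_t r_, g_, b_;
};

inline bool operator==(const Color& a, const Color& b) noexcept {
    return a.red() == b.red() && a.green() == b.green() && a.blue() == b.blue();
}

// Criterion 1, checked at compile time.
static_assert(std::is_nothrow_default_constructible<Color>::value, "default ctor throws");
static_assert(std::is_nothrow_copy_constructible<Color>::value, "copy ctor throws");
static_assert(std::is_nothrow_copy_assignable<Color>::value, "assignment throws");

// Criterion 2 in use: c changes only by whole-object assignment.
inline void color_demo() {
    Color c;               // black
    c = Color(255, 0, 0);  // replaced by red
    assert(c.red() == 255 && c.green() == 0 && c.blue() == 0);
}
```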

TIA
Roland Pibinger

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

chris jefferson

Oct 10, 2005, 9:40:34 AM

Roland Pibinger wrote:

>> Is there a name in use for user-defined value types that behave
>> essentially like built-in value types? I mean value types that meet
>> the following criteria:
>>
>> - default constructor, copy constructor and operator= do not throw an
>> exception, i.e. have exception specification 'throw()';
>> - assignment is the only way to change (mutate) the object;

Remember ++ and --

>> 'Pure' or 'strong' value types may be an appropriate name. But
>> probably there exists one already?

I've always called them "builtin-like" types, but I'm sure there is a
better name.

kanze

Oct 10, 2005, 9:41:04 AM

Roland Pibinger wrote:

>> Is there a name in use for user-defined value types that
>> behave essentially like built-in value types?

I just call them value types.

>> I mean value types that meet the following criteria:

>> - default constructor, copy constructor and operator= do not
>> throw an exception, i.e. have exception specification
>> 'throw()';

That's a difficult requirement. In practice, most of my value
types meet it, but only because I replace the new_handler to
abort, rather than throw, if allocation fails:-).
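Kanze's workaround can be sketched with the standard `std::set_new_handler` and `std::get_new_handler` (the latter since C++11). The handler name `abort_on_out_of_memory` and the helpers `install_abort_handler` and `new_handler_demo` are hypothetical. This is only appropriate where terminating the process on memory exhaustion is acceptable, as in the dedicated-server setting he describes later in the thread.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <new>

// Hypothetical handler: instead of letting operator new throw std::bad_alloc,
// report the failure and terminate the process.
[[noreturn]] void abort_on_out_of_memory() {
    std::fputs("out of memory\n", stderr);
    std::abort();
}

// Installs the handler; returns whatever handler was installed before.
inline std::new_handler install_abort_handler() {
    return std::set_new_handler(abort_on_out_of_memory);
}

// Usage check: install the handler, confirm it, then restore the old one.
inline void new_handler_demo() {
    std::new_handler previous = install_abort_handler();
    assert(std::get_new_handler() == abort_on_out_of_memory);
    std::set_new_handler(previous);
}
```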

>> - assignment is the only way to change (mutate) the object;

I presume you also allow the <op>= operators (when appropriate).

>> 'Pure' or 'strong' value types may be an appropriate name. But
>> probably there exists one already?

Good question. You'd think that there would be, but I
consistently hear things like std::string referred to as a value
type.

How about "correctly designed value types"? :-)

--
James Kanze GABI Software
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Roland Pibinger

Oct 10, 2005, 4:15:52 PM

On 10 Oct 2005 09:40:34 -0400, chris jefferson <c...@cs.york.ac.uk>
wrote:
>
>Remember ++ and --

Not necessarily, think of string types!

Best wishes,
Roland Pibinger

Nicola Musatti

Oct 11, 2005, 8:56:29 PM

kanze wrote:
> Roland Pibinger wrote:
[...]

> >> 'Pure' or 'strong' value types may be an appropriate name. But
> >> probably there exists one already?
>
> Good question. You'd think that there would be, but I
> consistently hear things like std::string referred to as a value
> type.
>
> How about "correctly designed value types"? :-)

I guess you could say that there are "strong" value types and "string"
value types ;-)

Cheers,
Nicola Musatti

Daniel Krügler

Oct 12, 2005, 11:07:17 AM

Roland Pibinger wrote:
> On 10 Oct 2005 09:40:34 -0400, chris jefferson <c...@cs.york.ac.uk>
> wrote:
>
>>Remember ++ and --
>
>
> Not necessarily, think of string types!
>

But which ones? std::basic_string can't provide the no-throw guarantees
you mentioned, so only static strings would be proper candidates or
something like boost::array.
Since you emphasize that increment/decrement operations need not be
available, I assume you don't restrict yourself to scalar-like data types, right?

Considering your exception guarantees, one might choose a name like
"statically-sized value types".

Greetings from Bremen,

Daniel Krügler
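Daniel's point can be illustrated with a hypothetical fixed-capacity string whose characters live inside the object, so copying never allocates. This is a sketch under that assumption, not a reference to any particular library; `FixedString`, its members and `fixed_string_demo` are invented here. The converting constructor may still throw, which matches Roland's later clarification that only the default constructor, copy constructor and assignment need the no-throw guarantee.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <type_traits>

// Hypothetical fixed-capacity string: the characters are stored inside the
// object, so default construction, copying and assignment never allocate.
template <std::size_t N>
class FixedString {
public:
    FixedString() noexcept {}

    // Converting constructor: validates its input and may throw.
    explicit FixedString(const char* s) : size_(std::strlen(s)) {
        if (size_ > N) throw std::length_error("FixedString: input too long");
        std::memcpy(data_.data(), s, size_);  // data_ is zero-filled, so terminated
    }

    const char* c_str() const noexcept { return data_.data(); }
    std::size_t size() const noexcept { return size_; }

private:
    std::array<char, N + 1> data_{};  // zero-initialized; one extra byte for '\0'
    std::size_t size_ = 0;
};

static_assert(std::is_nothrow_copy_constructible<FixedString<16>>::value, "");
static_assert(std::is_nothrow_copy_assignable<FixedString<16>>::value, "");

// Usage check: copies are plain member-wise copies; oversized input is rejected.
inline void fixed_string_demo() {
    FixedString<8> a("hello");
    FixedString<8> b;
    b = a;
    assert(b.size() == 5 && std::strcmp(b.c_str(), "hello") == 0);

    bool rejected = false;
    try {
        FixedString<3> too_long("toolong");
        (void)too_long;
    } catch (const std::length_error&) {
        rejected = true;
    }
    assert(rejected);
}
```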

Greg Herlihy

Oct 12, 2005, 11:09:07 AM

Roland Pibinger wrote:
> Is there a name in use for user-defined value types that behave
> essentially like built-in value types? I mean value types that meet
> the following criteria:
>
> - default constructor, copy constructor and operator= do not throw an
> exception, i.e. have exception specification 'throw()';
> - assignment is the only way to change (mutate) the object;
>
> 'Pure' or 'strong' value types may be an appropriate name. But
> probably there exists one already?
>
> TIA
> Roland Pibinger

I have seen the term "strong typedef" used to describe a user-defined
type that is functionally equivalent to another, existing type. The
term "strong" serves to distinguish the type from an ordinary typedef
which does not define a new type but simply aliases an existing one.
Unfortunately, since typedefs are merely aliases for other types, they do not provide
the degree of type safety that a differentiated type would provide. In
fact there is a proposal to add support for strong typedefs to C++. I
think that the language would benefit from having such a feature.

In the meantime for anyone interested in the idea, I would suggest
looking at Boost's implementation of the strong typedef (in
strong_typedef.hpp) for one way to create a strong typedef using
existing C++ language facilities.
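The following is a hand-written sketch of the strong typedef idea in standard C++. It does not reproduce Boost's `BOOST_STRONG_TYPEDEF` macro, and the names `StrongTypedef`, `Meters`, `Seconds`, `speed` and `strong_typedef_demo` are hypothetical. Two tags over the same `double` produce two unrelated types that cannot be mixed up, which a plain typedef cannot achieve.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical strong typedef: each Tag yields a distinct type with the same
// representation T, unlike a plain typedef, which only introduces an alias.
template <typename T, typename Tag>
class StrongTypedef {
public:
    StrongTypedef() : value_() {}
    explicit StrongTypedef(const T& v) : value_(v) {}
    const T& get() const noexcept { return value_; }

private:
    T value_;
};

struct MetersTag {};
struct SecondsTag {};
using Meters = StrongTypedef<double, MetersTag>;
using Seconds = StrongTypedef<double, SecondsTag>;

static_assert(!std::is_same<Meters, Seconds>::value, "distinct types");
static_assert(!std::is_convertible<Meters, Seconds>::value, "no mixing");
static_assert(!std::is_convertible<double, Meters>::value, "explicit only");

inline double speed(Meters distance, Seconds time) {
    return distance.get() / time.get();
}

inline void strong_typedef_demo() {
    assert(speed(Meters(100.0), Seconds(20.0)) == 5.0);
    // speed(Seconds(20.0), Meters(100.0));  // would not compile: arguments swapped
}
```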

Greg

Roland Pibinger

Oct 12, 2005, 11:15:22 AM

On 10 Oct 2005 09:41:04 -0400, "kanze" <ka...@gabi-soft.fr> wrote:

>Roland Pibinger wrote:
>>> - default constructor, copy constructor and operator= do not
>>> throw an exception, i.e. have exception specification
>>> 'throw()';
>
>That's a difficult requirement. In practice, most of my value
>types meet it, but only because I replace the new_handler to
>abort, rather than throw, if allocation fails:-).

The above requirements should hold. Other constructors (non-default
and non-copy constructors) may throw because they need to validate
input and report failure with an exception.
The idea is to define basic 'bricks' with strong constraints. Larger
objects (value and entity types) made of these bricks automatically
'inherit' (in a non-OO-sense) their properties.
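A minimal sketch of the "bricks" idea, assuming C++11 or later. The hypothetical `Month` validates in a constructor that may throw, while its default constructor and copy operations cannot. The hypothetical `YearMonth` built from it gets no-throw copy operations without marking them itself, because the compiler deduces the exception specification of defaulted and implicit special members from those of the members.

```cpp
#include <cassert>
#include <stdexcept>
#include <type_traits>

// Hypothetical brick: a month number that is always in the range 1..12.
class Month {
public:
    Month() noexcept : value_(1) {}  // January
    // Validating constructor: allowed to throw.
    explicit Month(int m) : value_(m) {
        if (m < 1 || m > 12) throw std::out_of_range("Month: expected 1..12");
    }
    int value() const noexcept { return value_; }

private:
    int value_;
};

// A larger value made of bricks. The defaulted default constructor and the
// implicit copy operations are not marked noexcept; the compiler deduces it
// from the members.
class YearMonth {
public:
    YearMonth() = default;
    YearMonth(int y, Month m) noexcept : year_(y), month_(m) {}
    int year() const noexcept { return year_; }
    Month month() const noexcept { return month_; }

private:
    int year_ = 2000;
    Month month_;
};

static_assert(std::is_nothrow_default_constructible<YearMonth>::value, "");
static_assert(std::is_nothrow_copy_constructible<YearMonth>::value, "");
static_assert(std::is_nothrow_copy_assignable<YearMonth>::value, "");

inline void brick_demo() {
    YearMonth ym;                     // 2000-01
    ym = YearMonth(2005, Month(10));  // changed by whole-object assignment
    assert(ym.year() == 2005 && ym.month().value() == 10);

    bool rejected = false;
    try {
        Month bad(13);
        (void)bad;
    } catch (const std::out_of_range&) {
        rejected = true;
    }
    assert(rejected);
}
```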

Best regards,
Roland Pibinger

Mirek Fidler

Oct 12, 2005, 12:38:20 PM

Well, I believe that in some languages they are called "immutable", but
I might be wrong.

Also, as James correctly pointed out, built-in value types in C++ are
not strictly "immutable", because there are operators like '+=' or '++'
which can be considered mutating methods.

Mirek

Tony Delroy

Oct 12, 2005, 12:36:55 PM

Roland wrote:
> Is there a name in use for user-defined value types that behave
> essentially like built-in value types? I mean value types that meet
> the following criteria:
>
> - default constructor, copy constructor and operator= do not throw an
> exception, i.e. have exception specification 'throw()';
> - assignment is the only way to change (mutate) the object;

I think you're specifying a weird mix of criteria here - perhaps you
have a vague instinctive notion about some kind of "basic" type and
think others will access similar notions they have, despite your actual
description.

Terminology I've encountered that relates vaguely to what you mention
includes value-semantic types, vocabulary types, and POD types.

> - default constructor, copy constructor and operator= do not throw an
> exception, i.e. have exception specification 'throw()';

This suggests POD types. Robust operation in the presence of dynamic
memory usage requires either throwing or placing the object in an error
state to be tested later. The latter approach can be elegant, but it
is more subtle and users of such a class often get it wrong.

> - assignment is the only way to change (mutate) the object;

This is an unusual requirement. It suggests that the type is
notionally "atomic" (as you won't provide operations to set particular
fields), OR something bizarre like a type-map where the field to set is
implicit in the type being assigned from.

This criterion doesn't match the types discussed elsewhere in this
thread: for example, strings can be mutated by non-assignment member
functions.

Generally, I suggest you read about (user-defined) "value-semantic"
types. Informally, my impression of typical features:

- observed behaviour for copy-construction and assignment is as per a
new instance with the value of the right-hand-side/source/copied object

- operators work in an intuitive way, matching analogous behaviours for
whatever int/real/string types they resemble

- sensible automatic memory management if necessary (for things like
variable-sized character buffer for string)

- there's a sense that they encapsulate some small unit of data, rather
than provide high-level procedural services

- typically types that you might reasonably imagine a container of, as
distinct from singletons.

- there's no requirement for them not to throw though - indeed, robust
but concise behaviour of copy-constructors and operators is a core
motivation for exceptions

- examples include dates, where mutating member functions may include
things like set_to_first_day_of_month(), set_to_next(TUESDAY) etc.

Some containers take ownership of the values they store, and have
value-semantic behaviours for copy-construction and assignment. I
don't know whether they are formally considered value-semantic types,
but suspect not.

Regards,

Tony

Allan W

Oct 12, 2005, 6:45:17 PM

Roland Pibinger wrote:
>>> Is there a name in use for user-defined value types
>>> that behave essentially like built-in value types?
>>> I mean value types that meet the following criteria:
>>>
>>> - default constructor, copy constructor and operator=
>>> do not throw an exception, i.e. have exception
>>> specification 'throw()';
>>> - assignment is the only way to change (mutate) the object;

chris jefferson wrote:
>> Remember ++ and --

Roland Pibinger replied:

> Not necessarily, think of string types!

I don't get what you're saying.

Chris was surely pointing out that you can mutate int and long
with ++ and --, which is not normally considered assignment.
What you're saying about std::string doesn't seem to make sense
in this connection.

Then again, std::string doesn't seem to match your criteria
either, because the constructor can throw.

I'm sure that I'm missing some subtle meaning here, because
at first glance it looks to me like NONE of the standard types
(either built-in or library) obey your criteria.

>>> 'Pure' or 'strong' value types may be an appropriate name. But
>>> probably there exists one already?

I'd ask what you're trying to accomplish, but it looks like
you're looking for terminology, not techniques.

>>> I've always called them "builtin-like" types, but I'm sure
>>> there is a better name.

Presumably meant to parallel the name "builtin" types, which
refers to types that can be used without having to #include
anything at all...?

Roland Pibinger

Oct 13, 2005, 9:16:39 AM

On 12 Oct 2005 12:38:20 -0400, Mirek Fidler <c...@volny.cz> wrote:
>
>Well, I believe that in some languages they are called "immutable", but
>I might be wrong.
>
>Also, as James correctly pointed out, built-in value types in C++ are
>not strictly "immutable", because there are operators like '+=' or '++'
>which can be considered mutating methods.

I've also had 'immutable objects' in mind. But, as you point out, an
object with operator= cannot be called 'immutable'. In C#, Java, ...
in order to be usable immutable objects need mutable references. So,
conceptually, there is no big difference between 'immutable' value
objects in C#, Java, ... and [to be filled] value objects in C++.

Best wishes,
Roland Pibinger

kanze

Oct 13, 2005, 9:34:24 AM

Roland Pibinger wrote:
> On 10 Oct 2005 09:41:04 -0400, "kanze" <ka...@gabi-soft.fr> wrote:
> >Roland Pibinger wrote:
> >>> - default constructor, copy constructor and operator= do
> >>> not throw an exception, i.e. have exception specification
> >>> 'throw()';

> >That's a difficult requirement. In practice, most of my
> >value types meet it, but only because I replace the
> >new_handler to abort, rather than throw, if allocation
> >fails:-).

> The above requirements should hold. Other constructors
> (non-default and non-copy constructors) may throw because they
> need to validate input and report failure with an exception.
> The idea is to define basic 'bricks' with strong
> constraints. Larger objects (value and entity types) made of
> these bricks automatically 'inherit' (in a non-OO-sense) their
> properties.

I understand the interest of such types. I'm just pointing out
that if the class needs resources, and acquiring those resources
may fail, you may have problems with the copy constructor, at
least if deep copy is implemented. The most ubiquitous resource
is dynamic memory, and I can imagine a lot of value-oriented
classes which use dynamic memory.

Of course, if the only modification of the object is through
assignment, then reference counting can be used for copy. (In
this case, I think we could extend "assignment" to include any
operation whose semantics are defined in terms of total
assignment. The thing which breaks reference counting is
partial modification.)
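Kanze's remark about reference counting can be sketched with a `std::shared_ptr` to a `const std::string`: since the text is never modified in place, copies can share it, and `std::shared_ptr`'s default constructor, copy constructor and copy assignment are all `noexcept`. `SharedString`, `shares_buffer_with` and `shared_string_demo` are hypothetical names; only constructing a new value allocates and may throw.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <type_traits>

// Hypothetical reference-counted string. The text is never modified in place,
// so copies can share one buffer and copying only adjusts a reference count.
class SharedString {
public:
    // Empty value: holds no buffer, so nothing is allocated.
    SharedString() noexcept = default;

    // Creating a new value allocates and may therefore throw.
    explicit SharedString(const std::string& s)
        : rep_(std::make_shared<std::string>(s)) {}

    const std::string& str() const noexcept {
        static const std::string empty;
        return rep_ ? *rep_ : empty;
    }

    // Only for demonstrating that copies share storage.
    bool shares_buffer_with(const SharedString& other) const noexcept {
        return rep_ == other.rep_;
    }

private:
    std::shared_ptr<const std::string> rep_;
};

// The implicit copy operations just copy rep_, which cannot throw.
static_assert(std::is_nothrow_default_constructible<SharedString>::value, "");
static_assert(std::is_nothrow_copy_constructible<SharedString>::value, "");
static_assert(std::is_nothrow_copy_assignable<SharedString>::value, "");

inline void shared_string_demo() {
    SharedString a(std::string("hello"));
    SharedString b = a;                      // no deep copy
    assert(b.shares_buffer_with(a));
    b = SharedString(std::string("world"));  // replace the whole value
    assert(a.str() == "hello" && b.str() == "world");
    assert(SharedString().str().empty());
}
```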

kanze

Oct 13, 2005, 9:33:02 AM

Mirek Fidler wrote:
> Roland Pibinger wrote:
> > Is there a name in use for user-defined value types that
> > behave essentially like built-in value types? I mean value
> > types that meet the following criteria:

> > - default constructor, copy constructor and operator= do not
> > throw an exception, i.e. have exception specification
> > 'throw()';
> > - assignment is the only way to change (mutate) the object;

> > 'Pure' or 'strong' value types may be an appropriate
> > name. But probably there exists one already?

> Well, I believe that in some languages they are called
> "immutable", but I might be wrong.

> Also, as James correctly pointed out, built-in value types in
> C++ are not strictly "immutable", because there are operators
> like '+=' or '++' which can be considered mutating methods.

Yes and no. The formal definition of a += b is that it is the
same as a = a + b, except that (the lvalue of) a is only
evaluated once. And the formal definition of ++ a is a += 1.
So semantically, these are all the equivalent of an assignment.
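As a sketch of this point for a user-defined type, the hypothetical `Money` class below defines `+=` and prefix `++` purely in terms of `+` and whole-object assignment, so every mutation is, semantically, an assignment. The class, its cent-based representation and `money_demo` are assumptions made for illustration.

```cpp
#include <cassert>

// Hypothetical value type whose compound assignment mirrors the built-in
// definition: a += b means a = a + b, and ++a means a += 1.
class Money {
public:
    Money() noexcept : cents_(0) {}
    explicit Money(long long cents) noexcept : cents_(cents) {}
    long long cents() const noexcept { return cents_; }

    // Compute a new value, then assign the whole object.
    Money& operator+=(const Money& rhs) noexcept {
        *this = *this + rhs;
        return *this;
    }

    // Prefix increment by one cent, defined via +=.
    Money& operator++() noexcept { return *this += Money(1); }

    friend Money operator+(const Money& a, const Money& b) noexcept {
        return Money(a.cents_ + b.cents_);
    }

private:
    long long cents_;
};

inline void money_demo() {
    Money m(150);
    m += Money(50);
    ++m;
    assert(m.cents() == 201);
}
```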

Of course, there is also the problem that they may throw.
Overflow is undefined behavior, and one possible actual behavior
would be to throw an exception. But with regards to the formal
definition, the exception is raised during the calculation of
the value to be assigned.

kanze

Oct 13, 2005, 9:31:48 AM

Tony Delroy wrote:
> Roland wrote:
> > Is there a name in use for user-defined value types that
> > behave essentially like built-in value types? I mean value
> > types that meet the following criteria:

> I think you're specifying a weird mix of criteria here -

Not really. I can see a lot of advantage of a type giving these
guarantees. With the possible extension that there might be
additional mutating operators (like +=), provided that these
operators are all defined in terms of assignment.

> perhaps you have a vague instinctive notion about some kind of
> "basic" type and think others will access similar notions they
> have, despite your actual description.

> Terminology I've encountered that relates vaguely to what you
> mention includes value-semantic types, vocabulary types, and
> POD types.

They're definitely not just POD types. Think of things like
complex.

> > - default constructor, copy constructor and operator= do not
> > throw an exception, i.e. have exception specification
> > 'throw()';

> This suggests POD types.

Not at all.

> Robust operation in the presence of dynamic memory usage
> requires either throwing or placing the object in an error
> state to be tested later.

It depends on the application and the object. In most of my
applications, I replace the new_handler to abort, so I can never
see a bad_alloc -- it makes life a lot simpler, and in practice,
the only way I'm going to run out of memory is if I leak. (It's
worth pointing out, of course, that I mostly develop server
software which runs on dedicated servers. There aren't other
applications on the system which might cause me to run out of
memory.)

Also, it's possible to design a class like string so that the
default constructor and the assignment operator don't allocate,
although other constructors might.

> The latter approach can be elegant, but it is more subtle and
> users of such a class often get it wrong.

I don't think he was recommending the latter approach. The
latter approach is valid for object types which may become
invalid after having been validly constructed -- the user has to
systematically check the state anyway, so having to do so after
construction is no additional burden. But that's about it.
(There are a few other exceptions, and I've seen this technique
used in transactional systems.)

> > - assignment is the only way to change (mutate) the object;

> This is an unusual requirement. It suggests that the type is
> notionally "atomic" (as you won't provide operations to set
> particular fields), OR something bizarre like a type-map where
> the field to set is implicit in the type being assigned from.

He didn't say anything about assignment being atomic.

The requirement does mean that the user cannot modify individual
parts of the object. This is generally the case for non
collection value objects, however, or should be.

> This criterion doesn't match the types discussed elsewhere in
> this thread: for example, strings can be mutated by
> non-assignment member functions.

std::string can be. I've never heard anyone accusing
std::string of being particularly clean design, however.
Cleanly designed string classes (like my pre-standard String, or
java.lang.String) can't be modified other than by assignment,
however.

This is, of course, supposing that the abstraction behind string
is a text string -- if the abstraction is just a collection of
characters, other rules hold.

> Generally, I suggest you read about (user-defined)
> "value-semantic" types. Informally, my impression of typical
> features:

> - observed behaviour for copy-construction and assignment is
> as per a new instance with the value of the
> right-hand-side/source/copied object

> - operators work in an intuitive way, matching analogous
> behaviours for whatever int/real/string types they resemble

> - sensible automatic memory management if necessary (for
> things like variable-sized character buffer for string)

> - there's a sense that they encapsulate some small unit of
> data, rather than provide high-level procedural services

> - typically types that you might reasonably imagine a
> container of, as distinct from singletons.

> - there's no requirement for them not to throw though -
> indeed, robust but concise behaviour of copy-constructors and
> operators is a core motivation for exceptions

Not necessarily. There is a very great advantage in being able
to construct types without risk of throwing.

> - examples include dates, where mutating member functions may
> include things like set_to_first_day_of_month(),
> set_to_next(TUESDAY) etc.

I would be very sceptical of a design in which the date class
had such mutating functions. Far better would be functions
(probably non-member) which would create a new date. In fact,
IMHO, date would be a very good example of the type of class he
is looking for.
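A minimal sketch of the style Kanze recommends: a hypothetical `Date` with no setters and a non-member `first_day_of_month` that returns a new value rather than modifying its argument. Validation is deliberately simplified (it ignores month lengths and leap years); all names here are illustrative.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical minimal Date: fields are fixed at construction; no setters.
class Date {
public:
    Date() noexcept : y_(1970), m_(1), d_(1) {}
    Date(int y, int m, int d) : y_(y), m_(m), d_(d) {
        // Simplified validation: ignores month lengths and leap years.
        if (m < 1 || m > 12 || d < 1 || d > 31) throw std::out_of_range("Date");
    }
    int year() const noexcept { return y_; }
    int month() const noexcept { return m_; }
    int day() const noexcept { return d_; }

private:
    int y_, m_, d_;
};

// Non-member function: returns a new Date instead of mutating its argument.
inline Date first_day_of_month(const Date& d) {
    return Date(d.year(), d.month(), 1);
}

inline void date_demo() {
    Date today(2005, 10, 13);
    Date first = first_day_of_month(today);
    assert(first.day() == 1 && today.day() == 13);  // original untouched
    today = first_day_of_month(today);              // change by assignment
    assert(today.day() == 1 && today.month() == 10);
}
```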

> Some containers take ownership of the values they store, and
> have value-semantic behaviours for copy-construction and
> assignment. I don't know whether they are formally considered
> value-semantic types, but suspect not.

Containers are a special case. Their only semantic is to
contain other objects, so of course, these other objects are
exposed to the user.

Roland Pibinger

Oct 14, 2005, 6:40:15 AM

On 12 Oct 2005 18:45:17 -0400, "Allan W" <all...@my-dejanews.com>
wrote:

>chris jefferson wrote:
>>> Remember ++ and --
>
>Roland Pibinger replied:
>> Not necessarily, think of string types!
>
>I don't get what you're saying.
>Chris was surely pointing out that you can mutate int and long
>with ++ and --, which is not normally considered assignment.

i++ is (more or less) a shortcut for i = i + 1; operator= returns
(constructs) a new object. So ++, --, +=, ... can be subsumed under
'assignment'.

>What you're saying about std::string doesn't seem to make sense
>in this connection.

not for std::string but arguably for other string classes

>Then again, std::string doesn't seem to match your criteria
>either, because the constructor can throw.

constructors other than the default and the copy constructor may throw

>I'm sure that I'm missing some subtle meaning here, because
>at first glance it looks to me like NONE of the standard types
>(either built-in or library) obey your criteria.

if you take ++ and -- as assignment then it makes sense

>>>> 'Pure' or 'strong' value types may be an appropriate name. But
>>>> probably there exists one already?
>
>I'd ask what you're trying to accomplish, but it looks like
>you're looking for terminology, not techniques.

Transfer the idea of 'immutable objects' from other languages to C++
but without requiring references (pointers) to the (immutable)
objects. Then you get 'assignable but otherwise immutable objects'
which are similar to built-in types (differences remain, e.g.
immutable objects are 'automatically' thread safe in Java).

Best wishes,
Roland Pibinger

Roland Pibinger

Oct 14, 2005, 7:19:40 AM

On 13 Oct 2005 09:34:24 -0400, "kanze" <ka...@gabi-soft.fr> wrote:

>I understand the interest of such types. I'm just pointing out
>that if the class needs resources, and acquiring those resources
>may fail, you may have problems with the copy constructor, at
>least if deep copy is implemented. The most ubiquitous resource
>is dynamic memory, and I can imagine a lot of value-oriented
>classes which use dynamic memory.

But ...

>Of course, if the only modification of the object is through
>assignment, then reference counting can be used for copy.

Yes, exactly!

>(In this case, I think we could extend "assignment" to include
>any operation whose semantics are defined in terms of total
>assignment. The thing which breaks reference counting is
>partial modification.)

IMO, it makes sense to define many (probably not all) value types that
way, e.g. Color, Date, Point, ..., String, Handle. In order to
communicate the idiom it needs a catchy name (which cannot be
'immutable' object, unfortunately; immutable objects are one of Java's
assets).

Best wishes,
Roland Pibinger

kanze

Oct 14, 2005, 1:28:13 PM

Roland Pibinger wrote:
> On 12 Oct 2005 12:38:20 -0400, Mirek Fidler <c...@volny.cz> wrote:

> >Well, I believe than in some languages they are called
> >"immutable", but I might be wrong.

> >Also, as James correctly pointed out, built-in value types in
> >C++ are not strictly "immutable", because there are operators
> >like '+=' or '++' which can be considered mutating methods.

> I've also had 'immutable objects' in mind. But, as you point
> out, an object with operator= cannot be called 'immutable'. In
> C#, Java, ... in order to be usable immutable objects need
> mutable references. So, conceptually, there is no big
> difference between 'immutable' value objects in C#, Java, ...
> and [to be filled] value objects in C++.

I, too, tend to think of them as "immutable" objects, even
though they support assignment. And I very much agree with your
feeling that we need a name for them.

With regards to assignment, C++ and Java are different
languages, with different idioms. In C++, we not only have
assignment (operator =), but all of the <op>= assignments. In
the end, I don't see even these as violating the rules, however,
as long as the semantics are defined in terms of assignment. My
pre-standard String class, for example, supported +=, with the
semantics of lhs += rhs defined as lhs = lhs.append( rhs ).
Java allows lhs = lhs.concat( rhs ) (supposing lhs and rhs are
String). The use of += is, of course, really just a notational
convenience -- a notational convenience which is very much a
typical C++ idiom, I think.

Daniel Krügler

Oct 14, 2005, 1:39:04 PM

Roland Pibinger wrote:
> Transfer the idea of 'immutable objects' from other languages to C++
> but without requiring references (pointers) to the (immutable)
> objects. Then you get 'assignable but otherwise immutable objects'
> which are similar to built-in types (differences remain, e.g.
> immutable objects are 'automatically' thread safe in Java).

I think, one example of a non-trivial component of the standard C++
library which fulfills these requirements is std::locale. It has exactly
the semantic you mention. What about the name "snapshot type"? ;-)
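Daniel's example can be checked directly. Since C++11, `std::locale`'s default constructor, copy constructor and copy assignment are declared `noexcept`, and apart from assignment it offers no member function that modifies a locale in place. The sketch below (the function name `locale_demo` is hypothetical) copies a locale and "changes" one by building a new locale with the standard `combine` member.

```cpp
#include <cassert>
#include <locale>
#include <type_traits>

// The three operations from the original criteria are noexcept for std::locale.
static_assert(std::is_nothrow_default_constructible<std::locale>::value, "");
static_assert(std::is_nothrow_copy_constructible<std::locale>::value, "");
static_assert(std::is_nothrow_copy_assignable<std::locale>::value, "");

inline void locale_demo() {
    std::locale base = std::locale::classic();
    std::locale copy = base;  // shares the same facet objects
    assert(copy == base);

    // "Changing" a locale means building a new one and assigning it.
    std::locale changed = base.combine<std::numpunct<char>>(std::locale::classic());
    assert(std::has_facet<std::numpunct<char>>(changed));
    copy = changed;
    assert(copy == changed);
}
```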

Greetings from Bremen,

Daniel

Roland Pibinger

Oct 15, 2005, 9:15:01 AM

On 14 Oct 2005 13:39:04 -0400, Daniel Krügler
<d...@bdal.de> wrote:

>I think, one example of a non-trivial component of the standard C++
>library which fulfills these requirements is std::locale. It has exactly
>the semantic you mention. What about the name "snapshot type"? ;-)

Hmm, I see what you mean by 'snapshot type' (actually, I've had to
meditate a little on 'snapshot' ...). Maybe 'discrete' value type?
OTOH, 'pure' value type also isn't bad from a marketing point of view
(marketing is important; e.g. the concept behind 'RAII' severely
suffered from the bad name).

Best wishes,
Roland Pibinger

Roland Pibinger

Oct 15, 2005, 9:13:59 AM

On 14 Oct 2005 13:28:13 -0400, "kanze" <ka...@gabi-soft.fr> wrote:

>With regards to assignment, C++ and Java are different
>languages, with different idioms. In C++, we not only have
>assignment (operator =), but all of the <op>= assignments. In
>the end, I don't see even these as violating the rules, however,
>as long as the semantics are defined in terms of assignment. My
>pre-standard String class, for example, supported +=, with the
>semantics of lhs += rhs defined as lhs = lhs.append( rhs ).
>Java allows lhs = lhs.concat( rhs ) (supposing lhs and rhs are
>String). The use of += is, of course, really just a notational
>convenience -- a notational convenience which is very much a
>typical C++ idiom, I think.

I'd prefer not to have operator += (<op>=) because it hides the
(perhaps expensive) creation of a new object. In Java, String.concat()
creates a new String object (String is immutable) which must be
assigned to a reference, whereas += looks like a 'normal' mutable
method. Maybe, 'original = original + n' is a little less convenient
than 'original += n' but more expressive (i.e. indicative what's
really going on). I currently develop a ref-counted string class (a
'pure'/'strong' value type) without even operator+ which concats only
with constructors, e.g.
mystring s;
// ...
s = mystring ("Hello", " world", "!");
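Roland's constructor-only concatenation could look roughly like the following sketch. It is not his class: the storage (`std::shared_ptr<const std::string>`), the variadic constructor and the `str()` accessor are assumptions made for illustration. The `int expand[]` line is the C++11 idiom for evaluating an expression once per argument in a parameter pack, in order.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical ref-counted string without operator+ or +=: concatenation is
// only possible by constructing a new object, which stays visible in the code.
class mystring {
public:
    mystring() noexcept = default;

    // Concatenates all arguments into one newly allocated, shared buffer.
    template <typename... Parts>
    explicit mystring(const Parts&... parts) {
        std::string text;
        // C++11 idiom: evaluates (text += part) once per argument, left to right.
        int expand[] = {0, (text += parts, 0)...};
        (void)expand;
        rep_ = std::make_shared<std::string>(std::move(text));
    }

    const std::string& str() const noexcept {
        static const std::string empty;
        return rep_ ? *rep_ : empty;
    }

private:
    std::shared_ptr<const std::string> rep_;
};

inline void mystring_demo() {
    mystring s;
    // ...
    s = mystring("Hello", " world", "!");
    assert(s.str() == "Hello world!");
}
```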

Best wishes,
Roland Pibinger

Tony Delroy

Oct 15, 2005, 9:23:48 AM

Hi James,

> > I think you're specifying a weird mix of criteria here -
> Not really. I can see a lot of advantage of a type giving these
> guarantees. With the possible extension that there might be
> additional mutating operators (like +=), provided that these
> operators are all defined in terms of assignment.

Care to share some of these advantages?

> They're definitely not just POD types. Think of things like
> complex.

Agreed, but your comment ignores my qualifier "relates vaguely to what
you mention", and I explain later how POD types often fit better with
no-exception guarantees.

> > Robust operation in the presence of dynamic memory usage
> > requires either throwing or placing the object in an error
> > state to be tested later.

> It depends on the application and the object. In most of my
> applications, I replace the new_handler to abort, so I can never
> see a bad_alloc -- it makes life a lot simpler, and in practice,
> the only way I'm going to run out of memory is if I leak. (It's
> worth pointing out, of course, that I mostly develop server
> software which runs on dedicated servers. There aren't other
> applications on the system which might cause me to run out of
> memory.)

"It depends" suggests your example is meant to show a case where robust
operation doesn't require exceptions or intra-class error state, but
you simply mention a case of non-robust operation. Robustness involves
continuing despite errors.

> > The <error state> approach can be elegant, but it is
> > more subtle and users of such a class often get it wrong.
> I don't think he was recommending the latter approach. The
> latter approach is valid for object types which may become
> invalid after having been validly constructed -- the user has to
> systematically check the state anyway, so having to do so after
> construction is no additional burden. But that's about it.
> (There are a few other exceptions, and I've seen this technique
> used in transactional systems.)

I never said Roland recommended this technique. I'm just enumerating
the two defensive-programming approaches to errors in constructors and
operators. The "latter" approach - error state - does not require
valid construction. An example of this philosophy can be seen in the
not-a-number (NaN) state of real numbers, where further operations
silently propagate the state, and a test can be done at the end of a
calculation.
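The error-state approach Tony describes can be seen with floating-point NaN. The sketch assumes IEEE 754-style doubles; its `static_assert` only checks that `double` has a quiet NaN. The hypothetical `risky_calculation` never throws: an invalid input yields NaN, arithmetic propagates it, and the caller tests once with `std::isnan`.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Assumes double provides a quiet NaN (true for IEEE 754 doubles).
static_assert(std::numeric_limits<double>::has_quiet_NaN, "needs NaN");

// Hypothetical calculation in the error-state style: nothing throws; an invalid
// input produces NaN, which later arithmetic carries through to the result.
inline double risky_calculation(double x) {
    double r = std::sqrt(x);  // NaN if x < 0
    return r * 2.0 + 1.0;     // NaN in, NaN out
}

// The caller tests once, at the end of the calculation.
inline void nan_demo() {
    assert(risky_calculation(4.0) == 5.0);
    assert(std::isnan(risky_calculation(-1.0)));
}
```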

> The requirement does mean that the user cannot modify individual
> parts of the object. This is generally the case for non
> collection value objects, however, or should be.

This is what I meant by being notionally "atomic". (I wasn't talking
about atomicity in memory, as should have been clear from the context
and wording).

> > this thread: for example, strings can be mutated by
> > non-assignment member functions.

> std::string can be. I've never heard anyone accusing
> std::string of being particularly clean design, however.
> Cleanly designed string classes (like my pre-standard String, or
> java.lang.String) can't be modified other than by assignment,
> however.

Yeah, well it's almost a sanctioned sport for C++ programmers to whinge
about the STL. Streams especially. I'm rather fond of std::string.
Anyway, there are clear efficiency reasons for having mutating strings
(extremely fast truncation, erasing and replacing without needing a
second buffer etc).

> This is, of course, supposing that the abstraction behind string
> is a text string -- if the abstraction is just a collection of
> characters, other rules hold.

You've lost me... can you elaborate please?

> > robust but concise behaviour of copy-constructors and
> > operators is a core motivation for exceptions

> Not necessarily. There is a very great advantage in being able
> to construct types without risk of throwing.

As with many aspects of the language, alternatives exist when designing
constructors, operators, user-defined-types, and the necessary error
handling. Clearly there are great advantages to non-throwing
constructors, but there are often unacceptable compromises, such as
aborting, adopting DbC-style contracts, ignoring errors, or risking
that the caller won't check for error state at appropriate times.

Please remember that I didn't argue that exceptions should be used
where possible, I simply - and carefully - said "robust but concise
behaviour of copy-constructors and operators is a core motivation for
exceptions".

> I would be very sceptical of a design in which the date class
> had [set_to_first_day_of_month() etc] mutating functions.
> Far better would be functions
> (probably non-member) which would create a new date.

Why? I think it's natural to provide both. There can be performance
implications. For example: a date for which the storage policy stored
broken-down day/month/year values can implement a
set_to_first_day_of_month() with a simple day_ = 1 (if it doesn't reach
back to some of the quirkier parts of the Gregorian calendar). Why
copy the entire object if it's just going to be assigned back to
itself? This is particularly relevant if the implementations are
out-of-line. For core types like dates and strings intended for
across-the-board reuse it's inappropriate to make
performance-compromising design decisions.
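This sketch shows the two styles under discussion. It assumes a hypothetical `Date` that stores broken-down year/month/day fields, with no validation and none of the calendar quirks. The member function is a single store. The non-member function builds a new value, which the caller then assigns back.

```cpp
// Hypothetical date type storing broken-down fields (no validation shown).
class Date {
public:
    Date(int year, int month, int day) noexcept
        : year_(year), month_(month), day_(day) {}

    int year() const noexcept { return year_; }
    int month() const noexcept { return month_; }
    int day() const noexcept { return day_; }

    // Mutating form: with broken-down storage this is a single store.
    void set_to_first_day_of_month() noexcept { day_ = 1; }

private:
    int year_, month_, day_;
};

// Non-mutating form: builds a new value; the caller assigns it if wanted.
inline Date first_day_of_month(const Date& d) noexcept {
    return Date(d.year(), d.month(), 1);
}
```

When both functions are inline, an optimizing compiler can often reduce `d = first_day_of_month(d);` to the same store. The difference argued about here matters mainly when the implementations are out-of-line.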

> In fact, IMHO, date would be a very good example
> of the type of class he is looking for.

So you think my post mentioned something relevant and correct? Maybe
there's hope for me yet....

> Containers are a special case. Their only semantic is to
> contain other objects, so of course, these other objects are
> exposed to the user.

Their only semantic? Container implementations often provide operator=
with value-semantic convenience and style. Operators +, -, +=, -= are
sometimes seen with reasonable implementations appending sequences or
combining sets. std::string is again an obvious example, though a bit
cheeky as it's clearly both a container and a value-semantic type.

Cheers,

Tony

James Kanze

unread,

Oct 16, 2005, 5:50:51 AM

Roland Pibinger wrote:
> On 13 Oct 2005 09:34:24 -0400, "kanze" <ka...@gabi-soft.fr>
> wrote:

>>I understand the interest of such types. I'm just pointing
>>out that if the class needs resources, and acquiring those
>>resources may fail, you may have problems with the copy
>>constructor, at least if deep copy is implemented. The most
>>ubiquitous resource is dynamic memory, and I can imagine a lot
>>of value-oriented classes which use dynamic memory.

> But ...

>>Of course, if the only modification of the object is through
>>assignment, then reference counting can be used for copy.

> Yes, exactly!

Still, you don't necessarily want to impose reference counting,
at least not in a multithreaded environment.

Another alternative would be to use garbage collection, a
non-counted shared implementation, and *always* copy on write.
(The classic copy on write checks whether the implementation is
shared, and only copies if it is.)
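Here is a minimal sketch of the reference-counted case. It assumes a hypothetical `SharedText` handle built on `std::shared_ptr`, which is C++11 and so not something either poster had in 2005. The text is never modified after construction, so copy and assignment only adjust a count and can be declared `noexcept`. Construction from a `std::string` still allocates and may throw. `std::shared_ptr` updates its count atomically, which is one of the multithreading costs mentioned above.

```cpp
#include <memory>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical immutable text handle: the characters live in a shared
// representation that is never modified after construction, so copying
// only adjusts a reference count.
class SharedText {
public:
    SharedText() noexcept = default;                 // empty: no allocation
    explicit SharedText(std::string s)               // allocates: may throw
        : rep_(std::make_shared<std::string>(std::move(s))) {}

    // Copy and assignment cannot throw: they only share the representation.
    SharedText(const SharedText&) noexcept = default;
    SharedText& operator=(const SharedText&) noexcept = default;

    const std::string& str() const noexcept {
        static const std::string empty;
        return rep_ ? *rep_ : empty;
    }
    // No non-const member functions: assignment is the only way to change it.

private:
    std::shared_ptr<const std::string> rep_;
};

static_assert(std::is_nothrow_copy_constructible<SharedText>::value,
              "copy must not throw");
static_assert(std::is_nothrow_copy_assignable<SharedText>::value,
              "assignment must not throw");
```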

>>(In this case, I think we could extend "assignment" to include
>>any operation whose semantics are defined in terms of total
>>assignment. The thing which breaks reference counting is
>>partial modification.)

> IMO, it makes sense to define many (probably not all) value
> types that way, e.g. Color, Date, Point, ..., String,
> Handle.

I totally agree.

Obviously, collections are not "value" types, at least not of
this sort. And I think some people think of strings as a
collection of characters. (I would disagree, but that's another
question.)

> In order to communicate the idiom it needs a catchy name
> (which cannot be 'immutable' object, unfortunately; immutable
> objects are one of Java's assets).

It may not have a name, but it's as easy to do in C++ as in
Java. (And I would have said that one of Java's weaknesses is
the lack of any formal value objects. The results of modifying
a java.awt.Dimension returned from a Swing component can be
rather strange.)

--
James Kanze mailto: james...@free.fr

Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung

9 pl. Pierre Sémard, 78210 St.-Cyr-l'École, France +33 (0)1 30 23 00 34

kanze

unread,

Oct 17, 2005, 9:55:50 AM

Tony Delroy wrote:
> Hi James,

> > > I think you're specifying a weird mix of criteria here -
> > Not really. I can see a lot of advantage of a type giving
> > these guarantees. With the possible extension that there
> > might be additional mutating operators (like +=), provided
> > that these operators are all defined in terms of assignment.

> Care to share some of these advantages?

Well, the advantages of being able to construct or copy an
object without risking exceptions seem obvious. As has been
often pointed out here, true exception safety is only possible
if you have at least a minimal set of operations which cannot throw.
(Having a copy which cannot throw is particularly useful for
backing out of transactions.)
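This sketch shows a transaction being backed out. It assumes a hypothetical `Balance` value type and `apply_transaction` helper. The saved copy and the restoring assignment are the operations that must not throw, and the `static_assert` checks that at compile time.

```cpp
#include <type_traits>

// Hypothetical value type: a plain aggregate, so copy and assignment cannot throw.
struct Balance {
    long cents;
};

// Runs `steps` on `b`; if a step throws, the saved copy is restored and the
// exception is rethrown. The back-out is safe only because copying and
// assigning Balance cannot throw, which the static_assert checks.
template <class Steps>
void apply_transaction(Balance& b, Steps steps) {
    static_assert(std::is_nothrow_copy_constructible<Balance>::value &&
                  std::is_nothrow_copy_assignable<Balance>::value,
                  "back-out requires non-throwing copy");
    const Balance saved = b;   // cannot throw
    try {
        steps(b);
    } catch (...) {
        b = saved;             // cannot throw: restore the original value
        throw;
    }
}
```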

Requiring an object to be non-mutable except for assignment has
the distinct advantage (for the implementor) of allowing
reference counting. For the user, the main advantage is simply
readability.

> > They're definitely not just POD types. Think of things like
> > complex.

> Agreed, but your comment ignores my qualifier "relates vaguely
> to what you mention", and I explain later how POD types often
> fit better with no-exception guarantees.

But I don't see any relationship.

> > > Robust operation in the presence of dynamic memory usage
> > > requires either throwing or placing the object in an error
> > > state to be tested later.

> > It depends on the application and the object. In most of my
> > applications, I replace the new_handler to abort, so I can
> > never see a bad_alloc -- it makes life a lot simpler, and in
> > practice, the only way I'm going to run out of memory is if
> > I leak. (It's worth pointing out, of course, that I mostly
> > develop server software which runs on dedicated servers.
> > There aren't other applications on the system which might
> > cause me to run out of memory.)

> "It depends" suggests your example is meant to show a case
> where robust operation doesn't require exceptions or
> intra-class error state, but you simply mention a case of
> non-robust operation. Robustness involves continuing despite
> errors.

Robustness involves continuing despite recoverable errors. If
the error is indicative of a programming error, or an
abstraction failure (int overflow, etc.), continuing will
generally lead to further errors. Perhaps undetected, or
perhaps more serious.

Robustness basically means meeting the program specifications,
regardless of external events. If the program specifications
say that the program aborts on an out of memory condition, then
that is the only robust behavior for out of memory. If they say
something else, then of course, you have to do that something
else. If they don't say anything... I'd argue that a program
which is not fully specified is by definition not robust; if the
customer doesn't care, then it's up to the programmer to specify
something. But unless the customer has specified exactly what
the program should do, I don't see how a programmer can specify
much other than aborting. (I know, there are exceptions:-).
I've programmed some of them myself.)
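The "replace the new_handler to abort" policy mentioned earlier can be implemented with the standard `std::set_new_handler`. The handler name below is hypothetical. Once a handler is installed, a failing `operator new` calls it instead of throwing `std::bad_alloc`, and this handler never returns.

```cpp
#include <cstdio>
#include <cstdlib>
#include <new>

// Hypothetical handler: treat memory exhaustion as fatal. Once installed,
// a failing operator new calls this instead of throwing std::bad_alloc.
[[noreturn]] void abort_on_out_of_memory() {
    std::fputs("out of memory: aborting\n", stderr);
    std::abort();
}

// Call once, early in main().
void install_abort_on_out_of_memory() {
    std::set_new_handler(abort_on_out_of_memory);
}
```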

Note too that "out of memory" means stack overflow as well; if
the specifications say to continue on out of memory, then you'd
better do something to catch stack overflow as well. (There is,
of course, no portable solution for this. But in the past, I've
written code for Sun OS which handles it; I suspect that my
solution will work with most Unices.)

[...]

> > The requirement does mean that the user cannot modify
> > individual parts of the object. This is generally the case
> > for non collection value objects, however, or should be.

> This is what I meant by being notionally "atomic". (I
> wasn't talking about atomicity in memory, as should have been
> clear from the context and wording).

I'm still not sure what you really mean. "Atomic" has a very
definite meaning when applied to modifying an object; I've
worked on machines where even assigning to an int wasn't atomic.

> > > this thread: for example, strings can be mutated by
> > > non-assignment member functions.

> > std::string can be. I've never heard anyone accusing
> > std::string of being particularly clean design, however.
> > Cleanly designed string classes (like my pre-standard
> > String, or java.lang.String) can't be modified other than by
> > assignment, however.

> Yeah, well it's almost a sanctioned sport for C++ programmers
> to whinge about the STL.

std::string is a case on its own. For the STL or iostream,
you'll generally find programmers at both extremes: they love
it, or they hate it. (There are probably some in the middle,
but they seem less common.) With regards to std::string, on the
other hand, I've never heard anyone say anything nice about it.

> Streams especially. I'm rather fond of std::string. Anyway,
> there are clear efficiency reasons for having mutating strings
> (extremely fast truncation, erasing and replacing without
> needing a second buffer etc).

I'm not convinced. Mutability causes no end of problems for an
efficient implementation of copy. Truncation, or extraction of
substrings in general, can also be made just as fast, if not
faster, if mutability isn't supported. And how often are things
like modifications in the middle of a string more important than
copy and extraction of a substring? (In the end, there are only
two operations you can do with a string: extract a substring,
and replace a (possibly empty) substring with another (possibly
empty) string. The first can be implemented faster if the
string doesn't support mutation.)
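This sketch shows why extraction gets cheaper without mutation. It assumes a hypothetical immutable `Text` class. A substring is represented as an offset and length into the same shared buffer, so `substr` copies no characters. Out-of-range arguments are clamped here. This differs from `std::string::substr`, which throws `std::out_of_range` when the position exceeds the size.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical immutable text type: a substring is (shared buffer, offset,
// length), so extracting one copies no characters.
class Text {
public:
    explicit Text(const std::string& s)
        : buf_(std::make_shared<std::string>(s)), off_(0), len_(s.size()) {}

    std::size_t size() const noexcept { return len_; }
    char operator[](std::size_t i) const noexcept { return (*buf_)[off_ + i]; }

    // O(1): shares the buffer. Arguments are clamped to the available text.
    Text substr(std::size_t pos, std::size_t n) const noexcept {
        if (pos > len_) pos = len_;
        if (n > len_ - pos) n = len_ - pos;
        return Text(buf_, off_ + pos, n);
    }

    std::string str() const { return buf_->substr(off_, len_); }
    bool shares_buffer_with(const Text& other) const noexcept {
        return buf_ == other.buf_;
    }

private:
    Text(std::shared_ptr<const std::string> buf, std::size_t off,
         std::size_t len) noexcept
        : buf_(std::move(buf)), off_(off), len_(len) {}

    std::shared_ptr<const std::string> buf_;
    std::size_t off_;
    std::size_t len_;
};
```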

> > This is, of course, supposing that the abstraction behind
> > string is a text string -- if the abstraction is just a
> > collection of characters, other rules hold.

> You've lost me... can you elaborate please?

The question is simply whether String should be a collection of
characters, and fit into the STL, or whether it is a
"fundamental" type, a bit of text. The set of operations you
provide are radically different according to which abstraction
you provide. And trying to provide both just results in a
confused mess.

[...]

> > I would be very sceptical of a design in which the date
> > class had [set_to_first_day_of_month() etc] mutating
> > functions. Far better would be functions (probably
> > non-member) which would create a new date.

> Why?

Because I like readable code. I don't like confusion.

> I think it's natural to provide both.

Confusion.

> There can be performance implications.

Fine. When the profiler says you have to do something about it,
you do so. I've yet to see a program where date manipulations
were the bottleneck.

> > Containers are a special case. Their only semantic is to
> > contain other objects, so of course, these other objects are
> > exposed to the user.

> Their only semantic? Container implementations often provide
> operator= with value-semantic convenience and style.

Fine. The fact remains that the semantics are those of a
collection; the individual objects which comprise the collection
are not hidden from the user, and in fact, all of the container's
semantics turn around those objects.

There is a big difference between double[2] and complex.
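Here is a small sketch that makes the `double[2]` versus `complex` contrast concrete. `std::complex<double>` defines multiplication on the number as a whole. A raw pair of doubles has no such meaning, so the rule has to be written out by hand. The function names are hypothetical.

```cpp
#include <complex>

// std::complex<double> is a value: multiplication acts on the number as a whole.
std::complex<double> product(std::complex<double> a, std::complex<double> b) {
    return a * b;
}

// double[2] is just two doubles; the same rule has to be spelled out by hand,
// and nothing stops a caller from treating the pair as unrelated numbers.
void product_pair(const double a[2], const double b[2], double out[2]) {
    out[0] = a[0] * b[0] - a[1] * b[1];   // real part
    out[1] = a[0] * b[1] + a[1] * b[0];   // imaginary part
}
```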

> Operators +, -, +=, -= are sometimes seen with reasonable
> implementations appending sequences or combining sets.

I sort of disagree. For sequences in general, I strongly
disagree. In the case of set, I suspect that an acceptance of
operators like | or |= is based on the fact that there are two
abstractions involved. A set is a mathematical abstraction,
defined by a membership relationship. The "obvious"
implementation of the mathematical concept would be to map the
domain set to a set of integers, and use a bit map. After all,
mathematically speaking, the only information relevant to a set
is whether an element is a member or not -- which can be
represented on a bit. But this implementation poses some
obvious problems if the domain set is open, e.g. the set of all
complex numbers; you end up with a representation which won't
fit in memory, even though the set will only contain a couple of
elements.
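This sketch shows the "bit map" implementation of a set using `std::bitset`. It assumes a closed domain consisting of the integers 0 to 63, and `make_set` is a hypothetical helper. Union and intersection really are `|` and `&` here, which is the intuition behind set operators.

```cpp
#include <bitset>
#include <cstddef>
#include <initializer_list>

// Assumed closed domain: the integers 0..63. A set over it needs only one
// membership bit per possible element.
using SmallSet = std::bitset<64>;

// Hypothetical helper; bitset::set throws std::out_of_range for members
// outside 0..63 (a negative int converts to a huge size_t here).
SmallSet make_set(std::initializer_list<int> members) {
    SmallSet s;
    for (int m : members) s.set(static_cast<std::size_t>(m));
    return s;
}
```

Nothing like this fits an open domain such as the complex numbers. That is why such sets end up storing the elements themselves.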

The result is a compromise: we implement sets (at least in some
cases) as containing the actual elements, either by pointing to
it or by containing a copy (if the elements have pure value
semantics). And we often use sets as if they were a container.
So we have two abstractions at work.

> std::string is again an obvious example, though a bit cheeky
> as it's clearly both a container and a value-semantic type.

I have no problem with containers supporting certain aspects of
value semantics. A text string, however, is NOT a container, at
least not in the usual sense, in that its contents have a
semantic which is more than just a collection of individual
elements.

--
James Kanze GABI Software
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Tony Delroy

unread,

Oct 20, 2005, 1:23:42 PM

Hi James,

> Requiring an object to be non-mutable except for
> assignment has the distinct advantage (for the
> implementor) of allowing reference counting.

> Mutability causes no end of problems for an
> efficient implementation of copy. Truncation, or extraction of
> substrings in general, can also be made just as fast, if not
> faster, if mutability isn't supported. (In the end, there are only
> two operations you can do with a string: extract a substring,
> and replace a (possibly empty) substring with another (possibly
> empty) string. The first can be implemented faster if the
> string doesn't support mutation.)

Am I missing something here James? From our discussion to date, I can
only assume you're discussing reference counted implementations of
strings with at least similar behaviour to std::string. A mutating
operation on an object with reference count "1" modifies in-place.
Unmoved data need not be touched. For higher reference counts, whether
mutating functions exist or not, a non-mutating application gets a new
buffer and has to copy all desired data to there, then modify both
reference counts. How can a non-mutating object ever be faster? The
only situation I can envisage is for multi-threaded applications with
an internally thread-safe string implementation, which is very
different from using mutexes around operations on an STL-like string.
This goes back to the old question of why STL containers aren't
inherently thread-safe, and all the old reasons remain valid.
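This sketch shows the copy-on-write behaviour described here. It assumes a hypothetical `CowString` built on `std::shared_ptr`. A truncation modifies the buffer in place when this object is its only owner. Otherwise it copies just the kept prefix into a new buffer. `use_count()` is a safe uniqueness test only when no other thread can copy the same object at the same time, which ties into the threading concern above.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical copy-on-write string: copies share one buffer; a mutator
// writes in place only when this object is the buffer's sole owner.
// Caveat: use_count() is not a reliable uniqueness test if another thread
// may copy this same object concurrently.
class CowString {
public:
    explicit CowString(std::string s)
        : rep_(std::make_shared<std::string>(std::move(s))) {}

    const std::string& str() const noexcept { return *rep_; }
    bool shares_with(const CowString& other) const noexcept {
        return rep_ == other.rep_;
    }

    // Keep at most the first n characters.
    void truncate(std::size_t n) {
        if (rep_.use_count() == 1) {
            if (n < rep_->size()) rep_->resize(n);              // sole owner: in place
        } else {
            rep_ = std::make_shared<std::string>(*rep_, 0, n);  // shared: copy kept prefix
        }
    }

private:
    std::shared_ptr<std::string> rep_;
};
```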

Returning to the case where you acknowledge mutators are faster:

> And how often are things
> like modifications in the middle of a string more important than
> copy and extraction of a substring?

This is one of those old questions - someone posted recently with some
statistics from their particular system. I wasn't interested enough to
read the details. Still, unless you explain how mutating functions
slow copy, truncation and extraction, I'll maintain that they don't
have to be _more_ important. The discussion thread talks about the
applicability of these practices to a variety of types, not strings
specifically.

Perhaps a better question for you is: is the reference count 1 often
enough for in-place mutation to be very useful?

> For the user, the main advantage [of non-mutable types] is simply readability.

I've already expressed my personal opinion that a notation like:

my_date.set_to_first_date_in_month();

is clearer than

my_date = my_date.get_first_date_in_month();

Sure the "=" has worth as a visual indication of changing value, but
the redundant and separated use of "my_date" compromises the sense that
the object is mutating, rather than being assigned a value independent
of its current value.

Some bizarre notations are also possible, such as:

my_date |= FIRST_DAY_OF_MONTH;

which is easy enough to read, conveys assignment and
in-place-modification, but may be less obviously available when
browsing the date header, and doesn't benefit from editors providing
drop-down member function lists.

Re robustness, batch systems can generally afford the kind of "one
error, abort, fix and rerun" approach you speak of. Many other systems
need to log and discard all ill effects of failed transactions and
attempt future transactions with a clean slate. This is quite typical
of server systems - everything from operating systems to web servers
and financial pricing systems. It's obviously more practical to be
robust in this sense if the class of errors relates to bad run-time
inputs, rather than memory exhaustion etc. I accept what you say that
it may be better to abort than continue after errors that are
impractical to recover from.

Relating this back to my original observation, let me add the qualifier
you crave, and state that if a "value-semantic" type decides not to
throw, then to be usable in software systems that must continue
despite the errors encountered, it should set an error state that can
be tested later.

> > > This is, of course, supposing that the abstraction behind
> > > string is a text string -- if the abstraction is just a
> > > collection of characters, other rules hold.
> > You've lost me... can you elaborate please?
> The question is simply whether String should be a collection of
> characters, and fit into the STL, or whether it is a
> "fundamental" type, a bit of text. The set of operations you
> provide are radically different according to which abstraction
> you provide. And trying to provide both just results in a
> confused mess.

I'll take your word for it, never having heard of anyone who wanted
such a "fundamental" text type, and being unable to think of a
situation where it would have been better suited than the char[],
vector<char> and STL string-style alternatives.
I've even seen strings designed for interoperability with fixed-length
FORTRAN strings, but they all provided some form of access to
individual characters, so don't match your "fundamental" notion.

> > There can be performance implications.
> Fine. When the profiler says you have to do something about it,
> you do so. I've yet to see a program where date manipulations
> were the bottleneck.

At my previous employer, they did have one such program. Though I
didn't work on it personally, I did a detailed review of the date class
used. It employed a highly optimised date class (using ~100K of lookup
tables to allow indexed conversion from broken-down time to
elapsed-time-since-reference-date) and did billions of calculations
regarding dates. The company (Bloomberg) deals in securities and
financial data, where dates are clearly very important for maturation,
future dates, swaps etc.
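This is not the class described above. It is a small sketch, under stated assumptions, of the table-driven idea. The day number of January 1st is precomputed for each year in 1900..2099. A conversion from year/month/day to days since 1900-01-01 then takes a couple of indexed loads plus a leap-year adjustment. The names and the range are hypothetical, and the input is not validated.

```cpp
#include <array>

namespace {
constexpr int kFirstYear = 1900;
constexpr int kYears = 200;   // covers 1900..2099
// Days before the first of each month in a non-leap year.
constexpr int kMonthStart[12] = {0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334};

bool is_leap(int y) { return (y % 4 == 0 && y % 100 != 0) || y % 400 == 0; }

// Day number of January 1st of each year, computed once.
std::array<int, kYears> make_year_table() {
    std::array<int, kYears> t{};
    int days = 0;
    for (int i = 0; i < kYears; ++i) {
        t[i] = days;
        days += is_leap(kFirstYear + i) ? 366 : 365;
    }
    return t;
}
}  // namespace

// Days since 1900-01-01; assumes a valid date in 1900..2099 (unchecked).
int days_since_1900(int year, int month, int day) {
    static const std::array<int, kYears> year_start = make_year_table();
    int d = year_start[year - kFirstYear] + kMonthStart[month - 1] + (day - 1);
    if (month > 2 && is_leap(year)) ++d;
    return d;
}
```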

> Operators +, -, +=, -= are sometimes seen with reasonable
> implementations appending sequences or combining sets.

> I sort of disagree. For sequences in general, I strongly
> disagree. In the case of set, I suspect that an acceptance of
> operators like | or |= is based on the fact that there are two

Well, strings are an example of that grey area between containers and
value-semantic types where sequences are usefully and intuitively
combined using + and +=. (I see you accept though dislike the use of
such notations for sets...)

> > std::string is again an obvious example, though a bit cheeky
> > as it's clearly both a container and a value-semantic type.

> I have no problem with containers supporting certain aspects of
> value semantics. A text string, however, is NOT a container, at
> least not in the usual sense, in that it's contents have a
> semantic which is more than just a collection of individual
> elements.

I'm having trouble seeing this too. Going back to the contrast you
draw between double[2] and complex - which I find an instructive
perspective - I'd say that strings clearly fall in with double[2]. The
string doesn't differentiate between its elements on any basis other
than order (hip hip hooray for the end of NUL termination), and its
operations aren't meaningfully different from those on vectors.
Thinking about this, I find myself wondering why I wouldn't feel
comfortable using std::basic_string<> in place of std::vector<>, and I
can't think of a good reason other than the confusion it would cause -
not through any semantic differences, but just through general
expectation.

Best regards,

Tony

kanze

unread,

Oct 21, 2005, 12:27:13 PM

Tony Delroy wrote:
> Hi James,

> > Requiring an object to be non-mutable except for assignment
> > has the distinct advantage (for the implementor) of allowing
> > reference counting.

> > Mutability causes no end of problems for an efficient
> > implementation of copy. Truncation, or extraction of
> > substrings in general, can also be made just as fast, if not
> > faster, if mutability isn't supported. (In the end, there
> > are only two operations you can do with a string: extract a
> > substring, and replace a (possibly empty) substring with
> > another (possibly empty) string. The first can be
> > implemented faster if the string doesn't support mutation.)

> Am I missing something here James? From our discussion to
> date, I can only assume you're discussing reference counted
> implementations of strings with at least similar behaviour to
> std::string. A mutating operation on an object with reference
> count "1" modifies in-place. Unmoved data need not be
> touched. For higher reference counts, whether mutating
> functions exist or not, a non-mutating application gets a new
> buffer and has to copy all desired data to there, then modify
> both reference counts. How can a non-mutating object ever be
> faster?

Globally, they're faster because none of the operations ever
has to worry about the reference count. Extracting a substring
is faster for the simple reason that it can still use the shared
implementation. (Most of the COW implementations I've actually
seen only share when there is an exact copy, but in fact, it
isn't too hard to share common subsequences. This becomes
significantly more difficult if modification is allowed. Note
too that the most effective way of sharing uses garbage
collection, and not reference counting.)

> The only situation I can envisage is for multi-threaded
> applications with an internally thread-safe string
> implementation, which is very different from using mutexes
> around operations on an STL-like string. This goes back to
> the old question of why STL containers aren't inherently
> thread-safe, and all the old reasons remain valid.

Any manipulations of the shared representation do have to be
internally thread safe, since this is an implementation detail,
not visible to the user. In a multi-threaded environment, COW
only really works if you have garbage collection. And
immutable strings.

> Returning to the case where you acknowledge mutators are
> faster:

> > And how often are things like modifications in the middle of
> > a string more important than copy and extraction of a
> > substring?

> This is one of those old questions - someone posted recently
> with some statistics from their particular system. I wasn't
> interested enough to read the details. Still, unless you
> explain how mutating functions slow copy, truncation and
> extraction,

They effectively prevent shared implementations, which slows
copy, truncation (which is a form of extraction) and extraction
considerably.

Consider a typical application. You read a line (using
getline), and cut it up into tokens. With a good implementation
of a non-mutating string, the actual buffer for the line is
allocated once, and never modified. The strings representing
the tokens just point into this buffer. Getting this to work
with a mutating interface is a lot harder; getting it to work in
a multi-threaded environment is close to impossible. In all of
the std::string implementations I know, each of the tokens will
have its own buffer.
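For comparison, here is a sketch of the tokenizing example using C++17 `std::string_view`. This is a different technique from the shared immutable string described here. Each token is a non-owning view into the line's buffer, so no per-token buffer is allocated. In exchange, the line must outlive the tokens and must not be modified while they are in use. `tokenize` is a hypothetical helper.

```cpp
#include <string>
#include <string_view>
#include <vector>

// Splits a line into whitespace-separated tokens without allocating a
// buffer per token: each token is a view into `line`. The caller must keep
// the underlying characters alive and unmodified while tokens are in use.
std::vector<std::string_view> tokenize(std::string_view line) {
    std::vector<std::string_view> tokens;
    std::size_t pos = 0;
    while (true) {
        pos = line.find_first_not_of(" \t", pos);
        if (pos == std::string_view::npos) break;
        std::size_t end = line.find_first_of(" \t", pos);
        if (end == std::string_view::npos) end = line.size();
        tokens.push_back(line.substr(pos, end - pos));
        pos = end;
    }
    return tokens;
}
```

A typical use would be `std::getline(std::cin, line);` followed by `auto tokens = tokenize(line);`, keeping `line` untouched while `tokens` is in use.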

> I'll maintain that they don't have to be _more_ important.
> The discussion thread talks about the applicability of these
> practices to a variety of types, not strings specifically.

> Perhaps a better question for you is: are reference counts 1
> often enough for it to be very useful to do in-place mutation?

> > For the user, the main advantage [of non-mutable types] is
> > simply readability.

> I've already expressed my personal opinion that a notation
> like:

> my_date.set_to_first_date_in_month();

> is clearer than

> my_date = my_date.get_first_date_in_month();

> Sure the "=" has worth as a visual indication of changing
> value, but the redundant and separated use of "my_date"
> compromises the sense that the object is mutating, rather than
> being assigned a value independent of its current value.

But it *is* being assigned a new value. The new value isn't
independent of the old value, any more than the new value in "x
= sin( x ) ;" is independent of the old value. But it is a new
value.

> Some bizarre notations are also possible, such as:

> my_date |= FIRST_DAY_OF_MONTH;

> which is easy enough to read, conveys assignment and
> in-place-modification, but may be less obviously available
> when browsing the date header, and doesn't benefit from
> editors providing drop-down member function lists.

It's also a lie, semantically speaking. IMHO, lying to the
reader isn't the best way to obtain readable code.

> Re robustness, batch systems can generally afford the kind of
> "one error, abort, fix and rerun" approach you speak of. Many
> other systems need to log and discard all ill effects of
> failed transactions and attempt future transactions with a
> clean slate.

It depends on why the transaction failed. If it failed because
of an error in the request, or some "normal" error (record not
found), then of course, you have to clean up and continue. If
it failed because the internal state of the program is corrupt,
then the only "correct" action is to abort and restart.

The key words in your statement are "clean slate". If you can
ensure a clean slate, by all means, continue. If you detect an
impossible program state, however, then by definition, you
cannot ensure a clean slate. So you have to start weighing the
risks of different solutions.

> This is quite typical of server systems - everything from
> operating systems to web servers and financial pricing
> systems.

Unless you can be 100% sure of starting with a perfectly clean
slate, the only case where I can imagine the slightest doubt is
for the OS.

A web server: a crash is not really a problem, the protocols are
designed to be robust, and the clients will pick up where they
left off when you return. By its very nature, a web server must
be able to shut down and restart without problems.

A financial pricing system: in NO case is it permitted to
continue unless you are 100% sure of the data. The (financial)
risk is just too great.

Even in an OS, think of the damage you can do if you start
working with bad data. Writing the wrong sectors on the disk,
etc.

I think, too, that you have to think in terms of the total
system. On all of the servers I've worked on, for example,
there have been watchdogs which would restart the application
(or switch over to a backup, on the critical systems) if it
stopped. Under Unix, this can be done with a simple, three-line
shell script:

    while :
    do application
    done

Is it better for a web server, or any server, to try and
stumble on when it isn't sure of its state, or is it better for
the process to get out of the way as soon as possible, so its
replacement can get going?

> It's obviously more practical to be robust in this sense if
> the class of errors relates to bad run-time inputs, rather
> than memory exhaustion etc.. I accept what you say that it
> may be better to abort than continue after errors that are
> impractical to recover from.

The question is when it becomes impractical. If you can
encounter memory exhaustion as a result of a user request, and
aborting just that request recovers the memory, then it would be
an error not to continue. If the amount of memory used in a
single user request is bounded, however, and about the only
possible way to encounter memory exhaustion is by continually
leaking memory, then I don't know how you can recover other than
by killing the process and restarting.

> Relating this back to my original observation, let me add the
> qualifier you crave, and state that if a "value-semantic" type
> decides not to throw, then to be usable in software systems
> that must continue despite the errors encountered, it should
> set an error state that can be tested later.

No. You have to simply ensure that it cannot throw. For value
types, this is the case more often than not.

> > > > This is, of course, supposing that the abstraction
> > > > behind string is a text string -- if the abstraction is
> > > > just a collection of characters, other rules hold.

> > > You've lost me... can you elaborate please?

> > The question is simply whether String should be a collection
> > of characters, and fit into the STL, or whether it is a
> > "fundamental" type, a bit of text. The set of operations
> > you provide are radically different according to which
> > abstraction you provide. And trying to provide both just
> > results in a confused mess.

> I'll take your word for it, never having heard of anyone who
> wanted such a "fundamental" text type, and being unable to
> think of a situation where it would have been better suited
> than the char[], vector<char> and STL string-style
> alternatives.

It depends on what level you're working at. The problem in
general is that most people have never had access to a text
type, and do think of strings as a collection of characters.
(Another problem is that this doesn't actually cause too many
problems in English. When you start having to deal with things
like diacritics, position-dependent representations, or cases
where there is no one to one mapping between small letters and
capitals, on the other hand, then arrays of characters are a
much lower level than you want in most of the program.)

> I've even seen strings designed for interoperability with
> fixed-length FORTRAN strings, but they all provided some form
> of access to individual characters, so don't match your
> "fundamental" notion.

> > > There can be performance implications.

> > Fine. When the profiler says you have to do something about
> > it, you do so. I've yet to see a program where date
> > manipulations were the bottleneck.

> At my previous employer, they did have one such program.
> Though I didn't work on it personally, I did a detailed review
> of the date class used. It employed a highly optimised date
> class (using ~100K of lookup tables to allow indexed
> conversion from broken-down time to
> elapsed-time-since-reference-date) and did billions of
> calculations regarding dates. The company (Bloomberg) deals
> in securities and financial data, where dates are clearly very
> important for maturation, future dates, swaps etc.

We seem to be able to get by with a Posix time_t for all that,
but I sometimes get the feeling that we aren't really doing as
well as we should be.

With regards to mutating versus non-mutating, did you really
have that many cases where you didn't have to keep the original
date as well? And if you have to keep the original date,
mutating can't be any faster, and will typically be slightly
slower, than non-mutating.

But of course, I don't doubt that there will be cases where the
profiler does require mutation. In such cases, you do it. If
you have to, you have to.

> > Operators +, -, +=, -= are sometimes seen with reasonable
> > implementations appending sequences or combining sets.

> I sort of disagree. For sequences in general, I strongly
> disagree. In the case of set, I suspect that an acceptance of
> operators like | or |= is based on the fact that there are two

> Well, strings are an example of that grey area between
> containers and value-semantic types where sequences are
> usefully and intuitively combined using + and +=. (I see you
> accept though dislike the use of such notations for sets...)

Well...

From a certain point of view, I don't even like + for strings.
On the other hand, given a type that supports +, and supports
assignment, I think that += is a necessity.

But += can (and should) be defined in terms of = and of +. And
in my pre-standard string, it was implemented in terms of
assignment and append (which was a const function returning a
new String).
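A minimal sketch of that design, assuming a reference-counted immutable representation (std::shared_ptr is used here for brevity; the class and member names are hypothetical, not the pre-standard String itself): append is a const member returning a new String, and += is nothing more than assignment of append's result.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical immutable string value: the characters are never modified
// after construction, so copies can safely share one representation.
class String {
public:
    String() : rep_(std::make_shared<std::string>()) {}
    String(const char* s) : rep_(std::make_shared<std::string>(s)) {}

    // Non-mutating: returns a new String; *this is left unchanged.
    String append(const String& other) const {
        return String(*rep_ + *other.rep_);
    }

    // += defined purely in terms of assignment and append.
    String& operator+=(const String& other) {
        *this = append(other);
        return *this;
    }

    const std::string& str() const { return *rep_; }

private:
    explicit String(std::string s)
        : rep_(std::make_shared<std::string>(std::move(s))) {}

    std::shared_ptr<const std::string> rep_;
};

// + is the real operation; += merely replaces the left operand's value.
inline String operator+(const String& a, const String& b) {
    return a.append(b);
}
```

Because nothing ever writes through rep_, a copy made before += still sees the old value, which is exactly the behaviour of the built-in types.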

> > > std::string is again an obvious example, though a bit
> > > cheeky as it's clearly both a container and a
> > > value-semantic type.

> > I have no problem with containers supporting certain aspects
> > of value semantics. A text string, however, is NOT a
> > container, at least not in the usual sense, in that it's
> > contents have a semantic which is more than just a
> > collection of individual elements.

> I'm having trouble seeing this too. Going back to the
> contrast you draw between double[2] and complex - which I find
> an instructive perspective - I'd say that strings clearly fall
> in with double[2].

It's arguable. It depends on what you are doing. But think of
things like toUpper, when there isn't a one to one mapping
between upper and lower case.

And if strings are just arrays, what's wrong with
std::vector<char>? In fact, that's what I regularly use for
low-level operations anyway. Because with vector<char>, I have
a number of complexity guarantees that I don't have with
string. (As far as I can tell, with string, the only complexity
guarantee you have is that the implementation is going to be
reasonably complex:-).)

> The string doesn't differentiate between its elements on any
> basis other than order (hip hip hooray for the end of NUL
> termination),

std::string doesn't (and is really nothing but a poorly
specified container). Whether an abstraction for strings should
is a different question. If we're talking about text, then the
individual characters do have semantics, and the individual
chars, potentially even more.

> and its operations aren't meaningfully different from those on
> vectors.

Again, this is a characteristic of std::string, but not
necessarily a good one. (There is some argument that it is
probably all that can reasonably be done in a standard today,
that our knowledge of what text really is isn't yet mature
enough for standardization.)

> Thinking about this, I find myself wondering why I wouldn't
> feel comfortable using std::basic_string<> in place of
> std::vector<>, and I can't think of a good reason other than
> the confusion it would cause - not through any semantic
> differences, but just through general expectation.

General expectation is the only reason I use std::string,
instead of std::vector<char>:-). It *is* good documentation.

--
James Kanze GABI Software
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Tony Delroy

Oct 25, 2005, 11:55:17 AM

Hi James,

Yikes - this discussion has opened up on so many fronts I don't know
where to begin...!

You mention some contexts in which immutability is a genuine advantage
- thanks for your insights on that front. The "parsing an input line"
example was helpful. I guess it becomes yet another question of
relative frequencies of particular operations. The immutable string
idea does have costs for many operations when the reference count is 1
anyway, is lazier in both allocating and deallocating memory, and has a
hit when calling c_str() on referenced substrings.

As for mutating operations, it's simply a matter of perspective.
Languages exist that only have non-mutating values (IIRC Miranda and
Prolog are two), so certainly that way of thinking is natural for some.

Still, operators like += just highlight the fact that mutability is the
normal perspective in C/C++. C++ programmers are used to, comfortable with,
and effective at working with this idea. You mention x = sin(x) as an
example where .set_to_sin() would be invalid, and I agree. But the
reason is simply that there are a great many such operations, they're
applicable to multiple types (float, double), they involve indivisible
types (see below), and the C++ perspective is compromised by an
overriding expectation for mathematical notation.

Surely you wouldn't advocate "my_map = my_map.erase(key);"? So where's
the line? One factor is divisibility (I alluded to this by the name
"atomicity" earlier in this discussion, but have changed to avoid
further confusion). We're talking about value-semantic classes, and
I'd suggest that mutating operations become less and less compelling as
the type becomes more and more indivisible. Dates can still be
perceived as day/month/year values, strings as sequences of characters,
but a class like "Euros" may wrap an indivisible value.

Another factor is convenience, and mirroring the thought processes that
are natural in the problem domain. If people think of "advancing a
date to the next Friday", then my_date.advance_to_next(FRIDAY);
captures this succinctly. If they think of "set the date to the Friday
following its current value" then perhaps my_date = advance(my_date,
FRIDAY); Similarly, .set_to_first_day_of_month() vs my_date = Date(1,
my_date.get_month(), my_date.get_year());. Programmatic abstraction
should mirror convenient, natural but logically sound operations in the
problem domain.
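A sketch of the two styles side by side, assuming a hypothetical Date stored as a day count since 1970-01-01 (a Thursday) and a hypothetical Weekday enumeration. The non-mutating advance is written in terms of the mutating member, so the two spellings cannot drift apart.

```cpp
#include <cassert>

// Hypothetical weekday numbering, Sunday = 0.
enum Weekday { SUNDAY, MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, SATURDAY };

// Hypothetical Date: a day count since 1970-01-01 (which was a Thursday).
class Date {
public:
    explicit Date(long days_since_epoch) : days_(days_since_epoch) {}

    long days() const { return days_; }

    Weekday weekday() const {
        // Adding 7 keeps the result non-negative for dates before 1970.
        return static_cast<Weekday>(((days_ % 7) + 7 + THURSDAY) % 7);
    }

    // Mutating style: "advance this date to the next <w>".
    // A date already falling on <w> moves a full week forward.
    void advance_to_next(Weekday w) {
        int delta = (static_cast<int>(w) - static_cast<int>(weekday()) + 7) % 7;
        if (delta == 0) delta = 7;
        days_ += delta;
    }

private:
    long days_;
};

// Non-mutating style: "the <w> following d". d is taken by value,
// so the caller's date is untouched.
inline Date advance(Date d, Weekday w) {
    d.advance_to_next(w);
    return d;
}
```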

Re robustness: these high-level discussions always seem to get bogged
down, as there are too many assumptions. For example, I understand
your general reasoning re financial systems needing to stop when an
error makes the legitimacy of future operations uncertain, but they
certainly do continue after numerous types of errors. For example, if
the system I'm working on now is unable to give an automatic quote then
it marks the request as needing a manual quote and continues. The
next request may or may not be able to be automatically priced. It's
likely that if we were actually discussing tangible business systems
then we'd agree as to the areas in which it can continue, and where it
should abort.

Re string vs text, I'll leave that to others as my only experience of
supporting multiple languages is too limited. It seems inevitable that
those working routinely with multi-byte character sets should help
progress these issues.

Cheers,

Tony

kanze

unread,

Oct 26, 2005, 7:15:31 PM

Tony Delroy wrote:

[...]

> As for mutating operations, it's simply a matter of
> perspective. Languages exist that only have non-mutating
> values (IIRC Miranda and Prolog are two), so certainly that
> way of thinking is natural for some.

Technically, C (and C++ for the built-in types) has non-mutating
values:-). 5 never becomes 6. I think you mean that variables
are non-mutating. The value in a variable doesn't change
itself; it gets replaced by a new value, as the variable takes
on several different values in time.

I wonder if there isn't a slight ambiguity in the meaning of
"change" here. If I say the value of i changes, I mean that I
have replaced one value with another, not that I have somehow
mutated an existing value. It's a subtle distinction, but I
think it is important in what follows.

> Still, operators like += just highlight the fact that
> mutability is the normal perspective in C/C++.

Or not. For the base types, these operators are formally
defined in terms of = -- not modification of an existing value,
but calculation of a new value, then replacement of the value.
Because of the value semantics, there may even be type
conversions: in += of two shorts, the + operation is still on
ints, with an int result which is then converted back to short.

That is, of course, the formal definition. How programmers
actually "feel" the operators may be different. And of course,
a user-defined += can do just about anything, and can certainly
modify value.
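The formal definition for the built-in types can be checked directly. In this small sketch, the static_assert confirms that short + short has type int, and the hypothetical helper plus_assign_longhand spells out what a += b means for two shorts. (The standard defines E1 op= E2 as E1 = E1 op E2, except that E1 is evaluated only once.)

```cpp
#include <cassert>
#include <type_traits>

// Integral promotion: both shorts become int before + is applied.
static_assert(std::is_same<decltype(short(1) + short(1)), int>::value,
              "short + short has type int");

// Hypothetical helper spelling out what `a += b` means for two shorts:
// compute a new int value, convert it back to short, assign it to a.
inline short& plus_assign_longhand(short& a, short b) {
    a = static_cast<short>(a + b);
    return a;
}
```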

> C++ programmers are used to, comfortable with, and effective
> at working with this idea. You mention x = sin(x) as an example
> where .set_to_sin() would be invalid, and I agree. But the
> reason is simply that there are a great many such operations,
> they're applicable to multiple types (float, double), they
> involve indivisible types (see below), and the C++ perspective
> is compromised by an overriding expectation for mathematical
> notation.

I think that the mathematical notation is an overriding
principle here. And IMHO, true value types define, in each
case, a type of mathematics. However...

> Surely you wouldn't advocate "my_map = my_map.erase(key);"?
> So where's the line? One factor is divisibility (I alluded to
> this by the name "atomicity" earlier in this discussion, but
> have changed to avoid further confusion).

Good move:-). I think that this is the key.

One could imagine something like aDouble.setExposant(), as a
modifying function. If we reject the idea, I think it is
principally because the abstraction normally associated with a
double doesn't break the object up into its component parts;
when you think of a double, you think of it as an indivisible
value, not as a set of a sign, an exponent and a mantissa. On
the other hand, what is the abstraction of a vector, except an
ordered collection of elements. In one case, the "value" is
indivisible, in the other, it is visibly made up of its
component parts.
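For double, the standard library already follows this abstraction: the sign/exponent/mantissa decomposition is reachable only through value-returning functions such as std::frexp and std::ldexp, never through mutators. A sketch of what a value-style counterpart to the imagined setExposant() might look like; with_exponent is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical value-style counterpart to a mutator like setExposant():
// returns a new double with x's mantissa and the given binary exponent.
// x itself is a value parameter and is never modified.
inline double with_exponent(double x, int new_exp) {
    int old_exp = 0;
    const double mantissa = std::frexp(x, &old_exp);  // x == mantissa * 2^old_exp
    return std::ldexp(mantissa, new_exp);             // mantissa * 2^new_exp
}
```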

There is, of course, also a question of performance. I once
used a doubly linked list to implement what was basically a set
of DNs (each of which was itself a set of attribute value assertions,
generally containing a string). Logically, it was a value, and
I returned it from one function by value, using deep copy. And
saw the program stop for about 15 seconds, as the set contained
some 60000 elements:-). It didn't really matter -- the function
itself took something like an hour to execute, but the pause was
visible (there was a progress indicator).

> We're talking about value-semantic classes, and I'd suggest
> that mutating operations become less and less compelling as
> the type becomes more and more indivisible. Dates can still
> be perceived as day/month/year values, strings as sequences of
> characters, but a class like "Euros" may wrap an
> indivisible value.

I think we're pretty much in agreement here. The question is
the abstraction. If the abstraction for date is simply a point
in time, and for Euros a number of cents and of Euros, then the
situation would be the opposite. While this is probably a
ridiculous abstraction for Euros, I find that it generally is a
preferrable one for date -- it abstracts away the question of
whether I want day/month/year or yearday/year, and even which
calendar system I'm using. The separation into day/month/year
is one concerning external representation only.
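A sketch of a Date that is only a point in time internally, with day/month/year used only at the boundary. The Date and Ymd types are hypothetical; they store a count of days since 1970-01-01 in the proleptic Gregorian calendar, and the conversions follow Howard Hinnant's widely published days_from_civil and civil_from_days algorithms.

```cpp
#include <cassert>

// Hypothetical external representation, used only at the boundary.
struct Ymd {
    int year;
    unsigned month;  // 1..12
    unsigned day;    // 1..31
};

// Hypothetical Date: internally only a point in time, counted in days
// since 1970-01-01 in the proleptic Gregorian calendar.
class Date {
public:
    // Based on Howard Hinnant's days_from_civil algorithm.
    static Date from_ymd(const Ymd& c) {
        const long y = static_cast<long>(c.year) - (c.month <= 2 ? 1 : 0);
        const long era = (y >= 0 ? y : y - 399) / 400;
        const unsigned long yoe = static_cast<unsigned long>(y - era * 400);  // [0, 399]
        const unsigned long mp = c.month > 2 ? c.month - 3 : c.month + 9;     // March = 0
        const unsigned long doy = (153 * mp + 2) / 5 + c.day - 1;             // [0, 365]
        const unsigned long doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;      // [0, 146096]
        return Date(era * 146097 + static_cast<long>(doe) - 719468);
    }

    // Based on Howard Hinnant's civil_from_days algorithm.
    Ymd to_ymd() const {
        const long z = days_ + 719468;
        const long era = (z >= 0 ? z : z - 146096) / 146097;
        const unsigned long doe = static_cast<unsigned long>(z - era * 146097);
        const unsigned long yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365;
        const unsigned long doy = doe - (365 * yoe + yoe / 4 - yoe / 100);
        const unsigned long mp = (5 * doy + 2) / 153;
        const unsigned d = static_cast<unsigned>(doy - (153 * mp + 2) / 5 + 1);
        const unsigned m = static_cast<unsigned>(mp < 10 ? mp + 3 : mp - 9);
        const long y = static_cast<long>(yoe) + era * 400 + (m <= 2 ? 1 : 0);
        Ymd result = {static_cast<int>(y), m, d};
        return result;
    }

    long days_since_epoch() const { return days_; }

    // Non-mutating arithmetic: yields a new Date.
    Date plus_days(long n) const { return Date(days_ + n); }

private:
    explicit Date(long days) : days_(days) {}
    long days_;
};
```

Swapping in another calendar would only touch from_ymd and to_ymd; date arithmetic stays plain integer arithmetic.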

In the end, the real question regarding string is what the
abstraction should be. The std::string is really a very low
level abstraction; it IS just a collection of characters -- not
even characters, really, just integers. For many programs, a
higher level abstraction would be more appropriate, an
abstraction which knew what a character was, and treated strings
as more than just a sequence of characters, or at least more
than just a sequence of bytes.

> Another factor is convenience, and mirroring the thought
> processes that are natural in the problem domain. If people
> think of "advancing a date to the next Friday", then
> my_date.advance_to_next(FRIDAY); captures this succinctly. If
> they think of "set the date to the Friday following its
> current value" then perhaps my_date = advance(my_date,
> FRIDAY); Similarly, .set_to_first_day_of_month() vs my_date =
> Date(1, my_date.get_month(), my_date.get_year());.
> Programmatic abstraction should mirror convenient, natural but
> logically sound operations in the problem domain.

I like mathematical rigor, even for things like date. I find
that immutability more closely models the mathematical way of
thinking about things.

> Re robustness: these high-level discussions always seem to get
> bogged down, as there are too many assumptions. For example,
> I understand your general reasoning re financial systems
> needing to stop when an error makes the legitimacy of future
> operations uncertain, but they certainly do continue after
> numerous types of errors.

For a specific definition of error. My banker thinks that it's
an error if the balance on my account is negative, but nobody
thinks that the bank's accounting system should shut down
because of it.

At the most abstract level, a system does useful work based on
an abstraction. If that abstraction doesn't hold, it cannot do
useful work, and should say so and stop. If the bank's
abstraction of an account doesn't provide for negative balances,
then shutting down the system is probably the correct thing to
do. In this case, however, I would argue that this means that
the bank used an inadequate abstraction.

> For example, if the system I'm working on now is unable to
> give an automatic quote then it marks the request as needing a
> manual quote and continues. The next request may or may not
> be able to be automatically priced. It's likely that if we
> were actually discussing tangible business systems then we'd
> agree as to the areas in which it can continue, and where it
> should abort.

Agreed. My tendency would be that an underlying abstraction
that supposes enough memory is an acceptable one -- if there
isn't enough, then run the program on a system with more. But
if the program is leaking, there will never be enough; it's a
programming error, and should probably lead to an abrupt
termination. Note that on most systems, if the insufficient
memory is detected when the system is trying to increase the
size of the call stack, it will lead to abrupt termination,
whether that is what is wanted or not. In far too many cases,
there just isn't anything reasonable you can do if you don't
have enough memory.

And of course, treating insufficient memory as a fatal error,
leading to program termination, rather than a recoverable error
triggering an exception, means that all value types will almost
automatically have non-throwing constructors and assignment
operators.
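A sketch of the "out of memory is fatal" policy, assuming the global operator new has not been replaced. With a new_handler that never returns, operator new never gets to throw std::bad_alloc, so value types whose copy and assignment can only fail by allocating do not throw in practice. (The compiler still treats them as potentially throwing; their noexcept status is unaffected.) out_of_memory and install_fatal_oom_policy are hypothetical names.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <new>

// Hypothetical handler: report and terminate instead of returning.
// Because it never returns, the default operator new never reaches
// the point where it would throw std::bad_alloc.
[[noreturn]] void out_of_memory() {
    std::fputs("fatal: out of memory\n", stderr);
    std::abort();
}

// Hypothetical setup function, intended to be called early in main().
inline void install_fatal_oom_policy() {
    std::set_new_handler(out_of_memory);
}
```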

Tony Delroy

Oct 27, 2005, 6:41:01 AM

The contrast with Miranda and Prolog seems to have fallen flat, perhaps
due to my careless wording. My recollection (from very many years ago)
is that, in C++ terms, all variables are const in these languages. You
can't change/replace their values: only define new variables. This is
not as impractical as it may sound to a C++ programmer. It affects the
way programs are structured - for example, recursive functions might
replace iteration where a variable is mutated.
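A small C++ illustration of that restructuring, with hypothetical function names: the iterative sum mutates an accumulator and an index, while the recursive version never modifies a variable after initialising it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Iterative style: the accumulator and the index are mutated each pass.
inline long sum_iterative(const std::vector<int>& v) {
    long total = 0;
    for (std::size_t i = 0; i != v.size(); ++i) total += v[i];
    return total;
}

// Single-assignment style: no variable changes after initialisation;
// each step is a new call with new values. C++ does not guarantee
// tail-call elimination, so very long inputs may exhaust the stack.
inline long sum_recursive(const std::vector<int>& v,
                          const std::size_t i = 0, const long acc = 0) {
    return i == v.size() ? acc : sum_recursive(v, i + 1, acc + v[i]);
}
```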

> And of course, treating insufficient memory as a fatal error,
> leading to program termination, rather than a recoverable error
> triggering an exception, means that all value types will almost
> automatically have non-throwing constructors and assignment
> operators.

I still can't agree with this. For example, an attempt may be made to
initialise a date using day, month and year values that aren't, in
combination, valid. (Whether its storage is indivisible, the
perception of divisible vs indivisible, mutators etc isn't relevant to
the following discussion). We have four possible approaches to
handling such a situation:

1) program termination
2) expect a prior call to a function that validates the combination
(note: this prior call probably does 90% of the work required
immediately after when the date is actually constructed, assuming
storage as units of time since reference)
3) throw an exception
4) leave the date in a state that can later be tested for validity

Consideration:

1) Something like a Date class has potential wide-spread applicability,
and termination won't be appropriate in many cases where they would be
useful, so this approach has to be excluded.
2) An insistence on prior verification (a typical DbC approach) -
separate from construction - is error prone (in the sense of easily
overlooked or botched by the programmer), particularly verbose, and
inefficient.
3) Generally practical, and familiar through various STL behaviours,
though programmers may forget to put catches in the appropriate places.
4) Generally practical, and familiar through innumerable C and C++
library behaviours, though programmers may forget to test for error
status. Reliable testing may be facilitated by having an operator
bool() const, though this has the usual issues associated with implicit
conversion operators. Things would be a little easier if C++ allowed a
notation like "if (! Date date(31, 2, 2006))", where date was scoped as
if replacing the if statement, or even a more Perl-like "Date date(31,
2, 2006).valid() || { cout << "error\n"; return; }". But I'm
day-dreaming. C++ allows a temporary à la "if (Date(31, 2, 2006))", but
it's not much help unless you want to abuse it as per 2) "if (!Date(x,
y, z)) return; Date date(x, y, z);".
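A sketch of approaches 2) to 4) on one hypothetical Date, using the day, month, year argument order of the examples above. is_valid_ymd, checked and valid are assumed names, not an existing library. Approach 2) is the free validation function the caller invokes first; 3) is a factory that throws std::invalid_argument; 4) is a constructor that always succeeds and records validity.

```cpp
#include <cassert>
#include <stdexcept>

inline bool is_leap(int y) {
    return (y % 4 == 0 && y % 100 != 0) || y % 400 == 0;
}

// Approach 2: a validation function the caller is expected to use first.
inline bool is_valid_ymd(int d, int m, int y) {
    static const int days_in_month[12] = {31, 28, 31, 30, 31, 30,
                                          31, 31, 30, 31, 30, 31};
    if (m < 1 || m > 12 || d < 1) return false;
    const int last = days_in_month[m - 1] + ((m == 2 && is_leap(y)) ? 1 : 0);
    return d <= last;
}

// Hypothetical Date illustrating approaches 3 and 4.
class Date {
public:
    // Approach 4: always constructs; validity is recorded for later tests.
    Date(int d, int m, int y)
        : d_(d), m_(m), y_(y), valid_(is_valid_ymd(d, m, y)) {}

    bool valid() const { return valid_; }

    // Approach 3: refuses to produce an invalid Date by throwing.
    static Date checked(int d, int m, int y) {
        if (!is_valid_ymd(d, m, y))
            throw std::invalid_argument("invalid day/month/year");
        return Date(d, m, y);
    }

    int day() const { return d_; }
    int month() const { return m_; }
    int year() const { return y_; }

private:
    int d_, m_, y_;
    bool valid_;
};
```

Under approach 2 proper, the constructor would typically just assert(is_valid_ymd(d, m, y)). Since C++17, an if statement may also carry an initializer, e.g. if (Date date(31, 2, 2006); !date.valid()) { ... }, with date scoped to the if and its else, which comes close to the notation wished for above.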

Anyway, yet again I assert that 3) and 4) are _both_ reasonable choices
for any value-semantic type whose complexity is such that construction
or assignment can fail, but in such a way that doesn't compromise the
ongoing viability of the process's execution. Sure, if a type is
simple enough that its construction and operators can be designed to
succeed, then terrific: the type belongs to an even simpler subset of
value-semantic types that perhaps is what Roland had in mind to begin
with, in which case I still have no terminology to suggest. But for
the rest of value-semantic types, a crucial cost implicit in choosing
value semantics is that types return their value (perhaps via a
reference to self) rather than a success/failure indicator, and this
forces one of the choices above. Hopefully by now the larger scope
implied by the term "value-semantic type" is clear (if I've even got it
right, which I said was uncertain when introducing the term in this
thread), though I've never said it was an exact match for Roland's
specification.

- Tony

kanze

Oct 28, 2005, 4:30:20 AM

Tony Delroy wrote:

> The contrast with Miranda and Prolog seems to have fallen
> flat, perhaps due to my careless wording. My recollection
> (from very many years ago) is that, in C++ terms, all
> variables are const in these languages. You can't
> change/replace their values: only define new variables. This
> is not as impractical as it may sound to a C++ programmer. It
> affects the way programs are structured - for example,
> recursive functions might replace iteration where a variable
> is mutated.

That's about what I understood. I didn't follow up on it
because truly unmodifiable variables, as in such languages, more
or less only work when you do use recursion even for simple
loops. Which more or less implies a functional language, which
in turn means that it isn't relevant to C++.

> > And of course, treating insufficient memory as a fatal
> > error, leading to program termination, rather than a
> > recoverable error triggering an exception, means that all
> > value types will almost automatically have non-throwing
> > constructors and assignment operators.

> I still can't agree with this. For example, an attempt may be
> made to initialise a date using day, month and year values
> that aren't, in combination, valid. (Whether its storage is
> indivisible, the perception of divisible vs indivisible,
> mutators etc isn't relevant to the following discussion). We
> have four possible approaches to handling such a situation:

> 1) program termination
> 2) expect a prior call to a function that validates the combination
> (note: this prior call probably does 90% of the work required
> immediately after when the date is actually constructed, assuming
> storage as units of time since reference)
> 3) throw an exception
> 4) leave the date in a state that can later be tested for validity

> Consideration:

> 1) Something like a Date class has potential wide-spread
> applicability, and termination won't be appropriate in many
> cases where they would be useful, so this approach has to be
> excluded.

Agreed, unless you require the prior call.

> 2) An insistence on prior verification (a typical DbC
> approach) - separate from construction - is error prone (in
> the sense of easily overlooked or botched by the programmer),
> particularly verbose, and inefficient.

It depends. I don't really see where it is error prone.
There may be efficiency issues, but it's hard to say which way
they lean. If the parameters come from user data, errors will
be (relatively) frequent, and being able to handle the error
through a return code, without an exception, is an advantage.
And exceptions are very expensive. And if the parameters are
calculated internally, you
may be sure that they are correct, and skip the extra function;
if the profiler shows it necessary, you may even skip the
asserts in the constructor, and gain more.

It's not a universal solution, but it's probably more frequently
appropriate than you suggest.

> 3) Generally practical, and familiar through various STL
> behaviours, though programmers may forget to put catches in
> the appropriate places.

Where in the STL does anything throw an exception because of a
violation of pre-conditions? About the only exception you'll
ever get from the STL is bad_alloc.

I'd say that overall, the STL policy is 2, but without the prior
validation functions. If you don't have correct parameters,
it's undefined behavior. About the only difference with regards
to the STL is that the STL doesn't provide any means of
validating them before hand.

> 4) Generally practical, and familiar through innumerable C and
> C++ library behaviours, though programmers may forget to test
> for error status. Reliable testing may be facilitated by
> having an operator bool() const, though this has the usual
> issues associated with implicit conversion operators. Things
> would be a little easier if C++ allowed a notation like "if (!
> Date date(31, 2, 2006))", where date was scoped as if
> replacing the if statement, or even a more Perl-like "Date
> date(31, 2, 2006).valid() || { cout << "error\n"; return; }".
> But I'm day-dreaming. C++ allows a temporary à la "if
> (Date(31, 2, 2006))", but it's not much help unless you want
> to abuse it as per 2) "if (!Date(x, y, z)) return; Date
> date(x, y, z);".

> Anyway, yet again I assert that 3) and 4) are _both_
> reasonable choices for any value-semantic type whose
> complexity is such that construction or assignment can fail,
> but in such a way that doesn't compromise the ongoing
> viability of the process's execution.

It depends on the application. I won't say that they are never
appropriate. But they are probably over used. (Note that all
Roland asked for was a *default* constructor, copy and
assignment to be non-throwing. Which would be trivial for your
Date example, since the only possible errors are those due to
bad parameters.)
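For a type like Date, making the three operations Roland listed non-throwing is indeed straightforward. A sketch, assuming a hypothetical Date that holds only a day count. Since C++11 the requirement is spelled noexcept rather than throw() (dynamic exception specifications were deprecated in C++11, and throw() was removed in C++20), and the standard type traits let the compiler check it.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical Date holding only a built-in member, so its default
// constructor, copy constructor and copy assignment cannot fail.
class Date {
public:
    Date() noexcept : days_(0) {}
    explicit Date(long days) noexcept : days_(days) {}
    Date(const Date&) noexcept = default;
    Date& operator=(const Date&) noexcept = default;

    long days_since_epoch() const noexcept { return days_; }

private:
    long days_;
};

// Compile-time checks of the "builtin-like" guarantees.
static_assert(std::is_nothrow_default_constructible<Date>::value, "");
static_assert(std::is_nothrow_copy_constructible<Date>::value, "");
static_assert(std::is_nothrow_copy_assignable<Date>::value, "");
```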

> Sure, if a type is simple enough that its construction and
> operators can be designed to succeed, then terrific: the type
> belongs to an even simpler subset of value-semantic types that
> perhaps is what Roland had in mind to begin with, in which
> case I still have no terminology to suggest. But for the rest
> of value-semantic types, a crucial cost implicit in choosing
> value semantics is that types return their value (perhaps via
> a reference to self) rather than a success/failure indicator,
> and this forces one of the choices above. Hopefully by now
> the larger scope implied by the term "value-semantic type" is
> clear (if I've even got it right, which I said was uncertain
> when introducing the term in this thread), though I've never
> said it was an exact match for Roland's specification.

OK. I'm willing to admit that there may be cases...

Valentin Samko

Oct 29, 2005, 4:13:29 PM

kanze wrote:

>> 3) Generally practical, and familiar through various STL
>> behaviours, though programmers may forget to put catches in
>> the appropriate places.
>
> Where in the STL does anything throw an exception because of a
> violation of pre-conditions? About the only exception you'll
> ever get from the STL is bad_alloc.

Many std::basic_string members (erase, insert, constructor, at, ...) throw
exceptions if pre-conditions are violated. Many bitset members also
throw. Also there's
vector::at.
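A sketch of the checked members mentioned, using only standard facilities; the wrapper functions are hypothetical helpers for illustration. vector::at and string::substr report a bad position with std::out_of_range, and constructing a std::bitset from a string containing characters other than '0' and '1' throws std::invalid_argument.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helpers: each returns true if the standard member threw.

// vector::at checks its index; operator[] with a bad index is undefined.
inline bool at_throws(const std::vector<int>& v, std::size_t i) {
    try {
        (void)v.at(i);
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}

// basic_string::substr throws when pos > size() (pos == size() is fine).
inline bool substr_throws(const std::string& s, std::size_t pos) {
    try {
        (void)s.substr(pos);
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}

// Constructing a bitset from a string rejects characters other than 0/1.
inline bool bitset_parse_throws(const std::string& bits) {
    try {
        const std::bitset<8> b(bits);
        (void)b;
        return false;
    } catch (const std::invalid_argument&) {
        return true;
    }
}
```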

Valentin Samko - http://www.valentinsamko.com

Tony Delroy

Nov 3, 2005, 7:58:33 AM

James wrote:

> (Note that all Roland asked for was a *default* constructor, copy and
> assignment to be non-throwing. Which would be trivial for your Date
> example, since the only possible errors are those due to bad parameters.)

Fair point... had forgotten that aspect during the conversation ;-).

I won't launch a DbC vs DP discussion in this thread... perhaps another
time.

Cheers,

Tony
