Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Immutable data structures - performance && thoughts

152 views
Skip to first unread message

Marinca Gheorghe

unread,
May 12, 2013, 2:31:22 PM5/12/13
to
Hi all,

I recently looked over how a C# library could implement an immutable
stack, and came up with this implementation in C++. I wonder here
mostly about the overhead of std::shared_pointer(and the missing of a
garbage collector that probably could help for this kinds of
structures)

namespace immutable
{
template<typename T>
class stack: public std::enable_shared_from_this<stack<T>>
{
public:

typedef std::shared_ptr<T> headPtr;
typedef std::shared_ptr<stack<T>> StackPtr;

private:
template <typename T> friend struct stackBuilder;

public:

static StackPtr empty()
{
return std::make_shared<stackBuilder<T>>(nullptr, nullptr);
}

static StackPtr Create()
{
return empty();
}

StackPtr push(const T& head)
{
return std::make_shared<stackBuilder<T>>(std::make_shared<T>(head),
shared_from_this());
}

StackPtr pop()
{
return this->tail;
}

headPtr peek()
{
return this->head;
}

bool isEmpty()
{
return (this->head == nullptr);
}

private:
stack(headPtr head, StackPtr tail): head(head), tail(tail)
{
}

stack<T>& operator= (const stack<T>& other);

private:
headPtr head;
StackPtr tail;
};


template <typename T>
struct stackBuilder: public stack<T>
{
stackBuilder(headPtr head, StackPtr tail): stack(head, tail){}
};
}

An usage would be like this:
auto empty = stack<int>::empty();

auto newStack = empty->push(1);
auto stack = newStack;
while(!stack->isEmpty()) //iterate over all elements
{
std::cout << *stack->peek() << "\n";
stack = stack->pop();
}

How would you think on implementing a general purpose stack in C++ ?


--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Wil Evers

unread,
May 13, 2013, 1:10:27 PM5/13/13
to
[snip]
> }
[snip]

> How would you think on implementing a general purpose stack in C++ ?

That's an interesting question. A few remarks:

* Standard C++ already has a stack class template:

template <typename T, typename Container = std::deque<T>>
class stack;

When instantiated, one specifies the element type, and, optionally,
the type of the underlying container it uses. Of course, it provides
the usual stack operations (empty(), push(), pop(), top()). If at all
possible, especially when working with other programmers, you should
prefer the use of a standard class template over rolling your own.

* That said, a std::stack<T> is not immutable by design, while yours
is, which enables the use of a shared representation. Such a design
is a common idiom for simulating proper value semantics in
reference-based languages such as Java, and C++ is not such a
language. To elaborate: a std::stack<T> is optimized for the
efficiency of its core operations (push(), pop(), top()), while yours
allows us to replace the copying of an entire stack with the copying
of a single shared_ptr to it. Is that what you had in mind?

* While your stack objects are immutable, none of its non-static
member functions are labeled as 'const' to signal that they will not
change the state of the stack. C++ programmers routinely use
references (or pointers) to const objects when they do not wish to
modify their state. They'll be surprised to discover that they cannot
even call your isEmpty() or peek() when they do so.

* If you want to be able to cheaply copy entire stacks within the
framework of C++'s usual value semantics, you may want to consider
separating the stack type from the node type it uses for storing its
elements. For example:

template <typename T>
struct stack_node;

template <typename T>
class stack { // mutable

public :
stack()
: first_node()
{ }
bool empty() const
{ return first_node != nullptr; }
const T& top() const
{ return first_node->value; }
void push(const T& value)
{ first_node = std::make_shared<stack_node>(value, *this); }
void pop()
{ first_node = first_node->rest.first_node; }

private :
std::shared_ptr<const stack_node<T>> first_node;
};

template <typename T>
struct stack_node { // immutable, shared between stacks

stack_node(const T& value, const stack<T>& rest)
: value(value), rest(rest)
{ }
const T value;
const stack<T> rest;
};

Please note that stack_node<T> does not use a shared_ptr<T> to store
its T value; instead, it directly embeds it, without using dynamic
allocation.

* Finally, my experience with copying shared_ptr instances suggests
that it does have a significant performance impact, comparable to that
of running a garbage collector. You should probably take that with a
grain of sand though, because I don't have any hard data to back that
up. On the other hand, I do think this is one of the reasons why
blindly using shared_ptr to emulate the behaviour of reference-based
languages is generally frowned upon in the C++ community.

Hope this helps,

- Wil

SG

unread,
May 13, 2013, 2:21:29 PM5/13/13
to
On May 12, 8:31 pm, Marinca Gheorghe wrote:
> I recently looked over how a C# library could implement an immutable
> stack, and came up with this implementation in C++.

I think the appeal to "immutable datastructures" in languages that
tend to force a level of indirection onto you (well, C# also has
"value types" (to use their terminology) that don't do this) is higher
than in languages like C++ where you only get an indirection when you
ask for it and where class-type objects can be held directly just like
int values in varibles. It seems to me that the idea of "immutable
datastructures" is a way to combine sharing with what we call in the C+
+ community "value semantics". If you have a handle to an object but
the object will never change its state then it does not really make a
difference whether you think of the handle as some kind of pointer or
actually a variable storing the object itself because sharing the same
object is free of surprizes in that case. AFAIK C# and Java do this
with their respective string type. You can treat the reference to a
string as the string itself and copy it efficiently (share the
immutable object).

In C++ you have lots of options to go about organizing your custom
data structure. Value semantics is encouraged. But that does not stop
you from implementing certain things using techniques such as COW
(copy-on-write). In your case the use of std::shared_ptr makes things
easy. But there is an issue I see in combination with linked lists:

struct int_stack_node {
int value;
shared_ptr<int_stack_node> below;
};

class int_stack {
public:
bool empty() const {return !top;}
int top() const;
void push(int value);
void pop();
private:
shared_ptr<int_stack_node> top;
};

void int_stack::push(int value) {
auto sp = make_shared<int_stack_node>();
sp->value = value;
sp->below = move(top);
top = move(sp);
}

void int_stack::pop() {
assert(!empty());
top = top->below;
}

Here, when an int_stack object is destructed it could lead to a lot of
recursive invocations of int_stack_node destructors. So, there is the
danger of a stack overflow.

You might want to think about writing your own smart pointer "cow_ptr"
that provides read-only as well as read/write access like this:

cow_ptr<int> p (construct, 23); // tagged forwarding constructor
cow_ptr<int> q = p;
cout << *p << endl; // read-only access, prints 23
cout << *q << endl; // read-only access, prints 23
assert(&*p == &*q); // still sharing
p.write() = 42; // write-access
cout << *p << endl; // read-only access, prints 42
cout << *q << endl; // read-only access, prints 23
assert(&*p != &*q); // not sharing anymore

The beauty of COW is that you can combine sharing with "value
semantics". If there is no sharing going on, there is no problem of
modifying your list nodes' values, for example. COW made some of Sean
Parents code he showed in his "Value Semantics and Concept-Based
Polymorphism" talk really simple. It's easy to reason about due to
value semantics and efficient due to sharing -- at least if you like
to keep a history of different states allowing the user to press an
UNDO button etc. (You'll find it on Youtube.)

> I wonder here
> mostly about the overhead of std::shared_pointer(and the missing of a
> garbage collector that probably could help for this kinds of
> structures)

A garbage collector (instead of shared_ptr) would solve the potential
stack overflow problem here. But it also opens up other issues.
Why the indirection for the actual T values? It does not seem
necessary. Also, peek should not return a pointer with which the user
could violate the immutability. This breaks value semantics.

> How would you think on implementing a general purpose stack in C++ ?

I'd prefer a vector-based stack. And if it turns out that for some
specific application the cost of copying vectors is too big, and move
semantics does not help much, I'd think about writing something like
this:

template<class T>
struct scs_node
{
vector<T> segment_data;
cow_ptr<scs_node> below;
};

template<class T>
class segmented_cow_stack
{
public:
...

...
private:
cow_ptr<scs_node> top;
};

(assuming cow_ptr is a smart pointer described earlier that is move-
enabled to keep the reference counter values small to avoid
unnecessary copying).

There is actually no harm in supporting mutation for "value semantics"-
based types. You don't have to create new segmented_cow_stack<>
objects if you don't want to. But you CAN copy them and then it's like
they don't share their data because mutating one won't have any
observable effect on the other stacks. So, it's not really necessary
to stick to the interface that Eric Lippert suggested in his C#
article.

Cheers!
SG
0 new messages