N00b alert! - A question on immutable/mutable data structures...

67 views
Skip to first unread message

Trevor

unread,
Sep 13, 2011, 7:00:42 PM9/13/11
to Clojure
For the most part, I *believe* I understand why immutable data
structures with transactions are important to manage concurrent
operations to shared data, but I often wonder why it matters in some
cases...

For example, what if I have a hash-map that needs to handle concurrent
changes to the data structure, but never needs to have concurrent
changes to a given piece of data (i.e a key/value pair). Wouldn't
there be value in being able to modify the data "in-place" without
making a copy, or needing to endure the overhead associated with STM?

And if what I am suggesting is reasonable how can I create a mutable
hash-map in Clojure and still use (mostly) the same access functions.

Notes:
* Yes, I have watched Rich's video on identity and state, but it
hasn't helped my understand the above scenario.
* I am not suggesting a hash-map data structure should support mixed
operations.

Thanks,
Trevor



Andy Fingerhut

unread,
Sep 13, 2011, 7:23:21 PM9/13/11
to clo...@googlegroups.com
To my knowledge, there are no built-in data structures that are mutable and that use the same access functions.

There are built-in data structures called transients that are mutable, and use almost the same hash functions.  Read the docs on transient, persistent!, conj!, etc.  Transient data structures are restricted to be modified from a single thread, to avoid concurrency issues.  Any attempt to modify a transient data structure from multiple threads is detected as an error at run time.

Note that even the immutable data structures do not make a copy of the entire data structure.  If the data structure is large, most of the data is shared between the old and new data structure.  See the tree example on this Wikipedia page for a small example, but I know I've seen elsewhere pictures of examples closer to Clojure's persistent data structure implementations:

http://en.wikipedia.org/wiki/Persistent_data_structure

Your scenario: "What if I have a hash-map that needs to handle concurrent changes to the data structure, but never needs to have concurrent changes to a given piece of data (i.e. a key/value pair)."

My question: How do you know, in advance, that it doesn't need to handle such concurrent changes?

If it is because you have multiple threads that will always modify disjoint subsets of the keys, then you could have a separate hash-map for each of those sets of keys (perhaps a transient), and modify each one from its own thread.  Readers of the hash map would then have to search for keys in multiple tables, or if it was easy to calculate in advance which table the key must be in, then only one table lookup is sufficient.

Andy


--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clo...@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to
clojure+u...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en

Stuart Halloway

unread,
Sep 13, 2011, 8:05:20 PM9/13/11
to clo...@googlegroups.com
For example, what if I have a hash-map that needs to handle concurrent
changes to the data structure, but never needs to have concurrent
changes to a given piece of data (i.e a key/value pair). Wouldn't
there be value in being able to modify the data "in-place" without
making a copy, or needing to endure the overhead associated with STM?

Hi Trevor, 

You should endure the overhead of the STM if you need what the STM provides: coordination of activity across identities. But there are several things in Clojure that provide identities without STM.  In your example, you could place a map inside an atom, which provides atomic transitions between values, but no coordination.

Abandoning values entirely should be highly motivated by a specific need. If you're there, then Java's concurrent maps are good and appropriate.

Stuart Halloway
Clojure/core
http://clojure.com

Trevor

unread,
Sep 13, 2011, 9:10:39 PM9/13/11
to Clojure
Thanks for the quick responses.

I'll try to answer Andy's question: "How do you know, in advance, that
it doesn't need to handle such concurrent changes?" ... and at the
same time I will try to provide this example to Stuart, hoping I can
see how using a map inside an atom might work:

Let's say my users log into a web page and each user has a queue that
does stuff for them. Since the users id is unique and each user can
only be logged in from one session, when I use the user id as a key
within a hash-map then I know *well-enough* there will not be any
concurrent changes to that key value pair, particularly since enqueing
means each change is actually an just addition to the stack. -- So I
am picking queue's to make a point... -- Queue's are chosen over lists
as they are both constant time and fast. Being atomic is great, but
wouldn't making a copy of a queue and re-assembling it defeat the
purpose of using it?

So let's try this:

> (def user-queues* (atom (hash-map)))
#'project/user-queues*

> (swap! user-queues* assoc "user1" clojure.lang.PersistentQueue/EMPTY)
{"user1" #<PersistentQueue clojure.lang.PersistentQueue@0>}

> (@user-queues* "user1")
#<PersistentQueue clojure.lang.PersistentQueue@0>

I would like to add an item to the users queue, but it seems when
using an atom I can only swap in and out the value as opposed to
modifying the value in-place.

So let's start with just the basic atom'd queue:

> (def q (atom clojure.lang.PersistentQueue/EMPTY))
#'project/q

> (swap! q conj (seconds))
#<PersistentQueue clojure.lang.PersistentQueue@d3232253>

> (apply list (swap! q conj (seconds)))
(1315961557 1315961570)

awesome.

Now I want to store each users queue in the user-queues* hash-map. How
would I do that while maintaining a real queue? My initial attempts
always lead to reading the full queue into a list, then to conj and
item on that list, then I have to remake a queue to then be stored
back into the map via swap....so it's at that point I might as well
not be using a queue - right?

Certainly hash-maps with queue's would be a reasonable idea - right?.

Thanks.














Stuart Campbell

unread,
Sep 14, 2011, 2:17:43 AM9/14/11
to clo...@googlegroups.com
Hi Trevor,

I hope I've understood your problem correctly.

You can modify nested structures using e.g. update-in:

  (let [k "user1" v 1234]
    (swap! user-queues update-in k conj v))

That's assuming that a user queue already exists in the map. If it doesn't, you could do something like:

  (let [k "user1" v 1234]
    (swap! user-queues #(assoc % k (conj (get % k clojure.lang.PersistentQueue/EMPTY) v))))

Regards,
Stuart

Meikel Brandmeyer (kotarak)

unread,
Sep 14, 2011, 2:22:26 AM9/14/11
to clo...@googlegroups.com
Or:

(swap! user-queues update-in [k] (fnil conj clojure.lang.PersistentQueue/EMPTY) v)

Sincerely
Meikel

Stuart Campbell

unread,
Sep 14, 2011, 2:31:19 AM9/14/11
to clo...@googlegroups.com
I knew there must be a nicer way to write that :)


--

Trevor

unread,
Sep 14, 2011, 7:42:42 AM9/14/11
to Clojure
ahh,... I see.
Thank you all - you've been very helpful.
Trevor
Reply all
Reply to author
Forward
0 new messages