Deep deref

19 views
Skip to first unread message

André Thieme

unread,
Nov 14, 2009, 11:42:57 AM11/14/09
to Clojure
I watched http://www.infoq.com/presentations/Are-We-There-Yet-Rich-Hickey
and it is a very nice video. Rich mentioned around minute 9 to 10:05 a
big problem:
when some code gets handed data, can it be sure this thing is
immutable?

Clojure wants to help, because it encapsulates state in refs, which we
can deref.
When we deref something it feels like an immutable copy which will
never be
touched.

But in real programs things are not so easy. We have refs in refs.
To illustrate this I want to make up a simple example. Let‘s say we
have a web
application where people can register, and become friends with each
other.

To represent a person we want to have a (defstruct
person :name :age :friends)
(this could also be a deftype of course).
We want to store all persons on our website in a map. Their name will
be the
key. As more and more people register we need to update our map, so it
must
be a ref (or atom):
(def *persons* (ref {}))

Each person can also be changed, for example when he/she gets older,
or when
she found new friends.
Initially when someone registers he/she has no friends yet. To add a
user we
want something like:
(defn make-person [name age]
(let [p (ref (struct person name age []))]
(dosync
(alter *persons* assoc name p))
p))

Now three people register, and we have:
(make-person "Karl" 20)
(make-person "Jeff" 22)
(make-person "Tina" 19)

Dereferencing *persons* will result in:
{"Tina" #<Ref@7ae6d: {:name "Tina", :age 19, :friends []}>,
"Jeff" #<Ref@125d92c: {:name "Jeff", :age 22, :friends []}>,
"Karl" #<Ref@14fa0ef: {:name "Karl", :age 20, :friends []}>}
Great so far.

People can become friends, so we need
(defn add-friend [#^String person #^String friend]
(dosync
(let [p (get @*persons* person)
f (get @*persons* friend)]
(alter p update-in [:friends] conj f))))

So, Tina can get the friends Karl and Jeff:
(add-friend "Tina" "Karl")
(add-friend "Tina" "Jeff")

Now the problem is: *persons* acts as our data base, and
when a request comes in (to our web application, hosted by,
say, Compojure) we want to see a consistent snapshot of the
DB.
But derefing *persons* won't do this for us.
Even when we deref Tina, we can not be sure who her friends
are, as those live in their own refs.
They have to. If Karl gets one year older then we only want to
change his date, and not traverse all persons first and see if
they have Karl as a friend and if yes updating his age there too.

How can we handle this situation?
Is it possible to implement a function “deep-deref” which works
as blindingly fast as deref does?
I find this very important, and this is of great practical relevance
for me. Please share your ideas.

John Harrop

unread,
Nov 14, 2009, 2:11:13 PM11/14/09
to clo...@googlegroups.com
I think the design in this sort of case needs changing.

Repeated structure references also pose a problem for storage and retrieval (at least if not using a real DB) as Clojure's prn and read will turn shared substructures into separate copies.

What's needed here is to make all person lookups go through one point, the *persons* table, so the :friends key would store a vector not of person struct references but simply of names.

When you start using a real DB this won't change; only the name of the concept will, to a "key column". Name will be a key column in the *persons* table and will be how you get at particular rows (persons) in the table.

John Harrop

unread,
Nov 14, 2009, 2:22:23 PM11/14/09
to clo...@googlegroups.com
Actually, often you have to go further. For example, in the example above, to accommodate name changes, the name field can't be the key field or you're back to having to update every reference. There's also the possibility of duplicate names for two different people; you don't want it to blow up (or worse, mix two peoples' private data and make it all visible to both) as soon as your Facebook clone's second John Smith signs up.

In situations like these, a computer-generated id is typically used as an immutable key field:

table person

id            name             age        friends
000           Karl             20         [002]
001           Jeff             22         [002]
002           Tina             19         [000 001]

(I took the liberty of assuming you actually want friends to be a symmetric relation, so add-friend will friend each person to the other. If not, it's still a viable example of using a numeric GUID (globally-unique identifier). Of course the UI can still display names rather than numbers in most situations (the URLs generated by your web framework will need URLs -- if you had http://myfacebookclone.com/profile/john_smith as a URL, which John Smith? Better http://myfacebookclone.com/profile/1653782 even if it's less human-understandable. Manually typing in URLs is rare user behavior anyway, if your page linkage structure is decent).

Danny Woods

unread,
Nov 14, 2009, 2:32:46 PM11/14/09
to Clojure
André Thieme wrote:
> How can we handle this situation?
> Is it possible to implement a function “deep-deref” which works
> as blindingly fast as deref does?
> I find this very important, and this is of great practical relevance
> for me. Please share your ideas

Hi André,

I had a similar issue with an application I wrote a while back. The
initial structure was simple: a grid-based map, containing a number of
entities, each entity having a name, an owner, dimensions, etc. The
map was behind a ref, and the entities were initially refs themselves,
embedded within the map structure. Then I realised that it was
difficult to know with certainty what was in the deref'd map.

It kind of struck me then that I was thinking about the problem
incorrectly. My object-oriented hat made me want to have the map and
entities as separate things, each in charge of its own state. When I
re-wrote the code so that the entities were themselves an immutable
part of an immutable map, and that changes to entities resulted in an
*entirely new map*, those problems went away. All the various
interactions with the UI and with the network updated the map and
entities transactionally, with a new, consistent map after each
alteration. What a relief that was!

In your case, adding someone to a friend list should result in what is
conceptually a completely new database, that shares almost all of its
structure with its predecessor. This top-level structure can sit
behind a ref, with immutable person structs as constituent parts.
This makes proper use of immutability and negates the need for a deep
deref.

That's just my experience. I'd be interested to know if anyone has an
honest need for refs within refs (especially in a larger system).

Cheers,
Danny.

André Thieme

unread,
Nov 14, 2009, 4:20:03 PM11/14/09
to Clojure
On 14 Nov., 20:22, John Harrop <jharrop...@gmail.com> wrote:
Well, I want to have a real in-ram db. Just not a relational one.
My in-ram data structures can express very well what I want to say.
I wrote a serialize and restore function that can print refs, even
circular
referential refs containing each other.


> > What's needed here is to make all person lookups go through one point, the
> > *persons* table, so the :friends key would store a vector not of person
> > struct references but simply of names.

Ok, so you would vote for having my own my-ref and my-deref functions,
which
references/dereferences my objects, using an ID under the hood.


> > When you start using a real DB this won't change; only the name of the
> > concept will, to a "key column". Name will be a key column in the *persons*
> > table and will be how you get at particular rows (persons) in the table.

I see your point, I just want to mention that we are maybe so trained
to use
relational db system that we see only them as a “real db” :-)


> Actually, often you have to go further. For example, in the example above,
> to accommodate name changes, the name field can't be the key field or you're
> back to having to update every reference. There's also the possibility of
> duplicate names for two different people; you don't want it to blow up (or
> worse, mix two peoples' private data and make it all visible to both) as
> soon as your Facebook clone's second John Smith signs up.
>
> In situations like these, a computer-generated id is typically used as an
> immutable key field:
>
> table person
>
> id            name             age        friends
> 000           Karl             20         [002]
> 001           Jeff             22         [002]
> 002           Tina             19         [000 001]

Okay, that makes sense. Basically the idea is having my own
ref and deref function, and then have a ref on a hashmap of a global
object container. Good, that would work and not change my programming
model, as the current one also requires the use of (deref ..) at
several
places.


> (I took the liberty of assuming you actually want friends to be a symmetric
> relation, so add-friend will friend each person to the other.

Yeah, that‘s fine. My example is just made up to illustrate the point.

Thanks for your idea, this seems to be the right way to do it.
So, now I need to look into how I can get @ to be extended to deref
(= run a function I specify) instances of the
(deftype my-ref ...) type.

André Thieme

unread,
Nov 14, 2009, 5:18:41 PM11/14/09
to Clojure
Danny, could you maybe hack up a possible very simple example of what
you mean?
I am not sure how complete my understanding of what you said is.


> That's just my experience.  I'd be interested to know if anyone has an
> honest need for refs within refs (especially in a larger system).

The idea behind my posting is a in-ram db system.
I don‘t want to use a relational db.
Now I have many objects, be it defstructs or deftypes. Those things
need
to be changable. And other objects must be able to reference them.
In the example I made up Tina has the friend Karl.
I can not simply store a copy of the Karl person in Tinas :friends
slot,
because when Karl gets one year older, then I don‘t want to go through
all persons, looking for Karls, and have their age updated.
Instead it is preferred to have
(defn update-age [#^String person new-age]
(dosync
(alter (get @*persons* person) assoc :age new-age)))

When we then (update-age "Karl" 25), then the Karl in Tinas :friends
slot will also be 25.

And each object needs to live in at least one index. In my example
this index is called *persons* and maps from the name to a person
instance.
There can be other indexes, for example *age-to-persons*, and when we
look up there who is 42 years old, then we will get a set of person
instances who are older.
These indexes need to be refs also, because we constantly want to put
new objects in there or delete them. So, refs in refs are really
interesting when you want to have your own in-ram db system.

Danny Woods

unread,
Nov 15, 2009, 4:05:09 AM11/15/09
to Clojure
> Danny, could you maybe hack up a possible very simple example of what
> you mean?

<snip>

> I can not simply store a copy of the Karl person in Tinas :friends
> slot,
> because when Karl gets one year older, then I don‘t want to go through
> all persons, looking for Karls, and have their age updated.

Hi André,

Your shared instance constraint does make things a little more
interesting than in my application, but I think John's suggestion
about not storing friend instances directly is a good thing. If you
were to instead assign an id to each created person, and have each
friend list be a list of ids, the entity referred to by the id can
still be immutable for a given dereference. Not sure how much sense
that makes, so here's something crude that I've hacked up:

(def *everyone* (ref {}))
(def *global-id* (ref 0))

(defn next-id [] (dosync (commute *global-id* inc)))

(defstruct person :name :friend-ids :id)

(defn make-person [name] (struct person name [] (next-id)))

(defn make-and-add-person [name]
(let [person (make-person name)]
(dosync (alter *everyone* assoc (:id person) person))))

(defn befriend [person friend]
(let [new-friend-list (conj (:friend-ids person) (:id friend))
new-person (assoc person :friend-ids new-friend-list)]
(dosync
(alter *everyone* assoc (:id person) new-person))))

The 'down side' here is simply that you refer to people by ids, rather
than directly, but the up side is that a given deref of *everyone* is
immutable and consistent.

Cheers,
Danny.

AndrewC.

unread,
Nov 17, 2009, 6:04:23 PM11/17/09
to Clojure
> In the example I made up Tina has the friend Karl.
> I can not simply store a copy of the Karl person in Tinas :friends
> slot,
> because when Karl gets one year older, then I don‘t want to go through
> all persons, looking for Karls, and have their age updated.
> Instead it is preferred to have
> (defn update-age [#^String person new-age]
>   (dosync
>     (alter (get @*persons* person) assoc :age new-age)))
>
> When we then (update-age "Karl" 25), then the Karl in Tinas :friends
> slot will also be 25.

I'm just watching Rich's Value/Identity/State lecture and it occurs to
me that the problem here stems from the fact that you do not have a
clear idea of what a person is.

Karl is still Karl, even when he gets a year older. Tina is still Tina
whether she is friends with Karl or not. Yet with your representation
this is not true. Karl at age 41 is not equal to Karl at age 42,
because you have his age has an intrinsic part of his identity.

I would suggest modelling a person as a set of properties that are
immutable (at least in the model), e.g. name and date of birth.

As others have suggested, I would suggest modelling the mutable
relationships between people and people (friendship) or between people
and other things (e.g. favourite food) as maps between immutable
people objects and other immutable objects. Not as mutable fields in
the person instances.

http://www.infoq.com/presentations/Value-Identity-State-Rich-Hickey
Message has been deleted

Sergey Didenko

unread,
Nov 23, 2009, 4:34:00 PM11/23/09
to clo...@googlegroups.com
Hi,

Andre, Danny's first approach is about "syncing" only on the root
object, so that every piece of data is behind one deref:

(def root (ref {:persons [ ... no other refs here... ]))

This approach is simpler to code but can lead to a lot of retried
transactions under heavy concurrent load, as I understand.

What about making all derefs that you need inside one transaction?
There are words somewhere in docs that you need dosync if you perform
a few reads that must be consistent. I think this is what you need.

Sergey Didenko

unread,
Nov 23, 2009, 4:46:17 PM11/23/09
to clo...@googlegroups.com
BTW I'm also coding the simple persistence for Clojure data
structures. Though I took Prevayler approach (http://prevayler.org),
so I journal function calls that change my root object.

This approach is better than simple snapshotting when your data grows
big, so you can't do the snapshots very often but do not want to lose
a lot of the latest changes.

Making snapshots are easy to implement after this, if data objects
reference each other only one time or through ids. I guess it would be
a simple traversal of the root object.

I plan to make this persistence code public if anyone is interested.
Though I'm not moving very fast, I'm learning Clojure at the same
time.

Wojciech Kaczmarek

unread,
Nov 23, 2009, 5:52:49 PM11/23/09
to clo...@googlegroups.com
Yeah it would be cool to look at it.

John Harrop

unread,
Nov 23, 2009, 5:56:10 PM11/23/09
to clo...@googlegroups.com
I'm starting to think that for some tasks Clojure could use a concept of "row locking" with maps.  It would mean having a map-of-refs type that was integrated with the STM, so multiple updates whose keys didn't collide could occur concurrently.

It *might* be possible to do this using commute and update-in, with a ref wrapping the whole map. The tricky thing is you want the update-ins to commute if the keys are not the same, but not if they are. Perhaps we need a "conditional commute" that takes two extra arguments, a value to test and a binary predicate. Then it could be done with (conditional-commute key = map update-in [key] val-transform-fn).

The idea here being, in this case, that if two of these were done in overlapping transactions, the first arguments would be compared using the second argument. If the result was true one transaction would be retried, if false the operations would commute. (If the second arguments to the two conditional-commutes differed the transaction would be retried.)

I think adding "conditional commute" would require changes to core, however.

Christophe Grand

unread,
Nov 24, 2009, 3:14:10 AM11/24/09
to clo...@googlegroups.com
On Mon, Nov 23, 2009 at 11:56 PM, John Harrop <jharr...@gmail.com> wrote:
I'm starting to think that for some tasks Clojure could use a concept of "row locking" with maps.  It would mean having a map-of-refs type that was integrated with the STM, so multiple updates whose keys didn't collide could occur concurrently.

It *might* be possible to do this using commute and update-in, with a ref wrapping the whole map. The tricky thing is you want the update-ins to commute if the keys are not the same, but not if they are. Perhaps we need a "conditional commute" that takes two extra arguments, a value to test and a binary predicate. Then it could be done with (conditional-commute key = map update-in [key] val-transform-fn).

The idea here being, in this case, that if two of these were done in overlapping transactions, the first arguments would be compared using the second argument. If the result was true one transaction would be retried, if false the operations would commute. (If the second arguments to the two conditional-commutes differed the transaction would be retried.)


I had a similar idea but more general: being able to specify invariants inside a transaction. Commit will procede only if the invariant still holds. Your proposed conditional-commute could be rewritten:

;; (conditional-commute key = map update-in [key] val-transform-fn)
(invariant (@map key))
(commute map update-in [key] val-transform-fn)

See the attached patch for a prototype.
invariants.patch

Krukow

unread,
Nov 24, 2009, 1:50:39 PM11/24/09
to Clojure


On Nov 14, 5:42 pm, André Thieme <splendidl...@googlemail.com> wrote:
> But in real programs things are not so easy. We have refs in refs.

This is just a thought experiment. But what about actually having refs
in refs? I'm not sure if I am reinventing mutable object here, so
please shoot me down ;-)

Clojure 1.1.0-alpha-SNAPSHOT
user=> (deftype Person [name age friends]
[clojure.lang.IPersistentMap])
#'user/Person
user=> (def *persons* (ref {}))
#'user/*persons*
user=> (dosync (alter *persons* assoc :ethel (ref (Person "ethel" 24
[]))))
{:ethel #<Ref@585976c2: #:Person{:name "ethel", :age 24, :friends []}
>}
user=> (dosync (alter *persons* assoc :fred (ref (Person "fred" 42
[]))))
{:fred #<Ref@2c78bc3b: #:Person{:name "fred", :age 42, :friends []}
>, :ethel #<Ref@585976c2: #:Person{:name "ethel", :age 24, :friends []}
>}
user=> @*persons*
{:fred #<Ref@2c78bc3b: #:Person{:name "fred", :age 42, :friends []}
>, :ethel #<Ref@585976c2: #:Person{:name "ethel", :age 24, :friends []}
>}

;;now...

user=> (dosync (alter (:fred @*persons*)
(fn [f] (assoc f :friends (conj (:friends f)
(:ethel @*persons*))))))
#:Person{:name "fred", :age 42, :friends [#<Ref@585976c2: #:Person
{:name "ethel", :age 24, :friends []}>]}
user=> @*persons*
{:fred #<Ref@2c78bc3b: #:Person{:name "fred", :age 42, :friends
[#<Ref@585976c2: #:Person{:name "ethel", :age 24, :friends []}>]}
>, :ethel #<Ref@585976c2: #:Person{:name "ethel", :age 24, :friends []}
>}
user=> (identical? (first (:friends @(:fred @*persons*))) (:ethel
@*persons*))
true

Note that Fred and Ethel are identities, and their corresponding refs
still refer to *immutable* values (the refs themselves).

Now to inc age:

user=> (dosync (alter (:ethel @*persons*) update-in [:age] inc))
#:Person{:name "ethel", :age 25, :friends []}
user=> @*persons*
{:fred #<Ref@da0225b: #:Person{:name "fred", :age 42, :friends
[#<Ref@7ae35bb7: #:Person{:name "ethel", :age 25, :friends []}>]}
>, :ethel #<Ref@7ae35bb7: #:Person{:name "ethel", :age 25, :friends []}
>}

In a way, I like the modeling: the people are actually identities and
there are relations between identities. STM ensures consistent
snapshots.

So am I reinventing objects in a bad way, or is this just recognizing
the true identities?

:-)

/K

Krukow

unread,
Nov 24, 2009, 2:07:53 PM11/24/09
to Clojure


On Nov 24, 7:50 pm, Krukow <karl.kru...@gmail.com> wrote:
> On Nov 14, 5:42 pm, André Thieme <splendidl...@googlemail.com> wrote:
>
> > But in real programs things are not so easy. We have refs in refs.
>
> This is just a thought experiment. But what about actually having refs
> in refs? I'm not sure if I am reinventing mutable object here, so
> please shoot me down ;-)
>
Please ignore :-( Just realized I didn't actually understand the
original post!

Rich Hickey

unread,
Nov 29, 2009, 9:05:27 AM11/29/09
to clo...@googlegroups.com
I had forgotten about this, could you please make an issue for it? I'd
like to look into it for a future feature.

Thanks,

Rich

Christophe Grand

unread,
Dec 1, 2009, 6:32:07 AM12/1/09
to clo...@googlegroups.com
Glad to hear that!
https://www.assembla.com/spaces/clojure/tickets/213-Invariants-and-the-STM


--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clo...@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to
clojure+u...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en



--
Professional: http://cgrand.net/ (fr)
On Clojure: http://clj-me.cgrand.net/ (en)
Reply all
Reply to author
Forward
0 new messages