Searching for an efficient ConstMap

Corbin Simpson

Jun 10, 2016, 8:53:41 PM
to e-lang
Hi everybody! I'm Corbin. I haven't posted here before. I'm working on Monte.

In Monte, we've explored efficient representations for our collections. The ConstMap has been a nemesis in this effort, because its list of requirements is very onerous:

* Persistent/immutable/COW/structurally-sharing. Whatever you want to call it. Y'all know what I'm talking about.
* Iterable with a consistent and predictable iteration order based on insertion order. This is the hardest one, because it implies...
* It's fine to require hashable (settled) keys, but not sorted keys. And you can't rely on hashing order.
* Quick membership test/lookup. Amortized O(1) preferred, but I'll settle for O(log n).

I'm currently waffling between two options. The first option is cutting-edge HAMTs. The problem with HAMTs is that the hashing has to be consistent between vats if you want replay to work across vats. So we'd have to specify our hashing algorithm and be tied to it. Not a great option.

We could pair the HAMT with a deque-style doubly-linked list, instead of having consistent hashing. In this scheme, the deque is used to maintain key ordering, for iteration and push/pop, but the HAMT is used for membership and indexing. This complicates our immutability guarantee (because the deque must be mutable to be fast) and also increases our runtime for several operations, since we have to update two different structures. Also not a great option.
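
To make the double bookkeeping concrete, here is a rough Python sketch of the two-structure idea. It is only a sketch: it leans on the third-party pyrsistent library's pmap as a stand-in persistent map (in place of a real HAMT), and it uses a persistent vector of keys for ordering instead of the mutable deque described above, which sidesteps the mutability problem but makes removal O(n).

from pyrsistent import pmap, pvector

class OrderedConstMap(object):
    """Persistent map pairing a lookup structure with a key-order vector."""

    def __init__(self, entries=None, order=None):
        self._entries = pmap() if entries is None else entries
        self._order = pvector() if order is None else order

    def contains(self, key):
        return key in self._entries            # via the persistent map

    def get(self, key):
        return self._entries[key]              # via the persistent map

    def with_(self, key, value):
        # Every insertion has to update both structures.
        if key in self._entries:
            order = self._order                # keep the original position
        else:
            order = self._order.append(key)
        return OrderedConstMap(self._entries.set(key, value), order)

    def without(self, key):
        # Removal touches both structures too; rebuilding the key vector
        # is O(n) in this simplified version.
        return OrderedConstMap(self._entries.remove(key),
                               pvector(k for k in self._order if k != key))

    def keys(self):
        return list(self._order)               # insertion order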

The other option I've been thinking about is ordered dictionaries, as in Python. Monte's built on the RPython toolchain, and we have an RPython-level ordered dictionary for free. It has the insertion-order behavior that we want, and it's quite speedy. Unfortunately, it's not at all persistent, so we have to do lots and lots of copies. It *is* a great FlexMap structure, but not a great ConstMap structure.
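
For comparison, the copy-per-update version is basically the following (plain Python, assuming only that the host dict preserves insertion order, which is true of RPython's ordered dict and of CPython 3.7+). The O(n) copy inside every with/without is exactly the "lots and lots of copies" problem.

class CopyingConstMap(object):
    """Insertion-ordered, 'persistent' only by copying the whole dict."""

    def __init__(self, entries=None):
        self._entries = dict(entries or {})

    def get(self, key):
        return self._entries[key]          # O(1) lookup, ordered iteration

    def with_(self, key, value):
        copy = dict(self._entries)         # O(n) on every single update
        copy[key] = value
        return CopyingConstMap(copy)

    def without(self, key):
        copy = dict(self._entries)         # O(n) again
        del copy[key]
        return CopyingConstMap(copy)

    def keys(self):
        return list(self._entries)         # insertion order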

I've been reading a bunch but haven't found anything good, and the E docs don't indicate that a good solution was ever found. Has there been some research that I haven't yet found?

~ C.

Kevin Reid

Jun 10, 2016, 10:00:10 PM
to e-l...@googlegroups.com
On Jun 10, 2016, at 10:55, Corbin Simpson <mostawe...@gmail.com> wrote:
>
> Hi everybody! I'm Corbin. I haven't posted here before. I'm working on Monte.

Hey! Thanks for writing txWS! </irrelevant>

> In Monte, we've explored efficient representations for our collections. The ConstMap has been a nemesis in this effort, because its list of requirements is very onerous:
>
> * Persistent/immutable/COW/structurally-sharing. Whatever you want to call it. Y'all know what I'm talking about.
> * Iterable with a consistent and predictable iteration order based on insertion order. This is the hardest one, because it implies...
> * It's fine to require hashable (settled) keys, but not sorted keys. And you can't rely on hashing order.
> * Quick membership test/lookup. Amortized O(1) preferred, but I'll settle for O(log n).
>
> I'm currently waffling between two options. The first option is cutting-edge HAMTs. The problem with HAMTs is that the hashing has to be consistent between vats if you want replay to work across vats.

This sounds like you are assuming a too-weak iteration order property.
1. Replay necessitates deterministic behavior,
2. which results from fully-specified operations,
3. which includes that ordering is derived only from observable properties,
4. and non-Data† objects do not have observable hash codes,
so your hashing strategy cannot interfere with replay.

† Data = a more well-founded notion of DeepPassByCopy.

> The other option I've been thinking about is ordered dictionaries, as in Python. Monte's built on the RPython toolchain, and we have an RPython-level ordered dictionary for free. It has the insertion-order behavior that we want, and it's quite speedy. Unfortunately, it's not at all persistent, so we have to do lots and lots of copies. It *is* a great FlexMap structure, but not a great ConstMap structure.
>
> I've been reading a bunch but haven't found anything good, and the E docs don't indicate that a good solution was ever found. Has there been some research that I haven't yet found?

In E-on-CL, I implemented a special case for the common pattern of accumulating a ConstMap or ConstList which seemed to work fairly well without having the trickiness of persistent data structures. (Apologies if my E syntax is wrong; it's been a while.)

The pattern is that if x is a ConstMap then the method x.with(k, v) does not construct a ConstMap. Instead, it constructs a ref -- you could call it a thunk ref or a lazy ref -- which internally stores [x, k, v]. The same thing applies if you call with/2 again on its result. So you get a data structure shaped like a linked list: [[x, k1, v1], k2, v2].

The behavior of this thunk, then, is that if you do anything whatsoever on it _except_ calling with/2 again, it turns itself into a normal ref to a reified ConstMap, in a monolithic operation on the original map & _all_ added entries.

This ensures that with-accumulation is O(n) rather than O(n^2), but does nothing for any other pattern of use.

Here's links into the code; the relevant classes are with-node and lazy-ref.
https://github.com/kpreid/e-on-cl/blob/f93d188051c66db0ad4ff150bd73b838f7bc25ed/lisp/tables2.lisp#L590
https://github.com/kpreid/e-on-cl/blob/f93d188051c66db0ad4ff150bd73b838f7bc25ed/lisp/lazy.lisp
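
In case it helps, here is a rough Python rendering of that pattern (the names are mine, not taken from the Lisp code; in E-on-CL the forcing happens at the ref level, whereas here get()/snapshot() force explicitly):

class ConstMap(object):
    def __init__(self, entries):
        self._entries = dict(entries)      # stands in for the reified ConstMap

    def with_(self, key, value):
        return WithThunk(self, key, value) # O(1): no copying yet

    def get(self, key):
        return self._entries[key]

    def snapshot(self):
        return self

class WithThunk(object):
    def __init__(self, parent, key, value):
        # parent is a ConstMap or another WithThunk, so a chain of with_()
        # calls is shaped like a linked list: [[x, k1, v1], k2, v2], ...
        self._parent = parent
        self._pending = (key, value)
        self._forced = None                # becomes the reified map later

    def with_(self, key, value):
        return WithThunk(self, key, value) # still O(1)

    def _force(self):
        # Anything except another with_() collapses the whole chain, once,
        # into a real map: O(n) total for n accumulated entries.
        if self._forced is None:
            pending = []
            node = self
            while isinstance(node, WithThunk):
                pending.append(node._pending)
                node = node._parent
            entries = dict(node._entries)
            for key, value in reversed(pending):
                entries[key] = value
            self._forced = ConstMap(entries)
        return self._forced

    def get(self, key):
        return self._force().get(key)

    def snapshot(self):
        return self._force()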

Corbin Simpson

Jun 11, 2016, 12:41:03 AM
to e-lang
On Friday, June 10, 2016 at 7:00:10 PM UTC-7, Kevin Reid wrote:
> This sounds like you are assuming a too-weak iteration order property.
> 1. Replay necessitates deterministic behavior,
> 2. which results from fully-specified operations,
> 3. which includes that ordering is derived only from observable properties,
> 4. and non-Data† objects do not have observable hash codes,
> so your hashing strategy cannot interfere with replay.
>
> † Data = a more well-founded notion of DeepPassByCopy.

Yeah, this is about what I reckoned as well. I know that we don't do *directly* observable hash codes, but if there were a published standard Monte hash, then people could just assume that "correct" vats will hash things in the standard way.

But it seems like implementations have had to change their hash functions plenty of times, and I don't want to commit to a hash that'll surely have to change down the road. So yeah, this route seems like it just can't work.
 
> In E-on-CL, I implemented a special case for the common pattern of accumulating a ConstMap or ConstList which seemed to work fairly well without having the trickiness of persistent data structures. (Apologies if my E syntax is wrong; it's been a while.)
>
> The pattern is that if x is a ConstMap then the method x.with(k, v) does not construct a ConstMap. Instead, it constructs a ref -- you could call it a thunk ref or a lazy ref -- which internally stores [x, k, v]. The same thing applies if you call with/2 again on its result. So you get a data structure shaped like a linked list: [[x, k1, v1], k2, v2].
>
> The behavior of this thunk, then, is that if you do anything whatsoever on it _except_ calling with/2 again, it turns itself into a normal ref to a reified ConstMap, in a monolithic operation on the original map & _all_ added entries.
>
> This ensures that with-accumulation is O(n) rather than O(n^2), but does nothing for any other pattern of use.
>
> Here's links into the code; the relevant classes are with-node and lazy-ref.
> https://github.com/kpreid/e-on-cl/blob/f93d188051c66db0ad4ff150bd73b838f7bc25ed/lisp/tables2.lisp#L590
> https://github.com/kpreid/e-on-cl/blob/f93d188051c66db0ad4ff150bd73b838f7bc25ed/lisp/lazy.lisp

Oh! I've seen this before! It's a kind of difference list. There's a common pattern in Haskell of using them as an append-efficient Monoid. Like you say, the applications are all O(1) and there are O(n) of them. The only catch is understanding that you have a buildup of thunks and you need to manage their evaluation carefully for good performance. Or something like that. I only skimmed the paper.
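
(For anyone who hasn't met difference lists: transliterated into Python, the trick is just to represent a sequence as a function that dumps its elements into an accumulator, so appending two of them is an O(1) function composition and all the real work is deferred until you materialize, which is exactly the thunk buildup above. The names here are made up for illustration, and this is obviously not Monte code.)

def empty():
    return lambda acc: acc

def singleton(x):
    def run(acc):
        acc.append(x)
        return acc
    return run

def append(d1, d2):
    # O(1): just compose; d1's elements come out before d2's.
    return lambda acc: d2(d1(acc))

def to_list(d):
    # All the deferred appends are paid here, once, in O(n) total.
    return d([])

xs = empty()
for i in range(5):
    xs = append(xs, singleton(i))
assert to_list(xs) == [0, 1, 2, 3, 4]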

Kevin Reid

Jun 11, 2016, 4:13:17 AM
to e-l...@googlegroups.com
On Jun 10, 2016, at 20:45, Corbin Simpson <mostawe...@gmail.com> wrote:
> On Friday, June 10, 2016 at 7:00:10 PM UTC-7, Kevin Reid wrote:
>> This sounds like you are assuming a too-weak iteration order property.
>> 1. Replay necessitates deterministic behavior,
>> 2. which results from fully-specified operations,
>> 3. which includes that ordering is derived only from observable properties,
>> 4. and non-Data† objects do not have observable hash codes,
>> so your hashing strategy cannot interfere with replay.
>>
>> † Data = a more well-founded notion of DeepPassByCopy.
>>
> Yeah, this is about what I reckoned as well. I know that we don't do *directly* observable hash codes, but if there were a published standard Monte hash,

This is impossible, unless you're being weaker than E on some dimension. Consider two E objects:
def prog := e`def _ {}`
def a := eval(prog)
def b := eval(prog)
These two objects were constructed exactly the same way. Therefore, they must be observably identical. If one of them is consistently sorted before the other in a table, _that would be an observable difference_ and is prohibited.

>> Here's links into the code; the relevant classes are with-node and lazy-ref.
>> https://github.com/kpreid/e-on-cl/blob/f93d188051c66db0ad4ff150bd73b838f7bc25ed/lisp/tables2.lisp#L590
>> https://github.com/kpreid/e-on-cl/blob/f93d188051c66db0ad4ff150bd73b838f7bc25ed/lisp/lazy.lisp
>
> Oh! I've seen this before! It's a kind of difference list. There's a common pattern in Haskell of using them as an append-efficient Monoid. Like you say, the applications are all O(1) and there's O(n) of them. The only catch is understanding that you have a buildup of thunks and you need to manage their evaluation carefully for good performance. Or something like that. I only skimmed the paper.

Yes, you have to manage them somehow *or* just declare that these are your expected performance characteristics.

A simple example of something that does poorly with these thunks but not otherwise would be:

var x := [k => 1]
while (true) {
    # pretend this is an async loop or in some way actually useful
    x := x.with(k, 2)
    x := x.with(k, 1)
}

Without further special cases (and it could be complicated to defeat them), this will use unbounded memory rather than constant memory.

I just thought of a simple way to manage them that might be sufficient: Suppose that in each thunk x.with(k, v), there is also a number "limit", equal to (x.size() if x is a map or x.<limit> if x is a thunk) - 1. If this number would be negative, don't create a thunk and collapse instead. Thus, any given ConstMap-plus-thunks has at most twice as many entries as it “should”, and in the long run this is similar to the strategy of doubling a mutable array's size when it fills up.
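
A Python sketch of that rule, to pin it down (the names and the shape of the code are mine; only the limit arithmetic comes from the description above):

class ConstMap(object):
    def __init__(self, entries):
        self._entries = dict(entries)

    def size(self):
        return len(self._entries)

    def get(self, key):
        return self._entries[key]

    def with_(self, key, value):
        limit = self.size() - 1
        if limit < 0:
            # Empty map: no budget to amortize against, just build directly.
            return ConstMap({key: value})
        return WithThunk(self, key, value, limit)

class WithThunk(object):
    def __init__(self, parent, key, value, limit):
        self._parent = parent              # ConstMap or another WithThunk
        self._pending = (key, value)
        self._limit = limit                # remaining thunk budget

    def with_(self, key, value):
        limit = self._limit - 1
        if limit < 0:
            # Budget exhausted: collapse the whole chain (plus this entry)
            # into a real map instead of stacking another thunk.
            entries = dict(self._collapse()._entries)
            entries[key] = value
            return ConstMap(entries)
        return WithThunk(self, key, value, limit)

    def _collapse(self):
        # O(total entries), but the limit rule means this happens at most
        # once per ~n calls to with_(), so the amortized cost stays linear,
        # much like doubling a growable array when it fills up.
        pending = []
        node = self
        while isinstance(node, WithThunk):
            pending.append(node._pending)
            node = node._parent
        entries = dict(node._entries)
        for key, value in reversed(pending):
            entries[key] = value
        return ConstMap(entries)

    def get(self, key):
        return self._collapse().get(key)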