What array-like data structure to use inside acid-state?

42 views
Skip to first unread message

Maxim Savchenko

unread,
Mar 26, 2013, 4:05:36 PM3/26/13
to ha...@googlegroups.com
Hello!

And another noob question from me. I need to store integer-indexed structure with fast, preferably constant-time r/w access in the acid-state. My algorithm ideally lays on mutable arrays, but as far as I see, acid-state can't contain mutable structures. Should I move to IntMap, or there is some way to make acid-state store log of ST-monad writes?

dag.od...@gmail.com

unread,
Mar 26, 2013, 7:54:05 PM3/26/13
to ha...@googlegroups.com
It should be possible to use runST in acid-state transactions since it's pure.  I'm not sure if it'll be better than a pure data structure like IntMap or Vector though, wouldn't each ST computation need to make a new copy to work on?

As for pure options, this is my understanding of the trade-offs:

* IntMap gives you fast lookup and modification but "length" is expensive
* Map Int has constant time "length" but is slower in other operations than IntMap
* Vector gives constant time lookup and length but linear time modification

So assuming you're not counting the elements a lot but need fast read and write I'd recommend IntMap for you.  Also you should default to using the strict functions; acid-state itself is strict although you can use finite laziness to operate on data if you need to.


On Tue, Mar 26, 2013 at 9:05 PM, Maxim Savchenko <pdu...@gmail.com> wrote:
Hello!

And another noob question from me. I need to store integer-indexed structure with fast, preferably constant-time r/w access in the acid-state. My algorithm ideally lays on mutable arrays, but as far as I see, acid-state can't contain mutable structures. Should I move to IntMap, or there is some way to make acid-state store log of ST-monad writes?

--
You received this message because you are subscribed to the Google Groups "HAppS" group.
To unsubscribe from this group and stop receiving emails from it, send an email to happs+un...@googlegroups.com.
To post to this group, send email to ha...@googlegroups.com.
Visit this group at http://groups.google.com/group/happs?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

Maxim Savchenko

unread,
Mar 27, 2013, 5:04:24 AM3/27/13
to ha...@googlegroups.com, dag.od...@gmail.com
I have no need to use length at all - lower and upper bounds of solid index range are constants in my case. But "fast" lookup and modification of IntMap are still about 15 times slower then STVector for my volumes of data. With small changes this difference do not matters, but sometimes I need heavy operations involving multiple passes over whole dataset. In this cases I have found faster to convert IntMap to MVector, runST over it, and convert back. It works, of course, but looks pretty weird - that's why I'm asking if I can get rid of this weirdness somehow and just use STVector directly.

среда, 27 марта 2013 г., 3:54:05 UTC+4 пользователь Dag Odenhall написал:

David Himmelstrup

unread,
Mar 27, 2013, 5:09:21 AM3/27/13
to ha...@googlegroups.com, dag.od...@gmail.com
The guarantees that AcidState provides are incompatible with mutable arrays. You could use an MVar or an IORef instead, though.

-- 
David Himmelstrup

Maxim Savchenko

unread,
Mar 27, 2013, 5:26:33 AM3/27/13
to ha...@googlegroups.com, dag.od...@gmail.com
Thank you! I'll be using IntMap then.

среда, 27 марта 2013 г., 13:09:21 UTC+4 пользователь Lemmih написал:

dag.od...@gmail.com

unread,
Mar 27, 2013, 2:28:51 PM3/27/13
to Maxim Savchenko, ha...@googlegroups.com
Well in that case I think you should be able to runST inside an Update transaction.  If you do batch operations perhaps the overhead of cloning is smaller than the overhead of immutability.  No need to convert between IntMap and MVector, I can't imagine.

Maxim Savchenko

unread,
Mar 27, 2013, 2:59:46 PM3/27/13
to ha...@googlegroups.com, Maxim Savchenko, dag.od...@gmail.com
Unfortunately, batch operations are relatively rare in my case - most of updates are single-index. Haven't benchmarked this yet, but doing cloning looks like big multiplier for rewriting one item. That's why conversion appears - in usual operation mode content of acid-state should handle random single writes well enough, so IntMap is preferable. But occasionaly big batch incomes - and for it difference between MVector and IntMap is like couple-seconds vs half-a-minute of waiting.

dag.od...@gmail.com

unread,
Mar 27, 2013, 4:26:45 PM3/27/13
to Maxim Savchenko, ha...@googlegroups.com
I was thining of:

> In this cases I have found faster to convert IntMap to MVector, runST over it, and convert back.

If that's true, it should be faster still to runST and skip the IntMap all together.  But if you have mixed needs where IntMap is usually better, perhaps converting to MVector for the occasional batch operations and storing it as an IntMap in acid-state is in deed your best option.

Jake McArthur

unread,
Mar 28, 2013, 8:19:56 AM3/28/13
to ha...@googlegroups.com, dag.od...@gmail.com, Maxim Savchenko

Keep in mind that you are hitting the filesystem every time you perform a transaction. I suspect the performance difference between IntMap and a mutable array would be pretty small.

Maxim Savchenko

unread,
Mar 28, 2013, 8:35:56 AM3/28/13
to ha...@googlegroups.com, dag.od...@gmail.com, Maxim Savchenko
Measured alredy - for single writes filesystem do eats all difference, that's right, but for batches compuations are heavier then storage io.

Jake McArthur

unread,
Mar 28, 2013, 2:20:07 PM3/28/13
to ha...@googlegroups.com, dag.od...@gmail.com, Maxim Savchenko

Could you not get the best of both worlds by using ST when doing a batch update and plain IntMap when doing a small update?

Reply all
Reply to author
Forward
0 new messages