Parallel processing of values that don't fit into memory


Manuel Simoni

Jun 15, 2015, 2:43:49 PM
to reactiv...@googlegroups.com
Does it make sense to treat an enormous FS directory as an RDP value?

I'd like to be able to do getBigData >>> parallelProcess, but I can't load the data into RAM.

I could imagine a signal implementation that allows downstream behaviors to read the "value" record-by-record, in parallel.

What are your thoughts on very large values and parallel processing in behaviors?

--Manuel

David Barbour

Jun 15, 2015, 3:02:19 PM
to reactiv...@googlegroups.com
On Mon, Jun 15, 2015 at 1:43 PM, Manuel Simoni <msi...@gmail.com> wrote:
Does it make sense to treat an enormous FS directory as an RDP value?

I'd like to be able to do getBigData >>> parallelProcess, but I can't load the data into RAM.

I'd consider this an issue orthogonal to RDP. That is, drop the "RDP" from your question:

"Does it make sense to model an enormous FS directory as a value?" 

My own answer to this was: "sure!" A filesystem is, more or less, a trie, and an immutable trie value gives us nice copy-on-write properties. But to make this viable, I model large values above a memory-mapped key-value store, e.g. in VCache [1], such that I can easily work with persistent, filesystem-sized trie values [2]. I understand that similar models are also available in other languages such as Java.
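To illustrate the idea, here is a minimal sketch of a filesystem modeled as an immutable trie value. The `FS` type, `insertFS`, and `lookupFS` are hypothetical names for illustration only; they are not VCache's API (VCache layers values like this over a memory-mapped store so they need not fit in RAM).

```haskell
import qualified Data.Map.Strict as M

-- A filesystem-like trie as an immutable value.
-- Hypothetical types for illustration; not VCache's actual API.
data FS = File String | Dir (M.Map String FS)
  deriving (Eq, Show)

-- Copy-on-write update: only the spine along the path is rebuilt;
-- every untouched subtree is shared with the old value.
insertFS :: [String] -> String -> FS -> FS
insertFS []     contents _        = File contents
insertFS (n:ns) contents (Dir m)  =
  let child = M.findWithDefault (Dir M.empty) n m
  in  Dir (M.insert n (insertFS ns contents child) m)
insertFS path   contents (File _) = insertFS path contents (Dir M.empty)

lookupFS :: [String] -> FS -> Maybe String
lookupFS []     (File c) = Just c
lookupFS (n:ns) (Dir m)  = M.lookup n m >>= lookupFS ns
lookupFS _      _        = Nothing
```

The structure-sharing here is what makes "enormous value" semantics cheap: updating one file leaves every sibling subtree physically shared with the previous version.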

Good support for enormous values and orthogonal persistence is very convenient, and enables pushing a lot of behavior into the purely functional computation layers.


What are your thoughts on very large values and parallel processing in behaviors?

Positive, in general. For Sirea, I made use of Haskell's `par` to push some parallelism into the functional layer. Though, Haskell didn't have anything like VCache at the time.
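As a rough sketch of what `par` gives you (this is the standard spark idiom, not code from Sirea): each record's result is sparked in parallel with the traversal of the rest of the list, so under `-threaded` with multiple capabilities the records can be processed concurrently.

```haskell
import GHC.Conc (par, pseq)

-- Spark evaluation of each record's result in parallel with the
-- rest of the list. `pseq` forces the tail before consing, giving
-- the spark for `y` a chance to run on another capability.
parProcess :: (a -> b) -> [a] -> [b]
parProcess _ []     = []
parProcess f (x:xs) =
  let y  = f x
      ys = parProcess f xs
  in  y `par` (ys `pseq` (y : ys))
```

The result is deterministic either way: without `-threaded` the sparks simply fizzle and evaluation proceeds sequentially.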
