Parallel processing of values that don't fit into memory
Manuel Simoni
Jun 15, 2015, 2:43:49 PM
to reactiv...@googlegroups.com
Does it make sense to treat an enormous FS directory as an RDP value?
I'd like to be able to do getBigData >>> parallelProcess, but I can't load the data into RAM.
I could imagine a signal implementation that allows downstream behaviors to read the "value" record-by-record, in parallel.
What are your thoughts on very large values and parallel processing in behaviors?
--Manuel
David Barbour
Jun 15, 2015, 3:02:19 PM
On Mon, Jun 15, 2015 at 1:43 PM, Manuel Simoni <msi...@gmail.com> wrote:
> Does it make sense to treat an enormous FS directory as an RDP value?
> I'd like to be able to do getBigData >>> parallelProcess, but I can't load the data into RAM.
I'd consider this an orthogonal issue to RDP. I.e. drop the "RDP" from your question:
"Does it make sense to model an enormous FS directory as a value?"
My own answer to this was: "sure!" A filesystem is, more or less, a trie, and an immutable trie value gives us nice copy-on-write properties. But to make this viable, I model large values on top of a memory-mapped key-value store, e.g. with VCache [1], so that I can easily work with persistent, filesystem-sized trie values [2]. I understand that similar models are also available in other languages such as Java.
Good support for enormous values and orthogonal persistence is very convenient, and enables pushing a lot of behavior into the purely functional computation layers.
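To make the "filesystem as an immutable trie" idea concrete, here is a minimal in-memory sketch. It does not use VCache or any persistent store. The names `FS`, `emptyFS`, `writePath`, and `readPath` are hypothetical and chosen only for illustration. It shows the copy-on-write property: an update rebuilds only the nodes along the written path, and all other subtrees are shared between the old and new values.

```haskell
import qualified Data.Map.Strict as M

-- | A toy filesystem value: a trie keyed by path segments.
-- File contents are plain Strings purely for illustration; a store-backed
-- version would hold references to on-disk data instead.
data FS = FS
  { fileHere :: Maybe String     -- contents, if this node is a file
  , children :: M.Map String FS  -- named entries below this node
  } deriving (Eq, Show)

emptyFS :: FS
emptyFS = FS Nothing M.empty

-- | Return a new trie with the file at the given path set.
-- Only the nodes along the path are rebuilt; sibling subtrees are shared
-- with the old value, so the old value remains valid and unchanged.
writePath :: [String] -> String -> FS -> FS
writePath []     contents fs = fs { fileHere = Just contents }
writePath (p:ps) contents fs =
  let child  = M.findWithDefault emptyFS p (children fs)
      child' = writePath ps contents child
  in fs { children = M.insert p child' (children fs) }

-- | Look up the file at the given path, if any.
readPath :: [String] -> FS -> Maybe String
readPath []     fs = fileHere fs
readPath (p:ps) fs = M.lookup p (children fs) >>= readPath ps
```

Writing `["home","b.txt"]` into a value that already holds `["home","a.txt"]` yields a second value. The first value still answers lookups as before, which is what makes such values convenient to hand to purely functional code.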
> What are your thoughts on very large values and parallel processing in behaviors?
Positive, in general. For Sirea, I used Haskell's `par` to push some parallelism into the functional layer, though Haskell didn't have anything like VCache at the time.
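As a rough illustration of pushing parallelism into the functional layer with `par`, here is a small divide-and-conquer sketch. It is not Sirea's code. `parSum` and its `threshold` parameter are hypothetical names. `par` and `pseq` are imported from `GHC.Conc` in base so the example has no extra dependencies (they are more commonly imported from `Control.Parallel` in the `parallel` package).

```haskell
import GHC.Conc (par, pseq)

-- | Sum a list by splitting it in half, sparking the left half for
-- possible parallel evaluation while the current thread evaluates
-- the right half. Lists at or below the threshold are summed directly.
parSum :: Int -> [Int] -> Int
parSum threshold xs
  | length xs <= max 1 threshold = sum xs  -- max 1 guarantees termination
  | otherwise =
      let (ls, rs) = splitAt (length xs `div` 2) xs
          l = parSum threshold ls
          r = parSum threshold rs
      in l `par` (r `pseq` (l + r))
```

The result is the same as `sum`. `par` only hints that `l` may be evaluated on another core, so actual speedup depends on building with `-threaded` and running with `+RTS -N`.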