Utilize the distributed cache


Jeremy Highsmith

Nov 12, 2013, 7:09:16 PM
to rha...@googlegroups.com
Hi, Does anyone have an example of using the distributed cache with rmr2? Is this possible using the latest package? Basically, I need to pull 1000s of keywords from a large amount of text. I have a version which works in the Java MapReduce world and wanted to try rmr2 as an alternative. Thanks in advance for any info -- Jeremy

Antonio Piccolboni

Nov 13, 2013, 2:32:57 AM
to rha...@googlegroups.com
rmr2 uses the distributed cache behind the scenes but doesn't give direct access to it. I am not sure I understand your problem, but if you can store your words in a vector, say keywords, then you can use them directly in your map or reduce functions:

keywords = ....
pattern = paste(keywords, collapse = "|")


mapreduce(input, map = function(k,v) grep(pattern, v))

or some such. Not sure grep scales well to such a large pattern.

Antonio
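
On the concern about grep with a very large pattern: one possible alternative, sketched here in plain R without rmr2 or Hadoop, is to tokenize the text and look tokens up in the keyword vector instead of building one big alternation. This assumes the keywords are single lowercase words; the keyword values and the count_keywords name are hypothetical and not from the thread.

```r
# Hypothetical keyword list, standing in for the real one.
keywords <- c("hadoop", "mapreduce", "cache")

# Count keyword occurrences in a character vector of text lines.
# Assumption: keywords are single lowercase words.
count_keywords <- function(lines, keywords) {
  # Split on runs of non-alphanumeric characters and lowercase the tokens.
  tokens <- tolower(unlist(strsplit(lines, "[^[:alnum:]]+")))
  # Keep only the tokens that appear in the keyword vector.
  hits <- tokens[tokens %in% keywords]
  # Fixed factor levels give a zero count for keywords that never occur.
  table(factor(hits, levels = keywords))
}

lines <- c("Hadoop uses the distributed cache.", "MapReduce on Hadoop")
counts <- count_keywords(lines, keywords)
print(counts)
```

The body of count_keywords could in principle serve as the body of a map function, with v in place of lines, but that has not been tried against rmr2 here. Multi-word keywords would need a different approach.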