Hello,
I would like to ask a few questions about the input to rmr2. My questions are:
1. In the tutorial ( https://github.com/RevolutionAnalytics/rmr2/blob/master/docs/tutorial.md ) it is mentioned that "It is not possible to write out big data with to.dfs, not in a scalable way." What does this mean? If I want to write, for example, 4 TB of data, does that mean it is not efficient?
2. By default, when to.dfs is used, the input has a null key and the value contains the data. This way only one map function is called.
If I have a large amount of data (e.g. 4 TB), what will happen? Will only one map function be called?
3. If I want to change the default key-value pair and define my own key while keeping the same value, how can I do this efficiently?
For example, if I have a dataset where the key identifies different columns of the dataset and the value is the whole dataset, how can I do it efficiently?
4. If I want to change the input format passed to mapreduce, would the whole dataset be processed?
E.g. if I use make.input.format.
These questions have been bothering me for some time now.
If anyone knows the answers, please reply.
Thank you in advance.