Chunk overflow exception

Avani

Nov 12, 2010, 20:15:45
to project-voldemort
I recently got this error when trying to build a store for 40 GB of data.

voldemort.VoldemortException: Chunk overflow exception: chunk 0 has exceeded 2147483647 bytes.
        at voldemort.store.readonly.mr.HadoopStoreBuilderReducer.reduce(HadoopStoreBuilderReducer.java:100)
        at voldemort.store.readonly.mr.HadoopStoreBuilderReducer.reduce(HadoopStoreBuilderReducer.java:46)
        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:463)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)

I had 8 mappers and only 3 reducers for my 3-node cluster. I looked at the code and understand that I am going above 2 GB (Integer.MAX_VALUE) here. What should I do to avoid this error? Shouldn't the reducer handle big data on its own by spilling temp files to disk? I am currently retrying the job with more reducers, but would prefer that it worked without this workaround (if that works at all).

I was previously able to load this data on another Voldemort cluster with just 1 mapper and 3 reducers. Could you please shed more light on what is happening here?

Avani

Nov 23, 2010, 18:40:46
to project-voldemort
Has anyone seen this? I am running into it and am unable to figure it out.

Jay Kreps

Dec 1, 2010, 23:02:15
to project-...@googlegroups.com
Hi,

What is the chunk size you are setting? The stores are made up of chunks of no more than 2 GB each, but the number of chunks can be arbitrarily large.
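
Roughly, the builder derives the chunk count from the total data size, the replication factor, the number of nodes, and the chunk size. Here is a minimal sketch of that calculation with purely illustrative numbers (the replication factor and cluster size below are examples, not your setup):

    public class ChunkCountSketch {
        public static void main(String[] args) {
            long size = 40L * 1024 * 1024 * 1024;      // total input data, e.g. 40 GB
            int replicationFactor = 2;                 // example replication factor
            int numNodes = 3;                          // example cluster size
            long chunkSizeBytes = 1024L * 1024 * 1024; // --chunksize, e.g. 1 GB

            // Each node stores roughly replicationFactor * size / numNodes bytes,
            // split into chunks of at most chunkSizeBytes; never fewer than one chunk.
            int numChunks = Math.max((int) (replicationFactor * size / numNodes / chunkSizeBytes), 1);
            System.out.println("num.chunks = " + numChunks); // prints 26 with these values
        }
    }

As long as the reported data size is correct and the chunk size stays under 2 GB, enough chunks should be created to keep each one below Integer.MAX_VALUE bytes (assuming keys hash reasonably evenly across chunks).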

-Jay


Avani

Dec 9, 2010, 18:44:27
to project-voldemort
My chunk size is 1073741824 bytes (1 GB). I am still running into this error and am unable to build a 4 GB Voldemort index. Any help is appreciated. Can you explain what this chunk size defines?

On Dec 1, 8:02 pm, Jay Kreps <jay.kr...@gmail.com> wrote:
> Hi,
>
> What is the chunk size you are setting? The stores are made up of
> chunks of no more than 2 GB each, but the number of chunks can be
> arbitrarily large.
>
> -Jay

Avani

Dec 13, 2010, 15:56:09
to project-voldemort
I re-ran today with 5 GB of data and a chunk size of --chunksize 2040109465. I still get the same error. How can I keep it from going over the max integer limit?


10/12/13 12:43:52 INFO mapred.JobClient: map 100% reduce 76%
10/12/13 12:43:55 INFO mapred.JobClient: map 100% reduce 78%
10/12/13 12:43:59 INFO mapred.JobClient: Task Id : attempt_201012090910_0009_r_000000_0, Status : FAILED
voldemort.VoldemortException: Chunk overflow exception: chunk 0 has exceeded 2147483647 bytes.
        at voldemort.store.readonly.mr.HadoopStoreBuilderReducer.reduce(HadoopStoreBuilderReducer.java:103)
        at voldemort.store.readonly.mr.HadoopStoreBuilderReducer.reduce(HadoopStoreBuilderReducer.java:46)
        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:463)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)


Avani

Dec 13, 2010, 21:12:07
to project-voldemort
I figured out what the possible problem is:

I am giving a list of files as input to get more than one map task (my input is 5 GB of binary data, hence not splittable). So I create, say, 10 binary files in HDFS and pass them as input using a regex. With this, I notice that the reported data size is 0, and hence the number of chunks is 1 (even though the total data is > 2 GB).

This is what the output looks like:
options : --chunksize 2040109465 --inputformat VoldLkpInputFormat --input file.dat.[0-9]*

10/12/13 17:12:06 INFO mr.HadoopStoreBuilder: Data size = 0, replication factor = 2, numNodes = 3, chunk size = 2040109465, num.chunks = 1
10/12/13 17:12:06 INFO mr.HadoopStoreBuilder: Number of reduces: 3
...

If I use only 1 split, then I do get the data size:
10/12/13 17:24:23 INFO mr.HadoopStoreBuilder: Data size = 512739299, replication factor = 2, numNodes = 3, chunk size = 2040109465, num.chunks = 1


The related code in contrib/hadoop-store-builder/src/java/voldemort/store/readonly/mr/HadoopStoreBuilder.java is:

        // delete output dir if it already exists
        FileSystem tempFs = tempDir.getFileSystem(conf);
        tempFs.delete(tempDir, true);

        long size = sizeOfPath(tempFs, inputPath);
        int numChunks = Math.max((int) (storeDef.getReplicationFactor() * size
                                        / cluster.getNumberOfNodes() / chunkSizeBytes), 1);
        logger.info("Data size = " + size + ", replication factor = "
                    + storeDef.getReplicationFactor() + ", numNodes = "
                    + cluster.getNumberOfNodes() + ", chunk size = " + chunkSizeBytes
                    + ", num.chunks = " + numChunks);
        conf.setInt("num.chunks", numChunks);
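
To convince myself that this is where the 0 comes from, here is a small standalone check I put together (my own sketch, not builder code; the HDFS path is just an example) comparing listStatus() against globStatus() on a glob pattern:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // listStatus() treats the glob as a literal path and finds nothing, while
    // globStatus() expands the wildcard to the files that actually match.
    public class GlobSizeCheck {
        public static void main(String[] args) throws Exception {
            Path glob = new Path("/data/file.dat.[0-9]*"); // example pattern
            FileSystem fs = glob.getFileSystem(new Configuration());

            FileStatus[] listed;
            try {
                listed = fs.listStatus(glob);  // 0.20.x returns null for a path that does not literally exist
            } catch (java.io.FileNotFoundException e) {
                listed = null;                 // newer Hadoop releases throw instead of returning null
            }
            FileStatus[] globbed = fs.globStatus(glob);

            long total = 0;
            if (globbed != null)
                for (FileStatus status : globbed)
                    total += status.getLen();

            System.out.println("listStatus matches = " + (listed == null ? 0 : listed.length));
            System.out.println("globStatus matches = " + (globbed == null ? 0 : globbed.length));
            System.out.println("total size via globStatus = " + total);
        }
    }

So sizeOfPath() gets null from listStatus(), returns 0, and numChunks falls back to 1 even though the input is well over 2 GB.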



Further, when I give the input as one big 5 GB binary file, I run into "Error: Java heap space".
10/12/13 18:05:27 INFO mapred.JobClient: Failed map tasks=1
voldemort.VoldemortException: java.io.IOException: Job failed!
        at voldemort.store.readonly.mr.HadoopStoreBuilder.build(HadoopStoreBuilder.java:242)
        at voldemort.store.readonly.mr.HadoopStoreJobRunner.run(HadoopStoreJobRunner.java:180)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at voldemort.store.readonly.mr.HadoopStoreJobRunner.main(HadoopStoreJobRunner.java:257)
Caused by: java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1293)
        at voldemort.store.readonly.mr.HadoopStoreBuilder.build(HadoopStoreBuilder.java:192)
        ... 3 more


Any workaround?

Avani

Dec 15, 2010, 21:53:02
to project-voldemort
1. Here is the fix to handle the regex input, in contrib/hadoop-store-builder/src/java/voldemort/store/readonly/mr/HadoopStoreBuilder.java:

    private long sizeOfPath(FileSystem fs, Path path) throws IOException {
        long size = 0;
        FileStatus[] statuses = fs.listStatus(path);

+       // check if a regex was sent - avani
+       if (statuses == null) {
+           statuses = fs.globStatus(path);
+       }
        if(statuses != null) {
            for(FileStatus status: statuses) {
                if(status.isDir())
                    size += sizeOfPath(fs, status.getPath());
                else
                    size += status.getLen();
            }
        }
        return size;
    }
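
With this change, when listStatus() returns null for the glob input, globStatus() expands the pattern and the matched files are summed, so the logged "Data size" should now reflect the real total and num.chunks should be computed from it instead of defaulting to 1.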


2. The Java heap space error mentioned earlier was a problem in my
custom mapper code.

Problem resolved.