how to specify num.chunks

Sean McNamara

Aug 20, 2013, 12:22:56 PM8/20/13
to project-...@googlegroups.com
I am running into chunk overflow exceptions. I am passing hadoop-build-readonly-store.sh -D num.chunks=1000 ... to get around the 2GB chunk size limit.

However, when it runs, it always reports that the number of chunks is 1:

13/08/20 16:15:34 INFO mr.HadoopStoreBuilder: Number of chunks: 1, number of reducers: 50, save keys: false, reducerPerBucket: false


Am I not specifying it properly, or is it being overridden somewhere? I am using Voldemort 1.3.4.

Thanks,

Sean

Chinmay Soman

Aug 21, 2013, 2:45:28 AM8/21/13
to project-...@googlegroups.com
It's calling this class, HadoopStoreJobRunner:

./contrib/hadoop-store-builder/src/java/voldemort/store/readonly/mr/HadoopStoreJobRunner.java

It suggests using 'chunksize':

parser.accepts("chunksize", "maximum size of a chunk in bytes.").withRequiredArg();

FYI: we don't use this class to do our builds; we use the VoldemortBuildAndPushJob.
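[Editor's note: a minimal sketch of what the invocation Chinmay describes might look like, assuming the wrapper script forwards options through to HadoopStoreJobRunner. Only the 'chunksize' option name comes from the parser line quoted above; the value is an illustrative placeholder, and the other required arguments are elided.]

```shell
# Sketch only: 'chunksize' is the option registered by HadoopStoreJobRunner's
# parser; 1073741824 (1 GiB per chunk) is a placeholder value, and the
# remaining arguments the job needs are elided here.
./hadoop-build-readonly-store.sh --chunksize 1073741824 ...
```

A smaller chunksize forces the store to be split into more chunks, which is another way around the 2GB per-chunk limit besides setting a chunk count directly.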

Sean McNamara

Aug 26, 2013, 2:46:45 PM8/26/13
to project-...@googlegroups.com
Chinmay,

Thanks! I see now: we set our chunk size to be *much* smaller than what we were using, and that produces a higher num.chunks. The downside is that it also creates many more reducer tasks. --reducer-per-bucket should fix that, but it causes the JVM to OOM, so I will continue to dig in. We almost have this thing dialed in.
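[Editor's note: the trade-off Sean describes can be sketched with a bit of arithmetic. This is illustrative only, not Voldemort's exact formula: shrinking the chunk size grows the chunk count roughly in inverse proportion, and with one reducer per chunk, the reducer count grows with it. The class name and store size below are hypothetical.]

```java
// Illustrative arithmetic only (not Voldemort's internal formula):
// number of chunks ~= ceil(store size / chunk size), so halving the
// chunk size roughly doubles the chunk count, and therefore the
// number of reducer tasks when each chunk gets its own reducer.
public class ChunkMath {
    static long estimateChunks(long storeSizeBytes, long chunkSizeBytes) {
        // Ceiling division, with a floor of one chunk.
        return Math.max(1, (storeSizeBytes + chunkSizeBytes - 1) / chunkSizeBytes);
    }

    public static void main(String[] args) {
        long store = 100L << 30; // hypothetical 100 GiB store
        System.out.println(estimateChunks(store, 2L << 30));   // 2 GiB chunks -> 50
        System.out.println(estimateChunks(store, 256L << 20)); // 256 MiB chunks -> 400
    }
}
```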

Sean