DistributedCache


Curt Holden

Dec 12, 2013, 4:10:41 PM
to pangoo...@googlegroups.com
Is it possible to use the Hadoop DistributedCache with Pangool?  It looks like I need access to the JobConf for this.

Alexei Perelighin

Dec 13, 2013, 3:35:33 AM
to pangool-user
This is the way I did it.

Use the -files option to specify the files to be placed in the distributed cache.
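For example (the jar, class, and file names here are only illustrative; -files is parsed by Hadoop's GenericOptionsParser, so the driver has to run through ToolRunner or parse the generic options itself):

    hadoop jar my-job.jar com.example.MyDriver -files /local/path/lookup.txt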

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;

// Inside a Pangool TupleMapper / TupleReducer, the TupleMRContext
// gives you access to the underlying Hadoop context:
Configuration conf = context.getHadoopContext().getConfiguration();
Path[] paths = DistributedCache.getLocalCacheFiles(conf);
for (Path path : paths) {
    /* PATH SELECTION LOGIC: pick the cached file you need,
       e.g. by matching path.getName() */
}

But Pangool can also work around this, since it already uses the distributed cache to ship serialized objects. All you need to do is read your file into a Serializable object (such as a String, an ArrayList&lt;String&gt;, or a HashMap&lt;String, String&gt;) and pass it as an argument to the constructor of your Reducer or Mapper class, thus avoiding any hassle with direct access to the DistributedCache.
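A minimal sketch of that pattern (the class, field, and tuple field names are made up for illustration; the reduce() signature follows Pangool's TupleReducer API, so check it against your Pangool version):

    import java.io.IOException;
    import java.util.HashMap;

    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;

    import com.datasalt.pangool.io.ITuple;
    import com.datasalt.pangool.tuplemr.TupleMRException;
    import com.datasalt.pangool.tuplemr.TupleReducer;

    public class LookupReducer extends TupleReducer<Text, NullWritable> {

        // Built on the client side and serialized together with this
        // instance, so it reaches the cluster without any DistributedCache calls.
        private final HashMap<String, String> lookup;

        public LookupReducer(HashMap<String, String> lookup) {
            this.lookup = lookup;
        }

        @Override
        public void reduce(ITuple group, Iterable<ITuple> tuples,
                           TupleMRContext context, Collector collector)
                throws IOException, InterruptedException, TupleMRException {
            // The map is simply available as instance state here.
            String extra = lookup.get(group.get("id").toString());
            // ... use "extra" when emitting output tuples ...
        }
    }

On the client side you build the HashMap from your file and call new LookupReducer(map) when wiring the job.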

Thanks,
Alexei


Pere Ferrera

Dec 13, 2013, 3:54:00 AM
to pangoo...@googlegroups.com, alex...@googlemail.com
Hello,

As Alexei said, you can use the DistributedCache. If you want to configure it programmatically, you can use the Configuration object before launching the Job, and then do as Alexei suggests inside your Mappers / Reducers (getHadoopContext() is the key to accessing all Hadoop-based functionality, such as Counters).
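For instance, a rough sketch of the programmatic route (the cache file URI and the job wiring are placeholders; TupleMRBuilder and createJob() are Pangool's standard job-building API):

    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.mapreduce.Job;

    import com.datasalt.pangool.tuplemr.TupleMRBuilder;

    public class Driver {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Register an HDFS file in the distributed cache before building the job.
            DistributedCache.addCacheFile(new URI("/shared/lookup.txt"), conf);

            // Pangool picks up the prepared Configuration here.
            TupleMRBuilder builder = new TupleMRBuilder(conf, "my-job");
            // ... addIntermediateSchema(...), addInput(...), setTupleReducer(...),
            //     setOutput(...) as usual ...
            Job job = builder.createJob();
            job.waitForCompletion(true);
        }
    }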

For small files, using state objects in Mappers / Reducers is more convenient, as Alexei pointed out.

Curt Holden

Dec 13, 2013, 4:28:58 PM
to pangoo...@googlegroups.com, alex...@googlemail.com

As Alexei suggested, I switched my code to use a serializable data structure that I pass to the reducer. This works well since I am only working with a few tens of MB of data.

Thanks for the good advice,

Curt-
