File patterns with {} not working as input

Curt Holden

Oct 3, 2013, 12:29:15 PM
to pangoo...@googlegroups.com

When I pass the following file pattern:

/user/tholden/dedupe/2013/05/0[89]/part-*.avro

to my Pangool MapReduce job using TupleMRBuilder.addInput(Path, InputFormat, TupleMapper), the job works fine.

 
However, when I pass:

/user/tholden/dedupe/2013/05/{09,10}/part-*.avro

I get the following error. The latter pattern, which uses {}, works fine in a Pig script. Any ideas on what I am doing wrong?

 

13/10/03 12:19:52 INFO mapred.JobClient: Cleaning up the staging area hdfs://hadoop1.mitre.org:8020/tmp/hadoop/mapred/staging/tholden/.staging/job_201309130701_61526
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
        at com.datasalt.pangool.tuplemr.mapred.lib.input.PangoolMultipleInputs.getInputFormatMap(PangoolMultipleInputs.java:121)
        at com.datasalt.pangool.tuplemr.mapred.lib.input.DelegatingInputFormat.getSplits(DelegatingInputFormat.java:52)
        at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:998)
        at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1015)
        at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:928)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:881)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:881)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:526)
        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:556)
        at org.mitre.ttv.CompareToolBase.run(CompareToolBase.java:108)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.mitre.ttv.CompareADSBtoTTTool.main(CompareADSBtoTTTool.java:25)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:197)

Pere Ferrera

Oct 4, 2013, 5:17:30 AM
to pangoo...@googlegroups.com
Thanks for reporting, Curt. I will take a look and get back to you shortly.

Pere Ferrera

Oct 4, 2013, 11:29:52 AM
to pangoo...@googlegroups.com
Hello Curt,

It seems the MultipleInputs implementation, which mimics the way Hadoop stores the input info in the Configuration, didn't handle special characters well in paths like the one you reported.
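To illustrate the kind of failure described above, here is a minimal sketch. It assumes the input mappings are stored the way Hadoop's own MultipleInputs stores them, as "path;formatClass" entries joined with commas in a single Configuration string. The class InputMappingSketch, its method names, and the format name org.example.SomeInputFormat are all hypothetical and are not Pangool code. The Base64 variant is only one possible way to make the encoding safe and is not necessarily what the snapshot does.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not Pangool code: shows why a ',' inside a path
// can break a comma-joined "path;formatClass" encoding.
public class InputMappingSketch {

    // Assumed encoding: entries "path;formatClass" joined with ','.
    static String naiveEncode(Map<String, String> inputs) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : inputs.entrySet()) {
            if (sb.length() > 0) sb.append(',');
            sb.append(e.getKey()).append(';').append(e.getValue());
        }
        return sb.toString();
    }

    // Splitting on ',' cuts "{09,10}" in two; the first fragment has no ';',
    // so parts[1] throws ArrayIndexOutOfBoundsException.
    static Map<String, String> naiveDecode(String encoded) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String mapping : encoded.split(",")) {
            String[] parts = mapping.split(";");
            result.put(parts[0], parts[1]);
        }
        return result;
    }

    // One possible fix (an assumption): Base64-encode each path so that it
    // can never contain the ',' or ';' separators.
    static String safeEncode(Map<String, String> inputs) {
        Map<String, String> escaped = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : inputs.entrySet()) {
            byte[] raw = e.getKey().getBytes(StandardCharsets.UTF_8);
            escaped.put(Base64.getUrlEncoder().encodeToString(raw), e.getValue());
        }
        return naiveEncode(escaped);
    }

    static Map<String, String> safeDecode(String encoded) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : naiveDecode(encoded).entrySet()) {
            byte[] raw = Base64.getUrlDecoder().decode(e.getKey());
            result.put(new String(raw, StandardCharsets.UTF_8), e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> inputs = new LinkedHashMap<>();
        inputs.put("/user/tholden/dedupe/2013/05/{09,10}/part-*.avro",
                   "org.example.SomeInputFormat"); // hypothetical format class

        try {
            naiveDecode(naiveEncode(inputs));
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("naive decoding failed: " + e);
        }
        System.out.println("safe decoding: " + safeDecode(safeEncode(inputs)));
    }
}
```

Under the same assumption, a pattern such as 0[89] contains no comma, so it survives the naive round trip, which would be consistent with the first pattern working and the second one failing.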

It was quite easy to fix, and there's already a snapshot with the fix that you can use if you want:

 <dependency>
     <groupId>com.datasalt.pangool</groupId>
     <artifactId>pangool-core</artifactId>
     <version>0.60.7-SNAPSHOT</version>
 </dependency>
The snapshots are published to our repository (see http://pangool.net/build.html), which you'll need to add to your Maven <repositories>.

Please shout if there's any issue.

Thanks,

Curt Holden

Oct 4, 2013, 12:05:36 PM
to pangoo...@googlegroups.com
Pere,
 
Thanks for the quick response. I am using Pangool version 0.60.3. I will check out this snapshot and see if it fixes the issue.
 
Curt-

Curt Holden

Oct 8, 2013, 2:29:30 PM
to pangoo...@googlegroups.com
Pere,
 
Testing suggests that the problem is addressed in 0.60.7-SNAPSHOT.
 
Thanks for your help,
 
Curt-

Pere Ferrera

Oct 11, 2013, 6:17:03 AM
to pangoo...@googlegroups.com
Great, thanks for reporting!


