ERROR: com.twitter.elephantbird.mapreduce.input.RawSequenceFileInputFormat


samee...@gmail.com

Nov 13, 2013, 3:04:34 PM
to elephant...@googlegroups.com

Hi, I am embedding my Pig script inside Python.

# I have registered the appropriate jars; I use Elephant Bird (E-B) to load the sequence file, process it, and store the result into an output file.
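
For context, the registration step looks roughly like this (the jar paths below are placeholders, not my actual paths):

from org.apache.pig.scripting import Pig

# Placeholder paths -- substitute the elephant-bird and UDF jars actually on your system.
Pig.registerJar("/path/to/elephant-bird-core.jar")
Pig.registerJar("/path/to/elephant-bird-pig.jar")
Pig.registerJar("/path/to/myparser-udfs.jar")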

params = [{'infile': '/scratch/myinfile.seq', 'outfile': '/scratch/result.seq0', 'id': 'AAAAA'},
          {'infile': '/scratch/myinfile.seq', 'outfile': '/scratch/result.seq1', 'id': 'BBBBB'},
          {'infile': '/scratch/myinfile.seq', 'outfile': '/scratch/result.seq2', 'id': 'CCCCC'},
          {'infile': '/scratch/myinfile.seq', 'outfile': '/scratch/result.seq3', 'id': 'DDDDD'}]

Pig.define("SEQFILE_LOADER","com.twitter.elephantbird.pig.load.SequenceFileLoader");
Pig.define("TEXT_CONVERTER", "com.twitter.elephantbird.pig.util.TextConverter");


P = Pig.compile("""
A = LOAD '$infile' USING SEQFILE_LOADER('-c TEXT_CONVERTER', '-c TEXT_CONVERTER')
    AS (key: chararray, value: chararray);
AU = FOREACH A GENERATE FLATTEN(myparser.myUDF(key, value));
STORE AU INTO '$outfile';
""");

The following sequential version works great!

Sequential:
for i in range(len(params)):
    bound = P.bind(params[i]);
    result = bound.runSingle()

Parallel:

In this case I also set: Pig.set("default_parallel", "25");

for i in range(len(params)):
    bound = P.bind(params);
    result = bound.run()
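
(For what it's worth, my understanding of the multi-binding API is that a single bind(params) plus one run() call already launches all four parameter sets in parallel and returns one PigStats per set, so the loop above may be redundant. A rough sketch of that assumption:)

bound = P.bind(params)        # bind all four parameter sets at once
stats_list = bound.run()      # launches the bound instances in parallel
for stats in stats_list:
    print stats.isSuccessful()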


Unfortunately, in the parallel version, I get the following error: "2013-11-13 11:29:30,943 [pool-4-thread-3] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR: com.twitter.elephantbird.mapreduce.input.RawSequenceFileInputFormat"

Please see the log snippet below. Any help with this would be great!



2013-11-13 11:28:30,320 [pool-4-thread-1] INFO  org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2013-11-13 11:28:30,320 [pool-4-thread-1] INFO  org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cacche
2013-11-13 11:28:30,320 [pool-4-thread-1] INFO  org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2013-11-13 11:28:30,326 [pool-4-thread-1] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2013-11-13 11:28:30,462 [JobControl] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2013-11-13 11:28:30,463 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2013-11-13 11:28:30,477 [JobControl] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2013-11-13 11:28:30,478 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2013-11-13 11:28:30,480 [JobControl] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2013-11-13 11:28:30,481 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2013-11-13 11:28:30,491 [JobControl] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2013-11-13 11:28:30,493 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2013-11-13 11:28:30,755 [pool-4-thread-3] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201311111627_0062
2013-11-13 11:28:30,755 [pool-4-thread-3] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases A,AU
2013-11-13 11:28:30,755 [pool-4-thread-3] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: A[3,4],AU[-1,-1] C:  R:
2013-11-13 11:28:30,755 [pool-4-thread-3] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://pzxnvm2018.dcld.pldc.kp.org:50030/jobdetails.jsp?jobid=job_201311111627_0062
2013-11-13 11:28:30,758 [pool-4-thread-3] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2013-11-13 11:28:30,802 [pool-4-thread-2] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201311111627_0063
2013-11-13 11:28:30,802 [pool-4-thread-4] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201311111627_0064
2013-11-13 11:28:30,802 [pool-4-thread-4] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases A,AU
2013-11-13 11:28:30,802 [pool-4-thread-4] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: A[3,4],AU[-1,-1] C:  R:
2013-11-13 11:28:30,803 [pool-4-thread-4] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://pzxnvm2018.dcld.pldc.kp.org:50030/jobdetails.jsp?jobid=job_201311111627_0064
2013-11-13 11:28:30,802 [pool-4-thread-2] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases A,AU
2013-11-13 11:28:30,803 [pool-4-thread-2] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: A[3,4],AU[-1,-1] C:  R:
2013-11-13 11:28:30,803 [pool-4-thread-2] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://<ip_address>:50030/jobdetails.jsp?jobid=job_201311111627_0063
2013-11-13 11:28:30,805 [pool-4-thread-4] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2013-11-13 11:28:30,806 [pool-4-thread-2] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2013-11-13 11:28:30,826 [pool-4-thread-1] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201311111627_0065
2013-11-13 11:28:30,826 [pool-4-thread-1] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases A,AU
2013-11-13 11:28:30,826 [pool-4-thread-1] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: A[3,4],AU[-1,-1] C:  R:
2013-11-13 11:28:30,827 [pool-4-thread-1] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://<ip_address>:50030/jobdetails.jsp?jobid=job_201311111627_0065
2013-11-13 11:28:30,829 [pool-4-thread-1] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2013-11-13 11:29:30,939 [pool-4-thread-3] WARN  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
2013-11-13 11:29:30,939 [pool-4-thread-3] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_201311111627_0062 has failed! Stop running all dependent jobs
2013-11-13 11:29:30,939 [pool-4-thread-3] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2013-11-13 11:29:30,943 [pool-4-thread-3] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR: com.twitter.elephantbird.mapreduce.input.RawSequenceFileInputFormat
2013-11-13 11:29:30,943 [pool-4-thread-3] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2013-11-13 11:29:30,943 [pool-4-thread-3] INFO  org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:

HadoopVersion    PigVersion    UserId    StartedAt    FinishedAt    Features
1.0.3    0.11.1    p529444    2013-11-13 11:28:26    2013-11-13 11:29:30    UNKNOWN

Failed!

Failed Jobs:
JobId    Alias    Feature    Message    Outputs
job_201311111627_0062    A,AU    MAP_ONLY    Message: Job failed! Error - # of failed Map Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201311111627_0062_m_000000    /scratch/result.seq2,


Dmitriy Ryaboy

Nov 13, 2013, 3:13:40 PM
to elephant...@googlegroups.com
Have you looked at the failed tasks of this failing job, to determine
the root cause?

org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- job job_201311111627_0062 has failed! Stop running all dependent
jobs
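
One quick way to surface more detail from the embedding side (a rough sketch; it assumes the `bound` from your script and that run() hands back PigStats objects):

stats_list = bound.run()
for stats in stats_list:
    if not stats.isSuccessful():
        # getErrorMessage() usually carries more than the one-line SimplePigStats ERROR
        print stats.getReturnCode(), stats.getErrorMessage()

Beyond that, the task attempt logs linked from the JobTracker page for job_201311111627_0062 should show the actual exception.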
