Unable to replace output sinkPath of job (/app/hadoop/tmp/cache/hduser/) with my own path

santlal gupta

Jun 30, 2016, 4:14:02 AM
to Lingual User
Hi,

I am using the Lingual JDBC driver for my POC. I found that when I run a job on the cluster, it writes the output of the query to the /app/hadoop/tmp/cache/hduser location.

Below are the relevant parts of the log:

          16/06/30 10:51:10 INFO provider.ProviderProxy: using null to create scheme for stereotype UNKNOWN with properties: {delimiter=,, extensions=.tcsv, typed=true, provider=text}
          16/06/30 10:51:10 INFO provider.ProviderProxy: using null to create tap for Resource{identifier='/app/hadoop/tmp/cache/hduser/20160630-105110-1772B7E479.tcsv', protocol=hdfs, format=tcsv, mode=REPLACE} with properties: {schemes=hdfs, provider=text}
           16/06/30 10:51:10 INFO hadoop2.Hadoop2MR1PlatformBroker: reading default configuration from: hdfs://UbuntuD1:8020/user/hduser/.lingual/config/default.properties
           16/06/30 10:51:10 INFO hadoop2.Hadoop2MR1PlatformBroker: user not supplied, using OS user
           16/06/30 10:51:10 INFO hadoop2.Hadoop2MR1PlatformBroker: using user: hduser
           16/06/30 10:51:10 INFO hadoop2.Hadoop2MR1PlatformBroker: using hadoop job jar: /home/hduser/Lingual/lib/lingual-hadoop2-mr1-1.2.1-jdbc.jar
           16/06/30 10:51:10 INFO hadoop2.Hadoop2MR1PlatformBroker: using app jar: /home/hduser/Lingual/lib/lingual-hadoop2-mr1-1.2.1-jdbc.jar
           16/06/30 10:51:10 INFO hadoop2.Hadoop2MR1PlatformBroker: loading override properties from: jar:file:/home/hduser/Lingual/lib/lingual-hadoop2-mr1-1.2.1-jdbc.jar!/hadoop-override.properties
           16/06/30 10:51:10 INFO Configuration.deprecation: mapred.output.compression.codec is deprecated. Instead, use mapreduce.output.fileoutputformat.compress.codec
           16/06/30 10:51:10 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
           16/06/30 10:51:10 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
           16/06/30 10:51:10 INFO planner.HadoopPlanner: using application jar: /home/hduser/Lingual/lib/lingual-hadoop2-mr1-1.2.1-jdbc.jar
           16/06/30 10:51:10 INFO property.AppProps: using app.id: 4548F010FC4E40AA8F06C500A18B23FB
           16/06/30 10:51:11 INFO Configuration.deprecation: mapred.used.genericoptionsparser is deprecated. Instead, use mapreduce.client.genericoptionsparser.used
           16/06/30 10:51:11 INFO Configuration.deprecation: mapred.output.compress is deprecated. Instead, use mapreduce.output.fileoutputformat.compress
           16/06/30 10:51:11 INFO util.Version: Concurrent, Inc - Cascading 2.7.0
           16/06/30 10:51:11 INFO flow.Flow: [20160630-105110-1772B7...] starting
           16/06/30 10:51:11 INFO flow.Flow: [20160630-105110-1772B7...]  source: Hfs["TextDelimited[['f1', 'f2', 'f3', 'f4', 'f5' | String, DATE, DATE, Float, Boolean]]"]                           
            ["/user/hduser/debug/debug_job/input_small5field1GB/part1"]
            16/06/30 10:51:11 INFO flow.Flow: [20160630-105110-1772B7...]  sink: Hfs["SQLTypedTextDelimited[['f1', 'f2', 'f3', 'f4', 'f5' | String, DATE, DATE, Float, Boolean]]"]          
            ["/app/hadoop/tmp/cache/hduser/20160630-105110-1772B7E479.tcsv"]
            .
            .
            .
            .
            .
           16/06/30 10:52:30 INFO util.Hadoop18TapUtil: deleting temp path /app/hadoop/tmp/cache/hduser/20160630-105110-1772B7E479.tcsv/_temporary
           16/06/30 10:52:30 INFO mapred.FileInputFormat: Total input paths to process : 2
           16/06/30 10:52:30 INFO Test.LingualTest: Resultset created :

I also found that it deletes the /app/hadoop/tmp/cache/hduser/<jobid>/_temporary directory and leaves the entire output in the parent /app/hadoop/tmp/cache/hduser/<jobid> folder:

          hadoop fs -ls /app/hadoop/tmp/cache/hduser/20160630-105110-1772B7E479.tcsv

          Found 3 items
          -rw-r--r--   3 hduser hadoop          0 2016-06-30 13:02 /app/hadoop/tmp/cache/hduser/20160630-105110-1772B7E479.tcsv/_SUCCESS
          -rw-r--r--   3 hduser hadoop    4702992 2016-06-30 13:02 /app/hadoop/tmp/cache/hduser/20160630-105110-1772B7E479.tcsv/part-00000
          -rw-r--r--   3 hduser hadoop    4685206 2016-06-30 13:02 /app/hadoop/tmp/cache/hduser/20160630-105110-1772B7E479.tcsv/part-00001
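
To clean up such a directory afterwards, its path first has to be known. A minimal sketch, assuming the `Hadoop18TapUtil` "deleting temp path" line keeps the format shown in the log above; the class and method names (`ResultPathFinder`, `findResultDir`) are hypothetical and not part of Lingual or Cascading:

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ResultPathFinder {
    // Assumption: the cleanup line looks like
    // "... deleting temp path <result dir>/_temporary", as in the log excerpt.
    private static final Pattern TEMP_PATH =
        Pattern.compile("deleting temp path (\\S+)/_temporary");

    /** Returns the result directory named in the log text, if such a line is present. */
    public static Optional<String> findResultDir(String logText) {
        Matcher m = TEMP_PATH.matcher(logText);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String line = "16/06/30 10:52:30 INFO util.Hadoop18TapUtil: deleting temp path "
            + "/app/hadoop/tmp/cache/hduser/20160630-105110-1772B7E479.tcsv/_temporary";
        // Print the directory that holds _SUCCESS and the part files.
        System.out.println(findResultDir(line).orElse("not found"));
    }
}
```

The extracted path could then be passed to `hadoop fs -rm -r` once the ResultSet has been fully read.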


I have the following questions:

1. Is this output file required? If not, why is the output written to disk when the result can be read from the ResultSet?

2. Can I provide my own location for this data instead of /app/hadoop/tmp/cache/hduser/, so that I can delete it after use?

I am using:

lingual-hadoop2-mr1-1.2.1-jdbc.jar

For your reference, I am attaching the source code of the job and the log files.

Thanks
Santlal 


LingualTest.java
SchemaWithoutCommand.java
UbuntuD2.XYZ.net_35418
UbuntuD3.XYZ.net_49388