Unable to replace output sinkPath of job (/app/hadoop/tmp/cache/hduser/) with my own path

santlal gupta

Jun 30, 2016, 4:14:02 AM

to Lingual User

Hi,

I am using Lingual JDBC for my POC. I found that when I run a job on the cluster, it writes the output of the query to the /app/hadoop/tmp/cache/hduser location.

Below are the relevant parts of the log:

16/06/30 10:51:10 INFO provider.ProviderProxy: using null to create scheme for stereotype UNKNOWN with properties: {delimiter=,, extensions=.tcsv, typed=true, provider=text}

16/06/30 10:51:10 INFO provider.ProviderProxy: using null to create tap for Resource{identifier='/app/hadoop/tmp/cache/hduser/20160630-105110-1772B7E479.tcsv', protocol=hdfs, format=tcsv, mode=REPLACE} with properties: {schemes=hdfs, provider=text}

16/06/30 10:51:10 INFO hadoop2.Hadoop2MR1PlatformBroker: reading default configuration from: hdfs://UbuntuD1:8020/user/hduser/.lingual/config/default.properties

16/06/30 10:51:10 INFO hadoop2.Hadoop2MR1PlatformBroker: user not supplied, using OS user

16/06/30 10:51:10 INFO hadoop2.Hadoop2MR1PlatformBroker: using user: hduser

16/06/30 10:51:10 INFO hadoop2.Hadoop2MR1PlatformBroker: using hadoop job jar: /home/hduser/Lingual/lib/lingual-hadoop2-mr1-1.2.1-jdbc.jar

16/06/30 10:51:10 INFO hadoop2.Hadoop2MR1PlatformBroker: using app jar: /home/hduser/Lingual/lib/lingual-hadoop2-mr1-1.2.1-jdbc.jar

16/06/30 10:51:10 INFO hadoop2.Hadoop2MR1PlatformBroker: loading override properties from: jar:file:/home/hduser/Lingual/lib/lingual-hadoop2-mr1-1.2.1-jdbc.jar!/hadoop-override.properties

16/06/30 10:51:10 INFO Configuration.deprecation: mapred.output.compression.codec is deprecated. Instead, use mapreduce.output.fileoutputformat.compress.codec

16/06/30 10:51:10 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative

16/06/30 10:51:10 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative

16/06/30 10:51:10 INFO planner.HadoopPlanner: using application jar: /home/hduser/Lingual/lib/lingual-hadoop2-mr1-1.2.1-jdbc.jar

16/06/30 10:51:10 INFO property.AppProps: using app.id: 4548F010FC4E40AA8F06C500A18B23FB

16/06/30 10:51:11 INFO Configuration.deprecation: mapred.used.genericoptionsparser is deprecated. Instead, use mapreduce.client.genericoptionsparser.used

16/06/30 10:51:11 INFO Configuration.deprecation: mapred.output.compress is deprecated. Instead, use mapreduce.output.fileoutputformat.compress

16/06/30 10:51:11 INFO util.Version: Concurrent, Inc - Cascading 2.7.0

16/06/30 10:51:11 INFO flow.Flow: [20160630-105110-1772B7...] starting

16/06/30 10:51:11 INFO flow.Flow: [20160630-105110-1772B7...] source: Hfs["TextDelimited[['f1', 'f2', 'f3', 'f4', 'f5' | String, DATE, DATE, Float, Boolean]]"]["/user/hduser/debug/debug_job/input_small5field1GB/part1"]

16/06/30 10:51:11 INFO flow.Flow: [20160630-105110-1772B7...] sink: Hfs["SQLTypedTextDelimited[['f1', 'f2', 'f3', 'f4', 'f5' | String, DATE, DATE, Float, Boolean]]"]["/app/hadoop/tmp/cache/hduser/20160630-105110-1772B7E479.tcsv"]

16/06/30 10:52:30 INFO util.Hadoop18TapUtil: deleting temp path /app/hadoop/tmp/cache/hduser/20160630-105110-1772B7E479.tcsv/_temporary

16/06/30 10:52:30 INFO mapred.FileInputFormat: Total input paths to process : 2

16/06/30 10:52:30 INFO Test.LingualTest: Resultset created :

I also found that it deletes the /app/hadoop/tmp/cache/hduser/<jobid>/_temporary directory and writes the entire output into this folder:

hadoop fs -ls /app/hadoop/tmp/cache/hduser/20160630-105110-1772B7E479.tcsv

Found 3 items

-rw-r--r-- 3 hduser hadoop 0 2016-06-30 13:02 /app/hadoop/tmp/cache/hduser/20160630-105110-1772B7E479.tcsv/_SUCCESS

-rw-r--r-- 3 hduser hadoop 4702992 2016-06-30 13:02 /app/hadoop/tmp/cache/hduser/20160630-105110-1772B7E479.tcsv/part-00000

-rw-r--r-- 3 hduser hadoop 4685206 2016-06-30 13:02 /app/hadoop/tmp/cache/hduser/20160630-105110-1772B7E479.tcsv/part-00001
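
The listing above is consistent with the usual Hadoop output-commit pattern: task output is staged under _temporary, promoted into the output directory, the staging directory is deleted (the "Hadoop18TapUtil: deleting temp path" line in the log), and an empty _SUCCESS marker is left behind. The sketch below imitates that sequence on the local filesystem using only java.nio. It is a simplified illustration, not Cascading's or Hadoop's actual committer code, and the helper name commitToTempDir is hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class CommitSketch {

    // Hypothetical helper: simulates a simplified output commit in a fresh local temp
    // directory and returns the sorted names of what is left in the output directory.
    // The temp directory is not cleaned up; this is only a demonstration.
    static List<String> commitToTempDir(List<String> partContents) {
        try {
            Path outputDir = Files.createTempDirectory("result-tcsv");
            Path staging = outputDir.resolve("_temporary");
            Files.createDirectories(staging);

            // 1. Each "task" writes its part file into the staging area first.
            for (int i = 0; i < partContents.size(); i++) {
                String name = String.format("part-%05d", i);
                Files.write(staging.resolve(name), partContents.get(i).getBytes(StandardCharsets.UTF_8));
            }

            // 2. Promote the finished part files into the final output directory.
            List<Path> staged;
            try (Stream<Path> s = Files.list(staging)) {
                staged = s.collect(Collectors.toList());
            }
            for (Path p : staged) {
                Files.move(p, outputDir.resolve(p.getFileName()));
            }

            // 3. Delete the now-empty staging directory.
            Files.delete(staging);

            // 4. Leave an empty marker file signalling that the job succeeded.
            Files.createFile(outputDir.resolve("_SUCCESS"));

            try (Stream<Path> s = Files.list(outputDir)) {
                return s.map(p -> p.getFileName().toString()).sorted().collect(Collectors.toList());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Two "tasks", mirroring the two part files in the listing above.
        System.out.println(commitToTempDir(Arrays.asList("a,1\n", "b,2\n")));
        // prints [_SUCCESS, part-00000, part-00001]
    }
}
```

Listing the staged files before moving them avoids modifying a directory while a stream over it is still open.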

I have the following questions about this:

1. Is this output file actually required? If not, what is the reason for writing it, given that we can get the result from the ResultSet?

2. Can I provide my own location for this data instead of /app/hadoop/tmp/cache/hduser/, so that I can delete it after use?
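
For question 2, a minimal sketch, assuming the Lingual JDBC driver accepts semicolon-separated properties after the platform name in the URL (jdbc:lingual:<platform>;key=value) and that a property named resultPath sets the root directory for result files. Both are assumptions to verify against the Lingual documentation for version 1.2.1. The target path /user/hduser/lingual-results and the helper LingualUrl.build are hypothetical. The code only builds the URL string, so it runs without the driver or a cluster.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LingualUrl {

    // Hypothetical helper: joins a platform name and connection properties into
    // a URL of the assumed form jdbc:lingual:<platform>;key=value;key=value
    static String build(String platform, Map<String, String> props) {
        StringBuilder url = new StringBuilder("jdbc:lingual:").append(platform);
        for (Map.Entry<String, String> e : props.entrySet()) {
            url.append(';').append(e.getKey()).append('=').append(e.getValue());
        }
        return url.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the properties in insertion order in the URL.
        Map<String, String> props = new LinkedHashMap<>();
        // ASSUMPTION: "resultPath" is the driver property for the result-set root directory.
        props.put("resultPath", "/user/hduser/lingual-results");

        String url = build("hadoop2-mr1", props);
        System.out.println(url);
        // prints jdbc:lingual:hadoop2-mr1;resultPath=/user/hduser/lingual-results

        // With the Lingual driver on the classpath, the connection would then be opened with
        // java.sql.DriverManager.getConnection(url), and the directory under resultPath
        // could be removed after use (for example with hadoop fs -rm -r).
    }
}
```

Keeping the platform and the properties separate makes it easy to switch between a local test run and a cluster run without editing the URL by hand.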

I am using lingual-hadoop2-mr1-1.2.1-jdbc.jar.

For reference, I am attaching the source code of the job and the log files.

Thanks

Santlal

LingualTest.java

SchemaWithoutCommand.java

UbuntuD2.XYZ.net_35418

UbuntuD3.XYZ.net_49388
