Hi,
I am new to Lingual. I ran the query below on a 1 MB input file:
select * from "logTest"."logTest" where "f1">0 order by "f1";
The query displayed its result in sorted order, but when I looked at the log file, the number of reduce tasks was 0 and only mappers had been launched. Here are the relevant log lines:
2016-05-04 12:20:28,707 INFO [main] org.apache.hadoop.mapred.MapTask: numReduceTasks: 0
2016-05-04 12:20:13,702 INFO [main] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Input size for job job_1451399010344_2296 = 1287639. Number of splits = 2
2016-05-04 12:20:30,192 INFO [main] cascading.flow.hadoop.FlowMapper: sinking to: Hfs["SQLTypedTextDelimited[['f1', 'f2', 'f3' | int, String, double]]"]["hdfs://UbuntuD1:8020/user/hduser/results/20160504-121959-D60ED4737F.tcsv"]
I also found that the result of the query was saved at the location below:
2016-05-04 12:21:20,225 INFO [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: Saved output of task 'attempt_1451399010344_2296_m_000001_0' to hdfs://UbuntuD1:8020/user/hduser/results/20160504-121959-D60ED4737F.tcsv/_temporary/1/task_1451399010344_2296_m_000001
When I looked in hdfs://UbuntuD1:8020/user/hduser/results/20160504-121959-D60ED4737F.tcsv, I found the two part files listed below (I assume they are the output of the two map tasks; please correct me if I am wrong), but the data in them was not in sorted order.
hduser@UbuntuD2:~$ hadoop fs -ls /user/hduser/results/20160504-121959-D60ED4737F.tcsv
Found 3 items
-rw-r--r-- 3 hduser hadoop 0 2016-05-04 12:21 /user/hduser/results/20160504-121959-D60ED4737F.tcsv/_SUCCESS
-rw-r--r-- 3 hduser hadoop 640276 2016-05-04 12:20 /user/hduser/results/20160504-121959-D60ED4737F.tcsv/part-00000
-rw-r--r-- 3 hduser hadoop 640227 2016-05-04 12:21 /user/hduser/results/20160504-121959-D60ED4737F.tcsv/part-00001
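For anyone who wants to repeat the check on the part files, here is a minimal sketch of how the sort order can be tested. It assumes the rows are comma-delimited text with "f1" as the first field, and it skips any line whose first field is not an integer (for example a header line); the function names are hypothetical and only for illustration.

```python
def first_field_values(lines, delimiter=","):
    """Yield the first field of each line as an int.

    Assumption: "f1" is the first field and the delimiter is a comma.
    Lines whose first field does not parse as an int (e.g. a header) are skipped.
    """
    for line in lines:
        field = line.rstrip("\n").split(delimiter, 1)[0].strip().strip('"')
        try:
            yield int(field)
        except ValueError:
            continue


def is_sorted_ascending(values):
    """Return True if every value is <= the value that follows it."""
    values = list(values)
    return all(a <= b for a, b in zip(values, values[1:]))


# Small in-memory sample standing in for the contents of one part file.
sample = ["3,a,0.5\n", "1,b,1.5\n", "2,c,2.5\n"]
print(is_sorted_ascending(first_field_values(sample)))
```

To run it against the real data, the lines of a part file (for example the output of hadoop fs -cat on part-00000) can be passed in place of the sample list.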
So my questions are: how did Lingual sort the data without running any reducer?
And why was the result displayed in the shell sorted when the mapper output is unsorted?
I have attached the log file for more information.
Thanks,
Santlal