Order by clause displayed sorted output without executing any reducer


santlal gupta

May 4, 2016, 4:11:26 AM
to Lingual User
Hi,

I am new to Lingual. I ran the query below on an input file of about 1 MB:

select * from "logTest"."logTest" where "f1">0 order by "f1"; 

The query displayed its result in sorted order, but the log file shows the number of reduce tasks as 0, and only mappers were launched. Below are the relevant lines from the log:

2016-05-04 12:20:28,707 INFO [main] org.apache.hadoop.mapred.MapTask: numReduceTasks: 0
2016-05-04 12:20:13,702 INFO [main] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Input size for job job_1451399010344_2296 = 1287639. Number of splits = 2
2016-05-04 12:20:30,192 INFO [main] cascading.flow.hadoop.FlowMapper: sinking to: Hfs["SQLTypedTextDelimited[['f1', 'f2', 'f3' | int, String, double]]"]["hdfs://UbuntuD1:8020/user/hduser/results/20160504-121959-D60ED4737F.tcsv"]

I also found that the result of the query was saved at the location below:

2016-05-04 12:21:20,225 INFO [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: Saved output of task 'attempt_1451399010344_2296_m_000001_0' to hdfs://UbuntuD1:8020/user/hduser/results/20160504-121959-D60ED4737F.tcsv/_temporary/1/task_1451399010344_2296_m_000001


When I went to hdfs://UbuntuD1:8020/user/hduser/results/20160504-121959-D60ED4737F.tcsv, I found the two part files below (I assume these are the output of the two map tasks; please correct me if I am wrong), but the data in them was not in sorted order.

hduser@UbuntuD2:~$ hadoop fs -ls /user/hduser/results/20160504-121959-D60ED4737F.tcsv
Found 3 items
-rw-r--r--   3 hduser hadoop          0 2016-05-04 12:21 /user/hduser/results/20160504-121959-D60ED4737F.tcsv/_SUCCESS
-rw-r--r--   3 hduser hadoop     640276 2016-05-04 12:20 /user/hduser/results/20160504-121959-D60ED4737F.tcsv/part-00000
-rw-r--r--   3 hduser hadoop     640227 2016-05-04 12:21 /user/hduser/results/20160504-121959-D60ED4737F.tcsv/part-00001
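
The observation that the part files are unsorted could be double-checked with something like the sketch below. It is only an illustration under stated assumptions: the part files have been copied to the local filesystem, fields are comma-delimited with the integer "f1" as the first field, values are unquoted, and there is no header line. The function names are hypothetical and not part of Lingual or Hadoop.

```python
from typing import Iterable, Iterator


def f1_values(lines: Iterable[str], delimiter: str = ",") -> Iterator[int]:
    """Yield the integer f1 column of each non-empty line.

    Assumption: f1 is the first field and is an unquoted integer.
    """
    for line in lines:
        line = line.strip()
        if line:
            yield int(line.split(delimiter)[0])


def is_sorted(values: Iterable[int]) -> bool:
    """Return True if the values are in non-decreasing order."""
    previous = None
    for value in values:
        if previous is not None and value < previous:
            return False
        previous = value
    return True


def part_is_sorted(path: str, delimiter: str = ",") -> bool:
    """Check one locally copied part file (path is a hypothetical local copy)."""
    with open(path) as handle:
        return is_sorted(f1_values(handle, delimiter))
```

Calling part_is_sorted on a local copy of part-00000 and of part-00001 would then report, per file, whether the rows are ordered by "f1". If the files carry a header line or use a different delimiter, the sketch would need adjusting.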


So my question is: how did Lingual sort the data without executing any reducer?
Why was the query result displayed on the shell sorted when the mapper output is unsorted?

I have attached the log file for more information.

Thanks,
Santlal
application_1451399010344_2296.txt

Andre Kelpe

May 4, 2016, 6:51:52 AM
to lingua...@googlegroups.com
Which version of Lingual are you using?

- André



--
André Kelpe
an...@concurrentinc.com
http://concurrentinc.com

santlal gupta

May 5, 2016, 2:59:15 AM
to Lingual User
Hi Andre,

Thanks for your quick response.

I am using the following versions:

Lingual: 1.2.1
Cascading: 3.0.1
Hadoop: 2.6.0

Thanks
Santlal

Andre Kelpe

May 6, 2016, 6:15:56 AM
to lingua...@googlegroups.com
That is odd. Would you mind sharing an example log file and the way
you configured your catalog with me?

Thanks!

- André

santlal gupta

May 9, 2016, 2:54:09 AM
to Lingual User
Hi Andre,

Thanks for your response.

I have attached log files for your reference.

To configure the catalog, I followed the steps in the Lingual user guide (http://docs.cascading.org/lingual/1.2/).

Thanks
Santlal
UbuntuD3.myCluster_52529
UbuntuD4.myCluster_49849

santlal gupta

May 11, 2016, 2:58:23 AM
to Lingual User
Hi Andre,

Did you get a chance to look into the log files? I am unable to proceed further with my PoC and am waiting for your response.

Thanks
Santlal