Hi,
I tested a simple filter query in two ways: through the Lingual shell and as a plain Cascading job. The details are as follows.
Running the query in the Lingual shell (input data size 1 GB, 72 columns)
Step 1: Preparing the Lingual catalog for executing the query
Creating the schema
lingual catalog --schema lingualquerycheck --add
Creating the stereotype with its columns and their types
lingual catalog --schema lingualquerycheck --stereotype floatSchemaStereo --add --columns f1,f2,f3,f4,f5,f6,f7,f8,f9,f10,f11,f12,f13,f14,f15,f16,f17,f18,f19,f20,f21,f22,f23,f24,f25,f26,f27,f28,f29,f30,f31,f32,f33,f34,f35,f36,f37,f38,f39,f40,f41,f42,f43,f44,f45,f46,f47,f48,f49,f50,f51,f52,f53,f54,f55,f56,f57,f58,f59,f60,f61,f62,f63,f64,f65,f66,f67,f68,f69,f70,f71,f72 --types string,date,date,double,boolean,int,float,long,double,string,date,date,double,boolean,int,float,long,double,string,date,date,double,boolean,int,float,long,double,string,date,date,double,boolean,int,float,long,double,string,date,date,double,boolean,int,float,long,double,string,date,date,double,boolean,int,float,long,double,string,date,date,double,boolean,int,float,long,double,string,date,date,double,boolean,int,float,long,double
Creating the table
lingual catalog --schema lingualquerycheck --table lingualqueryTable --stereotype floatSchemaStereo --add /user/hduser/debug/debug_job/input_test1gb
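For anyone reproducing this, the long --columns and --types lists in the stereotype command can be generated instead of typed by hand. The following is only a sketch: it assumes the nine-type pattern (string,date,date,double,boolean,int,float,long,double) repeats eight times across the 72 columns, as it appears to in the command above, and it only prints the resulting command for review rather than running it.

```shell
#!/bin/sh
# Build the column list f1,f2,...,f72.
columns=""
i=1
while [ "$i" -le 72 ]; do
  columns="${columns:+$columns,}f$i"
  i=$((i + 1))
done

# Assumption: the same nine types repeat for every block of nine columns.
pattern="string,date,date,double,boolean,int,float,long,double"
types=""
n=1
while [ "$n" -le 8 ]; do
  types="${types:+$types,}$pattern"
  n=$((n + 1))
done

# Print the catalog command for inspection; nothing is executed here.
echo lingual catalog --schema lingualquerycheck \
  --stereotype floatSchemaStereo --add \
  --columns "$columns" --types "$types"
```

With this layout the fifth column, f5, lines up with the boolean type used in the filter condition.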
Step 2: Executing the query on the table created above using the Lingual shell
Starting the Lingual shell
lingual shell
Executing the query in the Lingual shell
0: jdbc:lingual:hadoop2-mr1> select * from "lingualquerycheck"."lingualqueryTable" where "f5" = true;
Running a Cascading job with a filter on each pipe, using the same condition (f5 == true)
I have attached the source files, FilterCascading.java and FilterFunctionCascading.java, which I used to run the Cascading job.
Procedure to run the Cascading job using the attached files:
Step 1: Create a new Java project and put the source files in the project's src directory.
Step 2: Export the project as a jar containing the given source files and the required Cascading jars.
Step 3: Run the jar using this command: hadoop jar <jar name>
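To compare the two runs on equal terms, the wall-clock time of each command can be captured the same way. The helper below is a hypothetical sketch (the name time_cmd is made up for illustration); it reports whole seconds from date, so it is only a rough measurement and does not replace the job counters that Hadoop reports.

```shell
#!/bin/sh
# Hypothetical helper: run any command and print its wall-clock time in seconds.
time_cmd() {
  tc_start=$(date +%s)
  "$@"                      # run the command exactly as given
  tc_status=$?              # remember its exit status
  tc_end=$(date +%s)
  echo "elapsed_seconds=$((tc_end - tc_start))"
  return "$tc_status"
}

# Example usage (the jar name is a placeholder, as in Step 3):
#   time_cmd hadoop jar <jar name>
```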
After completing the steps above, I observed the following for both approaches:
Observations:
Lingual shell:
Input data size | Condition | Number of columns in input data | Time taken
1 GB | "f5" = true | 72 | 9 min 71 sec

Cascading filter:
Input data size | Condition | Number of columns in input data | Time taken
1 GB | f5 == true | 72 | 55 sec
Question:
Since Lingual runs on top of Cascading, why did it take 8 times longer to execute a simple query? This is a huge performance issue.
Can someone help me understand why the Lingual shell takes so much longer than filtering in Cascading?
Thanks
Amiya Mishra