Performance issue in Lingual compared to a normal Cascading filter


Amiya Mishra

Jul 1, 2016, 10:57:29 AM
to Lingual User
Hi,

I have tested a simple query using the Lingual shell and using Cascading, as follows.

Running the query in the Lingual shell, with 1 GB of input data having 72 columns

Step 1: Preparing the Lingual catalog for executing the query

Creating the schema
lingual catalog --schema lingualquerycheck --add

Creating a stereotype with columns and their types
lingual catalog --schema lingualquerycheck --stereotype floatSchemaStereo --add --columns f1,f2,f3,f4,f5,f6,f7,f8,f9,f10,f11,f12,f13,f14,f15,f16,f17,f18,f19,f20,f21,f22,f23,f24,f25,f26,f27,f28,f29,f30,f31,f32,f33,f34,f35,f36,f37,f38,f39,f40,f41,f42,f43,f44,f45,f46,f47,f48,f49,f50,f51,f52,f53,f54,f55,f56,f57,f58,f59,f60,f61,f62,f63,f64,f65,f66,f67,f68,f69,f70,f71,f72 --types string,date,date,double,boolean,int,float,long,double,string,date,date,double,boolean,int,float,long,double,string,date,date,double,boolean,int,float,long,double,string,date,date,double,boolean,int,float,long,double,string,date,date,double,boolean,int,float,long,double,string,date,date,double,boolean,int,float,long,double,string,date,date,double,boolean,int,float,long,double,string,date,date,double,boolean,int,float,long,double

Creating the table
lingual catalog --schema lingualquerycheck --table lingualqueryTable --stereotype floatSchemaStereo --add /user/hduser/debug/debug_job/input_test1gb

Step 2: Executing the query on the table created above using the Lingual shell

Starting the Lingual shell
lingual shell

Executing the query in the Lingual shell
0: jdbc:lingual:hadoop2-mr1> select * from "lingualquerycheck"."lingualqueryTable" where "f5" = true;

Running a Cascading job with a filter on each pipe, using the same condition (f5 == true)

I have attached the source files, FilterCascading.java and FilterFunctionCascading.java, that I used to run the Cascading job.
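
The attached sources are not reproduced in this message. As a rough, library-free illustration of what the f5 == true condition does to each record, here is a minimal sketch in plain Java. It does not use the Cascading API and is not the attached code. The class name F5FilterSketch, the comma delimiter, and the sample rows are assumptions made only for this example, since the input format is not stated here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical stand-in for the per-record check; not the attached Cascading code.
class F5FilterSketch {
    // f5 is the fifth column, so its zero-based index is 4.
    static final int F5_INDEX = 4;

    // Returns true when the record should be kept, i.e. when f5 parses as true.
    static boolean keep(String line, String delimiter) {
        // Limit -1 keeps trailing empty fields so column positions stay stable.
        String[] fields = line.split(Pattern.quote(delimiter), -1);
        return fields.length > F5_INDEX
                && Boolean.parseBoolean(fields[F5_INDEX].trim());
    }

    // Applies the same condition to every record, like a filter on a pipe.
    static List<String> filter(List<String> lines, String delimiter) {
        return lines.stream()
                .filter(line -> keep(line, delimiter))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up sample rows, shortened to six columns for readability.
        List<String> sample = Arrays.asList(
                "a,2016-01-01,2016-01-02,1.5,true,6",
                "b,2016-01-01,2016-01-02,2.5,false,7");
        filter(sample, ",").forEach(System.out::println);
    }
}
```

Both the SQL predicate "f5" = true and the Cascading filter should reduce to a per-record check of this kind, which is why I expected similar run times.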

Procedure to run the Cascading job using the attached files:

Step 1: Create a new Java project and put the source files in the project's src directory.

Step 2: Export the project as a jar containing the given source files and the required Cascading jars.

Step 3: Run the jar with this command: hadoop jar <jar name>


After completing the above steps, I got the following results for both approaches:

Observations:

Lingual shell
Input data size | Condition | Number of columns in input data | Time taken
1 GB | "f5" = true | 72 | 9 min 71 sec

Cascading filter
Input data size | Condition | Number of columns in input data | Time taken
1 GB | f5 == true | 72 | 55 sec
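
To make the comparison concrete, the two reported times can be converted to seconds and divided. This is only a small arithmetic sketch that takes the figures exactly as written above; the class and method names are made up for illustration.

```java
import java.util.Locale;

// Hypothetical helper for comparing the two reported run times.
class TimingRatio {
    // Converts a "minutes and seconds" reading to total seconds.
    static long toSeconds(long minutes, long seconds) {
        return minutes * 60 + seconds;
    }

    // How many times longer the slow run took than the fast run.
    static double ratio(long slowSeconds, long fastSeconds) {
        return (double) slowSeconds / fastSeconds;
    }

    public static void main(String[] args) {
        long lingual = toSeconds(9, 71);   // reported as "9 min 71 sec"
        long cascading = toSeconds(0, 55); // reported as "55 sec"
        System.out.printf(Locale.ROOT,
                "Lingual: %d s, Cascading: %d s, ratio: %.1fx%n",
                lingual, cascading, ratio(lingual, cascading));
    }
}
```

Taken literally, the figures give 611 seconds against 55 seconds, which is roughly 11 times slower. If the "71 sec" part is a typing mistake, the ratio would shift accordingly.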


Question:

Since Lingual runs on top of Cascading, why did it take 8 times longer to execute a simple query? This is a huge performance issue.

Can someone help me understand why the Lingual shell takes so much longer than filtering in Cascading?

Thanks
Amiya Mishra  




FilterCascading.java
FilterFunctionCascading.java