Performance issue in Lingual compared to a plain Cascading filter

Amiya Mishra

Jul 1, 2016, 10:57:29 AM

to Lingual User

Hi,

I have tested a simple query both in the Lingual shell and as a plain Cascading job, as follows.

Running the query in the Lingual shell (input data: 1 GB, 72 columns)

Step 1: Prepare the Lingual catalog for the query

Create the schema:

lingual catalog --schema lingualquerycheck --add

Create a stereotype with the columns and their types:

lingual catalog --schema lingualquerycheck --stereotype floatSchemaStereo --add --columns f1,f2,f3,f4,f5,f6,f7,f8,f9,f10,f11,f12,f13,f14,f15,f16,f17,f18,f19,f20,f21,f22,f23,f24,f25,f26,f27,f28,f29,f30,f31,f32,f33,f34,f35,f36,f37,f38,f39,f40,f41,f42,f43,f44,f45,f46,f47,f48,f49,f50,f51,f52,f53,f54,f55,f56,f57,f58,f59,f60,f61,f62,f63,f64,f65,f66,f67,f68,f69,f70,f71,f72 --types string,date,date,double,boolean,int,float,long,double,string,date,date,double,boolean,int,float,long,double,string,date,date,double,boolean,int,float,long,double,string,date,date,double,boolean,int,float,long,double,string,date,date,double,boolean,int,float,long,double,string,date,date,double,boolean,int,float,long,double,string,date,date,double,boolean,int,float,long,double,string,date,date,double,boolean,int,float,long,double
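The --columns and --types lists above are regular: the types follow a nine-type pattern (string, date, date, double, boolean, int, float, long, double) repeated eight times, so f5 is the boolean column the query filters on. A small Java sketch that regenerates the stereotype command may help avoid typos when the column count changes. `StereotypeArgs` is a hypothetical helper for illustration, not part of Lingual.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Lingual): rebuilds the --columns and --types
// arguments of the stereotype command for a given number of columns.
class StereotypeArgs {
    // Type pattern copied from the --types list in the post; position 5 (f5) is "boolean".
    static final String[] TYPE_PATTERN = {
        "string", "date", "date", "double", "boolean", "int", "float", "long", "double"
    };

    // Returns "f1,f2,...,fN".
    static String columns(int count) {
        List<String> names = new ArrayList<>();
        for (int i = 1; i <= count; i++) {
            names.add("f" + i);
        }
        return String.join(",", names);
    }

    // Returns the types for f1..fN by cycling through TYPE_PATTERN.
    static String types(int count) {
        List<String> types = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            types.add(TYPE_PATTERN[i % TYPE_PATTERN.length]);
        }
        return String.join(",", types);
    }

    public static void main(String[] args) {
        int count = 72; // 8 repetitions of the 9-type pattern
        System.out.println("lingual catalog --schema lingualquerycheck"
            + " --stereotype floatSchemaStereo --add"
            + " --columns " + columns(count)
            + " --types " + types(count));
    }
}
```

With count = 72 it prints the same stereotype command as the one used in the post.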

Create the table from the stereotype:

lingual catalog --schema lingualquerycheck --table lingualqueryTable --stereotype floatSchemaStereo --add /user/hduser/debug/debug_job/input_test1gb

Step 2: Execute the query on the table created above using the Lingual shell

Start the Lingual shell:

lingual shell

Execute the query in the shell:

0: jdbc:lingual:hadoop2-mr1> select * from "lingualquerycheck"."lingualqueryTable" where "f5" = true;
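To see whether the extra time is spent in the job itself or in returning the results, the same query could also be timed from Java through JDBC. This is only a sketch under stated assumptions: the Lingual JDBC driver class `cascading.lingual.jdbc.Driver` is on the classpath, the URL shown in the shell prompt (`jdbc:lingual:hadoop2-mr1`) reaches the catalog created in Step 1, and `executeQuery` returns once the underlying job has finished. The class and method names are hypothetical. Counting rows without printing them also leaves out any console rendering cost that a shell session may add.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical timing harness. Assumptions: the Lingual JDBC driver jar is on the
// classpath, the shell's URL reaches the catalog from Step 1, and executeQuery
// blocks until the underlying job completes.
class LingualQueryTimer {
    static final String DRIVER = "cascading.lingual.jdbc.Driver"; // assumed driver class
    static final String URL = "jdbc:lingual:hadoop2-mr1";         // URL from the shell prompt

    // Same statement as typed in the shell, minus the trailing ';'.
    // Names stay double-quoted, as in the shell query, to preserve their case.
    static String buildQuery(String schema, String table, String column) {
        return "select * from \"" + schema + "\".\"" + table + "\" where \"" + column + "\" = true";
    }

    public static void main(String[] args) throws SQLException {
        String sql = buildQuery("lingualquerycheck", "lingualqueryTable", "f5");
        try {
            Class.forName(DRIVER);
        } catch (ClassNotFoundException e) {
            System.out.println("Lingual JDBC driver not on classpath; query would be: " + sql);
            return;
        }
        try (Connection connection = DriverManager.getConnection(URL);
             Statement statement = connection.createStatement()) {
            long start = System.nanoTime();
            long rows = 0;
            try (ResultSet resultSet = statement.executeQuery(sql)) {
                long queryDone = System.nanoTime();
                while (resultSet.next()) {
                    rows++; // count only; nothing is printed per row
                }
                long fetchDone = System.nanoTime();
                System.out.printf("executeQuery: %d ms, fetch of %d rows: %d ms%n",
                    (queryDone - start) / 1_000_000, rows, (fetchDone - queryDone) / 1_000_000);
            }
        }
    }
}
```

If most of the time falls in the fetch phase rather than in `executeQuery`, the gap is more likely on the result-delivery side than in the filtering job.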

Running the Cascading job, with a filter on each pipe using the same condition (f5 == true)

I have attached the source files FilterCascading.java and FilterFunctionCascading.java, which I used to run the Cascading job.
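For reference, both runs apply the same per-record test. A minimal plain-Java sketch of that predicate follows; it is not the attached FilterFunctionCascading.java. It assumes one record per line, tab-delimited fields, and booleans stored as `true`/`false`; `F5Filter.keep` is a hypothetical name. In a Cascading `Filter`, the equivalent logic would sit in `isRemove`, which returns `true` for records to discard, i.e. the negation of `keep`.

```java
// Hypothetical plain-Java version of the f5 == true test, for illustration only.
// Assumptions: one record per line, tab-separated fields, booleans stored as "true"/"false".
class F5Filter {
    // f1 is field index 0, so f5 (typed boolean in the stereotype) is index 4.
    static final int F5_INDEX = 4;

    // Returns true when the record should be kept, i.e. f5 parses as boolean true.
    static boolean keep(String line) {
        String[] fields = line.split("\t", -1); // limit -1 keeps trailing empty fields
        return fields.length > F5_INDEX && Boolean.parseBoolean(fields[F5_INDEX].trim());
    }

    public static void main(String[] args) {
        String[] sample = {
            "a\t2016-07-01\t2016-07-01\t1.5\ttrue\t7",
            "b\t2016-07-01\t2016-07-01\t2.5\tfalse\t8",
            "c\t2016-07-01"
        };
        for (String line : sample) {
            System.out.println(keep(line) + " <- " + line.replace('\t', ','));
        }
    }
}
```

Run as-is, it prints true for the first sample line and false for the other two (the last one is too short to have an f5 field).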

Procedure to run the Cascading job using the attached files:

Step 1: Create a new Java project and put the source files in the project's src directory.

Step 2: Export the project, with the source files and the required Cascading jars, as a jar.

Step 3: Run the jar with the command: hadoop jar <jar name>

After completing the above steps, I got the following results on both sides:

Observations:

Run	Input data size	Condition	Columns in input data	Time taken
Lingual shell	1 GB	"f5" = true	72	9 min 71 sec
Cascading filter	1 GB	f5 == true	72	55 sec
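The size of the gap can be checked from these timings. Because "9 min 71 sec" is ambiguous as written, the sketch below brackets it: 9 minutes alone as a lower bound, and 9 minutes plus 71 seconds read literally. `Slowdown` is a hypothetical helper.

```java
import java.time.Duration;
import java.util.Locale;

// Hypothetical helper: how many times longer the Lingual run took than the Cascading run.
class Slowdown {
    static double ratio(Duration slow, Duration fast) {
        return (double) slow.toMillis() / fast.toMillis();
    }

    public static void main(String[] args) {
        Duration cascading = Duration.ofSeconds(55);
        // "9 min 71 sec" is ambiguous, so compute both readings.
        Duration lingualLowerBound = Duration.ofMinutes(9);
        Duration lingualLiteral = Duration.ofMinutes(9).plusSeconds(71);
        // Locale.ROOT keeps '.' as the decimal separator.
        System.out.printf(Locale.ROOT, "lower bound: %.1fx%n", ratio(lingualLowerBound, cascading));  // lower bound: 9.8x
        System.out.printf(Locale.ROOT, "literal reading: %.1fx%n", ratio(lingualLiteral, cascading)); // literal reading: 11.1x
    }
}
```

Either reading puts the Lingual run at roughly ten times the Cascading run.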

Query:

Since Lingual runs on top of Cascading, why does this simple query take roughly ten times longer (over 9 minutes versus 55 seconds)? This is a serious performance issue.

Can someone help me understand why the Lingual shell takes so much longer than filtering directly in Cascading?

Thanks

Amiya Mishra

FilterCascading.java

FilterFunctionCascading.java