Reading a file from HDFS, applying a filter, and iterating over the data

Apr 6, 2016, 11:49:22 AM

to Lingual User

Hi,

We are exploring Cascading Lingual for the following use case, for which it seems to be the best fit.

1. Read Avro data from HDFS, apply a filter to it, and return a handle for viewing the data.

For this I have explored two approaches: Lingual with JDBC, and Lingual in Java with Cascading. I have built the proofs of concept described below.

1. Lingual with JDBC:

For now I have tried this with a .txt file, because we are still looking for a data provider that supplies data in .avro format.

I found that to read a file from HDFS I have to perform three steps before I can run the JDBC code.

a. Create a schema, say LOGS.

hduser@UbuntuD2:~$ lingual catalog --schema LOGS --add

b. Create a stereotype that describes the file structure, say LOGFILE.

hduser@UbuntuD2:~$ lingual catalog --schema LOGS --stereotype LOGFILE --add --columns id,name --types int,string

c. Register a table, say LOGS, in the LOGS schema using the LOGFILE stereotype.

hduser@UbuntuD2:~$ lingual catalog --schema LOGS --table LOGS --stereotype LOGFILE --add /home/hduser/myEmp/emp_data

Only then can I write the following:

Connection connection = DriverManager.getConnection("jdbc:lingual:hadoop2-mr1;catalog=/user/hduser/;schema=/home/hduser/myEmp");

This approach gives me a ResultSet, which acts as the handle through which I can read the data.
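To make the JDBC approach concrete, here is a minimal sketch of how the filter and iteration could look once the catalog steps are done. It is an illustration under assumptions, not the attached test code: the class name LingualJdbcSketch, the helpers buildUrl and buildQuery, and the filter threshold of 10 are hypothetical, and the query assumes the table is visible as "LOGS"."LOGS" with the columns id and name from the stereotype above. The driver class cascading.lingual.jdbc.Driver must be on the classpath for main to run.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class LingualJdbcSketch {

    // Builds a Lingual JDBC URL in the same form as the one used in this post.
    static String buildUrl(String platform, String catalog, String schema) {
        return "jdbc:lingual:" + platform + ";catalog=" + catalog + ";schema=" + schema;
    }

    // Builds a filter query with quoted identifiers.
    // Assumption: schema/table names match the catalog entries (LOGS.LOGS).
    static String buildQuery(String schema, String table, int minId) {
        return "SELECT \"id\", \"name\" FROM \"" + schema + "\".\"" + table
                + "\" WHERE \"id\" > " + minId;
    }

    public static void main(String[] args) throws Exception {
        // Requires the Lingual JDBC jar on the classpath.
        Class.forName("cascading.lingual.jdbc.Driver");

        String url = buildUrl("hadoop2-mr1", "/user/hduser/", "/home/hduser/myEmp");
        String sql = buildQuery("LOGS", "LOGS", 10); // hypothetical filter: id > 10

        // try-with-resources closes the ResultSet, Statement, and Connection in order.
        try (Connection connection = DriverManager.getConnection(url);
             Statement statement = connection.createStatement();
             ResultSet rows = statement.executeQuery(sql)) {
            // The ResultSet is the handle: iterate over the filtered rows.
            while (rows.next()) {
                System.out.println(rows.getInt(1) + "\t" + rows.getString(2));
            }
        }
    }
}
```

The filter is pushed into the SQL WHERE clause, so only matching rows come back through the ResultSet.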

Can you suggest how to reduce these steps, or any alternative to them?

2. Lingual in Java with Cascading:

In this case I think the steps above (creating the schema, stereotype, and table) are not required.

The problem is that after the Cascading flow executes, it writes the output to a file on HDFS, so we have to open that file and read it.

This takes time. We can open the sinkTap and get a handle, but by then the data has already been written to the file.

Can you suggest how to reduce the time taken here?

Thanks

Santlal Gupta

ClusteCascadingTest.java.txt

ClusterJDBCTest.java.txt

emp_data
