Unable to launch kafka-to-hdfs-sync app in DataTorrent

akshay naidu

Dec 28, 2016, 9:46:38 AM
to DataTorrent Users Group
Hello, 
     I am a newbie to DataTorrent.
     I am trying to launch the kafka-to-hdfs-sync app in DataTorrent. I have a .csv file in HDFS, and I want to give this file as input to the app.
     I may be wrong, but my first guess was to add the path to the .csv file in one of the dt.operator.kafkaInput.... properties under "Specify custom properties". When I add the path to the CSV file, which is '/user/dtadmin/data/xyz.csv', the application launches successfully, but when I open the application link given in the notification, the application overview page shows its status as failed.
     I have attached a screenshot with the configs that I have saved.
     Kindly assist me. It's urgent.
        Thank you
Screenshot from 2016-12-28 20:10:22.png

chai...@datatorrent.com

Dec 28, 2016, 11:59:47 AM
to DataTorrent Users Group
Hi Akshay,

   The app you launched consumes messages from Kafka and writes them to HDFS.
   I think you are looking to read data from HDFS and write it to Kafka. Please correct me if I am wrong.
   If that is your use case, launch the hdfs-to-kafka-sync app instead.

Regards,
Chaitanya

akshay naidu

Dec 28, 2016, 1:34:54 PM
to DataTorrent Users Group
Hi Chaitanya,
Thanks for the reply.
What I am trying to do is create a simulated Kafka streaming input from a '.csv' data file and give it as input to the kafka-to-hdfs-sync app. Ultimately, I want to create an app similar to the kafka-to-hdfs-sync app that takes a .csv file as input, and upload it to AppHub.
Please suggest how I should approach this.
Thank You,

dee...@datatorrent.com

Dec 29, 2016, 12:29:05 AM
to DataTorrent Users Group
Hi Akshay,

Are you trying to read records from a .csv file, process them further, and write them to HDFS? Let us know if we understand the use case correctly.
For this use case we already have the HDFS Sync App (https://www.datatorrent.com/apphub/hdfs-to-hdfs-line-copy/), which is basically a line-by-line record reader. You can provide a CSV file as its input.

Thanks,
Deepak

akshay naidu

Dec 29, 2016, 12:43:51 AM
to DataTorrent Users Group
Hi Deepak, 

What I want is to use the sample data file (.csv) to create a simulated Kafka streaming input and run the kafka-to-hdfs app from AppHub.

Here is what I have tried so far: I have a sample data file (.csv) in my local folder, and I also copied this file to an HDFS path. I used this Hdfs/path/to/data.csv as input to kafka-to-hdfs-sync (just as a try), with no success.

But I am clear that I want Kafka-to-HDFS and not the Hdfs-to-.... 

Please help me with how I should approach this task.

Thanks.

Yogi Devendra

Dec 29, 2016, 12:48:38 AM
to akshay naidu, DataTorrent Users Group
Akshay,

I am not clear about what you mean by "Simulated Kafka".
Could you please elaborate on its design? Or, if you have some source code for "Simulated Kafka", please share it.

Also, is there any specific reason for doing "Simulated Kafka" instead of a local deployment of "Apache Kafka"?

~ Yogi

--
You received this message because you are subscribed to the Google Groups "DataTorrent Users Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dt-users+unsubscribe@googlegroups.com.
To post to this group, send email to dt-u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dt-users/238b3605-e6b8-4b60-af26-bd8d46974c9f%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Ashwin Putta

Dec 29, 2016, 12:53:34 AM
to akshay naidu, DataTorrent Users Group
Hey Akshay,

The kafka-to-hdfs-sync app reads messages from Kafka, not files from HDFS. If your data is in a file in HDFS, you first need to write that data to a Kafka topic, which the kafka-to-hdfs-sync app can then read from. You can use the hdfs-to-kafka-sync app to write the data from your CSV file to that Kafka topic.

Regards,
Ashwin.

Yogi Devendra

Dec 29, 2016, 1:26:58 AM
to akshay naidu, DataTorrent Users Group
+dt-users

The email thread continued without the mailing list in cc.
Please read the mail trail below for reference.

~ Yogi

On Thu, Dec 29, 2016 at 11:53 AM, Yogi Devendra <deve...@datatorrent.com> wrote:
Yes. Exactly.

~ Yogi

On Thu, Dec 29, 2016 at 11:51 AM, akshay naidu <akshay...@gmail.com> wrote:
OK, and can the output of this command
"kafka-console-producer.sh --broker-list localhost:9092 --topic test0 < sample_data.csv"
be given to kafka-to-hdfs-sync?


On Thu, Dec 29, 2016 at 11:48 AM, Yogi Devendra <deve...@datatorrent.com> wrote:
Why do you want to simulate it?

You can use "Apache Kafka" to achieve the same thing with a command similar to the following:

kafka-console-producer.sh --broker-list localhost:9092 --topic test0 < sample_data.csv
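For reference, kafka-console-producer.sh reads standard input line by line and publishes each line as a separate message, so piping sample_data.csv into it turns every CSV row into one message on topic test0. The Python sketch below models only that behavior; `produce_lines` and the injected `send(topic, value)` callback are hypothetical names standing in for a real Kafka client, so the logic runs without a broker.

```python
import io
from typing import Callable, TextIO


def produce_lines(source: TextIO, topic: str,
                  send: Callable[[str, bytes], None]) -> int:
    """Publish each line of `source` to `topic` as one message.

    `send` is a hypothetical callback standing in for a Kafka client.
    Returns the number of messages sent.
    """
    count = 0
    for line in source:
        # Strip the line terminator; the message value is the raw CSV row.
        send(topic, line.rstrip("\r\n").encode("utf-8"))
        count += 1
    return count


if __name__ == "__main__":
    # Hypothetical sample rows; a list collects what would have been sent.
    sample = io.StringIO("id,name\n1,alice\n2,bob\n")
    sent = []
    n = produce_lines(sample, "test0", lambda t, v: sent.append((t, v)))
    print(n, sent)
```

With the kafka-python client installed, `send` could be `lambda t, v: producer.send(t, v)` for a `KafkaProducer(bootstrap_servers="localhost:9092")`, followed by `producer.flush()`; that wiring is an assumption and is not exercised here.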

~ Yogi

On Thu, Dec 29, 2016 at 11:43 AM, akshay naidu <akshay...@gmail.com> wrote:
The aim is to create a Kafka source out of the CSV file so that I can run it with kafka-to-hdfs.

On Thu, Dec 29, 2016 at 11:30 AM, Yogi Devendra <deve...@datatorrent.com> wrote:
I am still not clear on what "Simulated Kafka" is.

I repeat:
If you have some source code for "Simulated Kafka", please share it. Or else, you can explain the design idea behind "Simulated Kafka".

~ Yogi

On Thu, Dec 29, 2016 at 11:26 AM, akshay naidu <akshay...@gmail.com> wrote:
Hello Yogi, 

Sorry for not being able to explain my issue clearly.

This is what I am trying to do:

1. Using the sample data file (.csv), create a simulated Kafka streaming input.

2. Run the kafka-to-hdfs demo on the above streaming input from AppHub in DataTorrent.

3. Generate a graphical display using DataTorrent tools (the example display is for reference only).

4. Create a program named a_kafka_to_hdfs.apa to be uploaded to AppHub.

Thanks.


akshay naidu

Dec 29, 2016, 4:58:58 AM
to DataTorrent Users Group, akshay...@gmail.com, Ashwin Putta
Hello Ashwin, thanks for your response.
I tried to create a Kafka topic using the hdfs-to-kafka-sync app with the CSV file as input. The application started successfully but failed after a few seconds. Please see the attached screenshots.
Thank you.
hdfs-to-kafka properties.png
hdfs-tokafka-sync.png

chai...@datatorrent.com

Dec 29, 2016, 5:14:35 AM
to DataTorrent Users Group, akshay...@gmail.com, ash...@datatorrent.com
Akshay,

  Could you please share the Application Master logs?
  Also, please share the value of producerProperties; it is only partially visible in the attached screenshots.

Regards,
Chaitanya

akshay naidu

Dec 29, 2016, 5:37:14 AM
to DataTorrent Users Group, akshay...@gmail.com, ash...@datatorrent.com
producerProperties -->> serializer.class=kafka.serializer.DefaultEncoder,producer.type=async,metadata.broker.list=localhost:9092
These were the defaults; I didn't change them.
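The producerProperties value is a comma-separated list of key=value pairs, and metadata.broker.list is the setting that decides which broker the app tries to reach. A small Python sketch for pulling the broker endpoints out of such a string; the function names are hypothetical, and treating a fragment without `=` as a continuation of the previous value (for example, a second broker) is an assumption about the format, not documented DataTorrent behavior.

```python
from typing import Dict, List, Tuple


def parse_producer_properties(spec: str) -> Dict[str, str]:
    """Parse 'k1=v1,k2=v2,...' into a dict of strings."""
    props: Dict[str, str] = {}
    last_key = None
    for fragment in spec.split(","):
        fragment = fragment.strip()
        if not fragment:
            continue
        if "=" in fragment:
            key, value = fragment.split("=", 1)
            last_key = key.strip()
            props[last_key] = value.strip()
        elif last_key is not None:
            # Assumption: a fragment without '=' continues the previous value,
            # e.g. a second broker in metadata.broker.list.
            props[last_key] += "," + fragment
        else:
            raise ValueError(f"malformed fragment: {fragment!r}")
    return props


def broker_endpoints(props: Dict[str, str]) -> List[Tuple[str, int]]:
    """Return (host, port) pairs listed in metadata.broker.list."""
    endpoints = []
    for entry in props.get("metadata.broker.list", "").split(","):
        entry = entry.strip()
        if not entry:
            continue
        host, sep, port = entry.rpartition(":")
        if not sep or not host:
            raise ValueError(f"expected host:port, got {entry!r}")
        endpoints.append((host, int(port)))
    return endpoints


if __name__ == "__main__":
    defaults = ("serializer.class=kafka.serializer.DefaultEncoder,"
                "producer.type=async,metadata.broker.list=localhost:9092")
    print(broker_endpoints(parse_producer_properties(defaults)))
```

With the defaults quoted above, the only endpoint is localhost:9092, so the app can only succeed if a broker is listening there.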

And how do I get the Application Master logs?
Thanks

chai...@datatorrent.com

Dec 29, 2016, 5:50:04 AM
to DataTorrent Users Group, akshay...@gmail.com, ash...@datatorrent.com
Hi Akshay,
  
   Please refer to the link below for how to get the Application Master logs:

   Please note that the default properties work only if the Kafka server is running on the local machine and accepting connections on port 9092.
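One quick way to see whether anything is accepting connections at the default broker address is a plain TCP connect. This Python sketch (the `is_port_open` name is hypothetical) only shows that some process is listening on localhost:9092; it does not verify that the process is a healthy Kafka broker.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host, and timeouts.
        return False


if __name__ == "__main__":
    # Default broker address from the app's producerProperties.
    print("localhost:9092 reachable:", is_port_open("localhost", 9092))
```

If this prints False, either no broker is running or it listens on a different port, and metadata.broker.list in producerProperties has to be changed to the actual host:port.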

Regards,
Chaitanya

akshay naidu

Dec 29, 2016, 9:35:51 AM
to DataTorrent Users Group, akshay...@gmail.com, ash...@datatorrent.com, chai...@datatorrent.com
Chaitanya,
Please check the attachment for the log files. Everything is running on my PC.
    How do I check whether the Kafka server is running, and on which port it accepts connections?

Thanks for your help.
dt.log

Yogi Devendra

Dec 30, 2016, 12:31:34 AM
to akshay naidu, DataTorrent Users Group
Akshay,

Could you please search the Kafka documentation, Kafka forums, Stack Overflow, etc. for "How to check the status of a Kafka server?"

Once you have found the answer, you may post your findings on this forum.

~ Yogi

akshay naidu

Dec 30, 2016, 12:50:23 AM
to DataTorrent Users Group, akshay...@gmail.com
Yeah, I'll do it. But before that, can you please tell me whether it is necessary to install Kafka on my machine? I mean, isn't it already included in DataTorrent?

Yogi Devendra

Dec 30, 2016, 1:03:44 AM
to akshay naidu, DataTorrent Users Group
Apache Apex (and DataTorrent RTS) is a data processing engine. It provides connectors/clients for various data sources (e.g. Kafka, databases), but it does not package any of these third-party systems.

It is as absurd as asking the 'Google search engine' to host all the websites on the internet.

~ Yogi

akshay naidu

Dec 30, 2016, 1:35:31 AM
to DataTorrent Users Group, akshay...@gmail.com
Understood, thanks for the explanation. My approach was totally wrong. I'll set up Kafka properly and then try again.
Thank You.

akshay naidu

Jan 4, 2017, 8:14:26 AM
to DataTorrent Users Group, akshay...@gmail.com, Yogi Devendra, Ashwin Putta, chai...@datatorrent.com
Hi all, 
I have installed Apache Kafka and it is working well: I am able to create topics, and I am also able to create a Kafka stream out of the sample.csv file. But when I try to launch the kafka-to-hdfs-sync app, its status shows as accepted, it runs for some time, and then the status changes to failed. I have attached the dt.log file.
I've also attached a screenshot of the custom properties.

Thank you

dt (1).log
kafka-to-hdfs.png