CDAP: Issue with HTTP Streaming Source

Kethareeswaran Krishnan

Sep 29, 2022, 5:55:31 AM
to cdap...@googlegroups.com, Darragh Buffini, Vianney Boncorps, Vikas Chittoor

Hi CDAP Team,

I have been using CDAP to develop Realtime and Batch pipelines, and I am running into an issue with the “HTTP Streaming Source” plugin in a Realtime pipeline.

When I use the “HTTP Batch Source” plugin in a Batch pipeline, I don’t get any issues: the pipeline deploys and runs successfully.

Attached is a screenshot of the error I see in my Realtime pipeline when I click the Validate button. The same parameters were tested with the HTTP Batch Source plugin, and they worked in a Batch pipeline without any issues.

If I ignore the validation error and deploy the pipeline anyway, it fails during execution after 8 minutes. I tried this twice, and both times the pipeline failed after 8 minutes. The raw log (attached) contains no error message that explains why the pipeline failed.

 

Could you please look into this and let me know how to resolve it?

Thanks,

Kethar

CYGNVS

HTTP Streaming Source Issue.png
default-HTTP_Contacts_v3_Realtime-spark-DataStreamsSparkStreaming-7db146be-3fd4-11ed-8827-566f1f07f5b3.log

Albert Shau

Sep 30, 2022, 6:15:57 PM
to cdap...@googlegroups.com, Darragh Buffini, Vianney Boncorps, Vikas Chittoor
Hi Kethar,

I've opened https://cdap.atlassian.net/browse/PLUGIN-1417 for the validation error. We've seen this type of problem before, but it requires a fix in the plugin.

I'm not sure about the pipeline failure, but I do see the following warnings in your log file:

The firewall rules for specified network or subnetwork would likely not permit sufficient VM-to-VM communication for Dataproc to function properly. See https://cloud.google.com/dataproc/docs/concepts/network for information on required network setup for Dataproc.

2022-09-29 09:00:40,118 - WARN  [Timer-0:o.a.s.s.c.YarnClusterScheduler@69] - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

It does seem to be processing microbatches later on, though, so I'm not entirely sure whether the cluster is misconfigured. In any case, I would also check the Dataproc job logs to see if there is any other information.
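
When going through a long raw log, one rough way to surface lines like the warning above is to filter by log level. The sketch below is only an illustration: it assumes every entry follows the layout of the single warning line quoted above (timestamp, level, bracketed thread and logger, message), and the names `LOG_LINE`, `LogEntry`, and `find_problems` are made up for this example rather than part of CDAP or Dataproc.

```python
import re
from typing import Iterable, Iterator, NamedTuple

# Assumed layout, inferred from the one warning line quoted in this thread:
# "<date> <time>,<ms> - <LEVEL>  [<thread>:<logger>@<line>] - <message>"
LOG_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) - "
    r"(?P<level>[A-Z]+)\s+\[(?P<source>[^\]]*)\] - (?P<message>.*)$"
)


class LogEntry(NamedTuple):
    timestamp: str
    level: str
    source: str
    message: str


def find_problems(
    lines: Iterable[str], levels: tuple = ("WARN", "ERROR")
) -> Iterator[LogEntry]:
    """Yield parsed entries whose level is in `levels`.

    Lines that do not match the assumed layout (stack traces, wrapped
    messages) are skipped rather than guessed at.
    """
    for line in lines:
        match = LOG_LINE.match(line.rstrip("\n"))
        if match and match.group("level") in levels:
            yield LogEntry(**match.groupdict())


if __name__ == "__main__":
    sample = [
        # Copied from the warning quoted above.
        "2022-09-29 09:00:40,118 - WARN  [Timer-0:o.a.s.s.c.YarnClusterScheduler@69] - "
        "Initial job has not accepted any resources; check your cluster UI to ensure "
        "that workers are registered and have sufficient resources",
        # Made-up INFO line, included only to show that it is filtered out.
        "2022-09-29 09:00:41,000 - INFO  [main:example.Logger@1] - placeholder message",
    ]
    for entry in find_problems(sample):
        print(entry.timestamp, entry.level, entry.source)
        print("   ", entry.message)
```

To run it against the attached log instead of the sample, pass an open file handle to `find_problems`. Any line that does not match the assumed pattern is silently ignored, so an empty result does not prove the log is clean.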

Regards,
Albert
