How to create/access a Thrift server on Cloud Dataproc


adityak...@exadatum.com

Mar 27, 2017, 1:56:21 AM
to Google Cloud Dataproc Discussions
I am working on a Hadoop to Google Cloud migration project, and I need to know the Thrift server details (host, port, credentials) on the destination Google Cloud Dataproc instance. How do I go about it?
1) Is Thrift already running, and is there a way to find its host URL and password?
2) Or do I need to install Thrift manually on the Dataproc cluster and then fetch the host and port details myself?
3) How can I run distcp between my remote Hadoop cluster and the Dataproc cluster to transfer files?

Patrick Clay

Mar 31, 2017, 3:01:41 PM
to Google Cloud Dataproc Discussions
1-2) As far as I know, Hadoop does not have a Thrift-based file service. WebHDFS is enabled on the HDFS NameNode, and we do not run an HTTPFS server, if that is what you mean.
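To confirm WebHDFS is reachable, you can hit the NameNode's REST endpoint from the master node. A minimal sketch, assuming a cluster whose master is named `CLUSTER_NAME-m` (a placeholder) and a Hadoop 2.x NameNode HTTP port of 50070 (Hadoop 3.x uses 9870):

```shell
# List the HDFS root directory via the WebHDFS REST API.
# CLUSTER_NAME-m is a placeholder for your Dataproc master hostname;
# adjust the port if your Hadoop version differs.
curl -s "http://CLUSTER_NAME-m:50070/webhdfs/v1/?op=LISTSTATUS"
```

A JSON `FileStatuses` response indicates WebHDFS is up and answering.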
3) We recommend you use the Google Cloud Storage connector for Hadoop to copy your data from your cluster to Google Cloud Storage, and then access it there from Dataproc. distcp works just fine for uploading.
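As a sketch of that upload step, run distcp on the source (on-premises) cluster with the GCS connector on the classpath and credentials configured; `BUCKET_NAME` and the paths are placeholders:

```shell
# Copy a directory from the source cluster's HDFS up to a GCS bucket.
# Requires the GCS connector installed on the source cluster and a
# service account with write access to the bucket.
hadoop distcp hdfs:///data/mydir gs://BUCKET_NAME/data/mydir
```

Once the data is in the bucket, Dataproc jobs can read it directly via `gs://` paths with no further copy.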

If you wish to access HDFS on Dataproc directly, you would probably need a VPN joining your remote Hadoop network to the Google Compute Engine network that the Dataproc cluster runs on.
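With such a VPN in place, the remote cluster can address the Dataproc NameNode directly. A hedged sketch, assuming the default HDFS RPC port 8020 and a placeholder master hostname `CLUSTER_NAME-m` resolvable over the VPN:

```shell
# Cluster-to-cluster copy over the VPN, writing into Dataproc's HDFS.
# Both hostname and paths below are placeholders.
hadoop distcp hdfs:///data/mydir hdfs://CLUSTER_NAME-m:8020/data/mydir
```

Note that HDFS on Dataproc is tied to the cluster's lifetime, which is another reason Cloud Storage is usually the better destination.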
