service account key file location in Add connection


Venkat CS

Nov 5, 2019, 5:11:17 AM
to CDAP User
Trying to add a connection for Google Cloud Storage.

The Add Connection form asks for a
    "service account key file location"

How do I upload the Google service account key file and get the file path for this form?



[attachment: image (3).png]

Albert Shau

Nov 11, 2019, 1:39:03 PM
to cdap...@googlegroups.com
Hi Venkat,

This path is a file path on the local filesystem, which you are responsible for setting up. If you are running the CDAP sandbox, that is your local machine. If you are running distributed CDAP, the file will have to be available on every node of your cluster.
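
Since the plugin simply opens this path on the local filesystem, a quick local sanity check can catch a bad path or a malformed key before the pipeline fails at runtime. The sketch below is illustrative Python (not part of CDAP); the required field names are the standard keys in a service account JSON key downloaded from the GCP console.

```python
import json
import pathlib


def check_key_file(path):
    """Sanity-check a GCP service account key file before pointing a plugin at it."""
    p = pathlib.Path(path)
    if not p.is_file():
        # This is the same condition that surfaces as FileNotFoundException
        # inside the pipeline if the path is wrong or the file is missing.
        raise FileNotFoundError(f"key file not found on this machine: {path}")
    data = json.loads(p.read_text())
    required = {"type", "project_id", "private_key_id", "private_key", "client_email"}
    missing = required - data.keys()
    if missing:
        raise ValueError(f"key file is missing fields: {sorted(missing)}")
    return data["client_email"]
```

Running this on each node before deploying the pipeline verifies both that the file is present and that it is a plausible service account key.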

There is a JIRA open to improve Wrangler and the plugins so that they don't require a local file (https://issues.cask.co/browse/CDAP-14309).

Regards,
Albert

--
You received this message because you are subscribed to the Google Groups "CDAP User" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cdap-user+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/cdap-user/042b26d2-60ee-4afc-b6fb-35e1a7deb884%40googlegroups.com.

Venkat CS

Nov 12, 2019, 5:15:40 AM
to CDAP User
Hi Albert,

Thanks for your response. 

How do I make the service account key file available to every node of the Dataproc cluster? Dataproc only starts when the pipeline runs.

Let me explain my case in detail.

The CDAP sandbox is running on my local machine, and we are creating a pipeline in CDAP to transfer data
from: a GCS bucket (GCP Account 1) (source connection created using a local filesystem path)
to: BigQuery (GCP Account 2) (sink connection created using a local filesystem path)

Case 1: Using the native profile
       The above data transfer process works perfectly.

Case 2: Connecting to Dataproc (GCP Account 2) using the system compute profile.
      In this case the Dataproc cluster is created in GCP Account 2.
      While running the pipeline I get the following error:

java.io.FileNotFoundException: /bizstats-gcs/gcs_key.json (No such file or directory)
    at java.io.FileInputStream.open0(Native Method) ~[na:1.8.0_222]
    at java.io.FileInputStream.open(FileInputStream.java:195) ~[na:1.8.0_222]
    at java.io.FileInputStream.<init>(FileInputStream.java:138) ~[na:1.8.0_222]
    at java.io.FileInputStream.<init>(FileInputStream.java:93) ~[na:1.8.0_222]
    at com.google.cloud.hadoop.util.CredentialFactory.getCredentialFromJsonKeyFile(CredentialFactory.java:269) ~[bigquery-connector-0.10.11-hadoop2.jar:na]
    at com.google.cloud.hadoop.util.CredentialConfiguration.getCredential(CredentialConfiguration.java:86) ~[bigquery-connector-0.10.11-hadoop2.jar:na]
    at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.getCredential(GoogleHadoopFileSystemBase.java:1886) ~[gcs-connector-1.6.10-hadoop2.jar:na]
    at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.configure(GoogleHadoopFileSystemBase.java:1906) ~[gcs-connector-1.6.10-hadoop2.jar:na]
    at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.initialize(GoogleHadoopFileSystemBase.java:1039) ~[gcs-connector-1.6.10-hadoop2.jar:na]
    at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.initialize(GoogleHadoopFileSystemBase.java:1002) ~[gcs-connector-1.6.10-hadoop2.jar:na]
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2812) ~[hadoop-common-2.8.5.jar:na]
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:386) ~[hadoop-common-2.8.5.jar:na]
    at io.cdap.plugin.format.plugin.AbstractFileSource.prepareRun(AbstractFileSource.java:138) ~[na:na]
    at io.cdap.plugin.format.plugin.AbstractFileSource.prepareRun(AbstractFileSource.java:62) ~[na:na]
    at io.cdap.cdap.etl.common.plugin.WrappedBatchSource.lambda$prepareRun$0(WrappedBatchSource.java:51) ~[na:na]
    at io.cdap.cdap.etl.common.plugin.Caller$1.call(Caller.java:30) ~[na:na]
    at io.cdap.cdap.etl.common.plugin.StageLoggingCaller.call(StageLoggingCaller.java:40) ~[na:na]
    at io.cdap.cdap.etl.common.plugin.WrappedBatchSource.prepareRun(WrappedBatchSource.java:50) ~[na:na]
    at io.cdap.cdap.etl.common.plugin.WrappedBatchSource.prepareRun(WrappedBatchSource.java:36) ~[na:na]
    at io.cdap.cdap.etl.common.submit.SubmitterPlugin.lambda$prepareRun$2(SubmitterPlugin.java:71) ~[na:na]
    at io.cdap.cdap.internal.app.runtime.AbstractContext$2.run(AbstractContext.java:551) ~[na:na]
    at io.cdap.cdap.data2.transaction.Transactions$CacheBasedTransactional.finishExecute(Transactions.java:224) ~[na:na]
    at io.cdap.cdap.data2.transaction.Transactions$CacheBasedTransactional.execute(Transactions.java:211) ~[na:na]
    at io.cdap.cdap.internal.app.runtime.AbstractContext.execute(AbstractContext.java:546) ~[na:na]
    at io.cdap.cdap.internal.app.runtime.AbstractContext.execute(AbstractContext.java:534) ~[na:na]
    at io.cdap.cdap.etl.common.submit.SubmitterPlugin.prepareRun(SubmitterPlugin.java:69) ~[na:na]
    at io.cdap.cdap.etl.batch.PipelinePhasePreparer.prepare(PipelinePhasePreparer.java:111) ~[na:na]
    at io.cdap.cdap.etl.batch.mapreduce.MapReducePreparer.prepare(MapReducePreparer.java:97) ~[na:na]
    at io.cdap.cdap.etl.batch.mapreduce.ETLMapReduce.initialize(ETLMapReduce.java:192) ~[na:na]
    at io.cdap.cdap.api.mapreduce.AbstractMapReduce.initialize(AbstractMapReduce.java:109) ~[na:na]
    at io.cdap.cdap.api.mapreduce.AbstractMapReduce.initialize(AbstractMapReduce.java:32) ~[na:na]
    at io.cdap.cdap.internal.app.runtime.batch.MapReduceRuntimeService$1.initialize(MapReduceRuntimeService.java:182) ~[na:na]
    at io.cdap.cdap.internal.app.runtime.batch.MapReduceRuntimeService$1.initialize(MapReduceRuntimeService.java:177) ~[na:na]
    at io.cdap.cdap.internal.app.runtime.AbstractContext.lambda$initializeProgram$1(AbstractContext.java:640) ~[na:na]
    at io.cdap.cdap.internal.app.runtime.AbstractContext.execute(AbstractContext.java:600) ~[na:na]
    at io.cdap.cdap.internal.app.runtime.AbstractContext.initializeProgram(AbstractContext.java:637) ~[na:na]
    at io.cdap.cdap.internal.app.runtime.batch.MapReduceRuntimeService.beforeSubmit(MapReduceRuntimeService.java:547) ~[na:na]
    at io.cdap.cdap.internal.app.runtime.batch.MapReduceRuntimeService.startUp(MapReduceRuntimeService.java:226) ~[na:na]
    at com.google.common.util.concurrent.AbstractExecutionThreadService$1$1.run(AbstractExecutionThreadService.java:47) ~[com.google.guava.guava-13.0.1.jar:na]
    at io.cdap.cdap.internal.app.runtime.batch.MapReduceRuntimeService$2$1.run(MapReduceRuntimeService.java:450) [na:na]
    at java.lang.Thread.run(Thread.java:748) [na:1.8.0_222]




Albert Shau

Nov 12, 2019, 2:19:57 PM
to cdap...@googlegroups.com
Hi,

If you are using the Dataproc provisioner, there is no need to set the service account file location. When you are creating the pipeline, delete the value for that property. This tells the plugin to use whatever credentials the Dataproc cluster has.
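
If the pipeline is exported as JSON, the effect of clearing that field shows up in the GCS stage's properties. The fragment below is only illustrative: the exact property name (`serviceFilePath`) and the `auto-detect` sentinel may differ between plugin versions, and the bucket path is a placeholder.

```json
{
  "name": "GCS",
  "plugin": {
    "name": "GCSFile",
    "type": "batchsource",
    "properties": {
      "referenceName": "gcs_source",
      "path": "gs://source-bucket/path/",
      "project": "auto-detect",
      "serviceFilePath": "auto-detect"
    }
  }
}
```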

Regards,
Albert


Venkat CS

Nov 13, 2019, 1:11:09 AM
to CDAP User
Hi Albert,

Thanks for your timely reply. 

In my case we are reading data (source connection) from one account (GCP Account 1) and writing data (sink connection) to a different account (GCP Account 2).

Dataproc is provisioned in GCP Account 2 (i.e. the sink and Dataproc are in the same GCP account).

How can I make the service account key file of the source connection available?

Thanks
Venkat   



Albert Shau

Nov 13, 2019, 4:43:35 PM
to cdap...@googlegroups.com
Hi Venkat,

In this case you would have to make sure the service account used by Dataproc (the default compute service account) has permission to read from the source and write to the sink. This can be done with IAM roles, since roles in one project can be granted to service accounts from another project.

Right now there isn't a way to configure which service account Dataproc uses, so it will always be the default compute account. In the future we will add the ability to configure a different service account.
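
As a sketch of the cross-project grant, an admin of GCP Account 1 (the source project) could give Account 2's default compute service account read access to the source bucket using the Cloud SDK. The project number, project ID, and bucket name below are placeholders to adapt:

```shell
# Run as an admin of GCP Account 1 (the project that owns the source bucket).
# PROJECT_NUMBER is the numeric project number of GCP Account 2, whose
# default compute service account Dataproc runs as.
gsutil iam ch \
  "serviceAccount:PROJECT_NUMBER-compute@developer.gserviceaccount.com:roles/storage.objectViewer" \
  gs://source-bucket

# Equivalently, a project-wide grant:
gcloud projects add-iam-policy-binding account-1-project-id \
  --member="serviceAccount:PROJECT_NUMBER-compute@developer.gserviceaccount.com" \
  --role="roles/storage.objectViewer"
```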

Regards,
Albert

