[JIRA] (PLUGIN-587) GCS Sink tries to connect to Metadata service when using CDAP sandbox


Fernando Velasquez (Jira)

Feb 8, 2021, 3:46:50 PM
to cdap-...@googlegroups.com
Fernando Velasquez created an issue
 
CDAP Plugins / Bug PLUGIN-587
GCS Sink tries to connect to Metadata service when using CDAP sandbox
Issue Type: Bug
Assignee: Unassigned
Created: 08/Feb/21 12:46 PM
Priority: Major
Reporter: Fernando Velasquez

When using the GCS Sink in the CDAP sandbox, the pipeline fails with the following error when the Service Account Type is set to File Path (using a valid service account JSON file):

```
2021-02-05 15:26:24,096 - WARN [Executor task launch worker for task 0:c.g.a.o.DefaultCredentialsProvider@233] - Your application has authenticated using end user credentials from Google Cloud SDK. We recommend that most server applications use service accounts instead. If your application continues to use end user credentials from Cloud SDK, you might receive a "quota exceeded" or "API not enabled" error. For more information about service accounts, see https://cloud.google.com/docs/authentication/.
2021-02-05 15:28:12,766 - ERROR [Executor task launch worker for task 0:o.a.s.e.Executor@91] - Exception in task 0.0 in stage 0.0 (TID 0)
java.io.IOException: Error getting access token from metadata server at: http://169.254.169.254/computeMetadata/v1/instance/service-accounts/default/token
at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.util.CredentialFactory.getCredentialFromMetadataServiceAccount(CredentialFactory.java:250) ~[gcs-connector-hadoop2-latest.jar:na]
at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.util.CredentialFactory.getCredential(CredentialFactory.java:389) ~[gcs-connector-hadoop2-latest.jar:na]
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.getCredential(GoogleHadoopFileSystemBase.java:1307) ~[gcs-connector-hadoop2-2.0.0.jar:na]
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.createGcsFs(GoogleHadoopFileSystemBase.java:1442) ~[gcs-connector-hadoop2-2.0.0.jar:na]
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.configure(GoogleHadoopFileSystemBase.java:1426) ~[gcs-connector-hadoop2-2.0.0.jar:na]
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.initialize(GoogleHadoopFileSystemBase.java:466) ~[gcs-connector-hadoop2-2.0.0.jar:na]
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2811) ~[org.apache.hadoop.hadoop-common-2.8.0.jar:na]
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:100) ~[org.apache.hadoop.hadoop-common-2.8.0.jar:na]
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2848) ~[org.apache.hadoop.hadoop-common-2.8.0.jar:na]
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2830) ~[org.apache.hadoop.hadoop-common-2.8.0.jar:na]
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:389) ~[org.apache.hadoop.hadoop-common-2.8.0.jar:na]
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:356) ~[org.apache.hadoop.hadoop-common-2.8.0.jar:na]
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:562) ~[org.apache.hadoop.hadoop-mapreduce-client-core-2.8.0.jar:na]
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:549) ~[org.apache.hadoop.hadoop-mapreduce-client-core-2.8.0.jar:na]
at io.cdap.plugin.gcp.gcs.sink.GCSOutputCommitter.commitTask(GCSOutputCommitter.java:95) ~[1612556735286-0/:na]
at io.cdap.plugin.gcp.gcs.sink.DelegatingGCSOutputCommitter.commitTask(DelegatingGCSOutputCommitter.java:91) ~[1612556735286-0/:na]
at io.cdap.plugin.gcp.gcs.sink.DelegatingGCSRecordWriter.close(DelegatingGCSRecordWriter.java:87) ~[1612556735286-0/:na]
at io.cdap.cdap.etl.spark.io.TrackingRecordWriter.close(TrackingRecordWriter.java:46) ~[hydrator-spark-core2_2.11-6.4.0-SNAPSHOT.jar:na]
at io.cdap.cdap.etl.common.output.MultiRecordWriter.close(MultiRecordWriter.java:64) ~[cdap-etl-core-6.4.0-SNAPSHOT.jar:na]
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12$$anonfun$apply$5.apply$mcV$sp(PairRDDFunctions.scala:1131) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1374) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1131) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1102) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
at org.apache.spark.scheduler.Task.run(Task.scala:100) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:325) ~[io.cdap.cdap.spark-assembly-2.1.3.jar:na]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_221]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_221]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_221]
Caused by: java.net.ConnectException: Host is down (connect failed)
at java.net.PlainSocketImpl.socketConnect(Native Method) ~[na:1.8.0_221]
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350) ~[na:1.8.0_221]
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206) ~[na:1.8.0_221]
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188) ~[na:1.8.0_221]
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) ~[na:1.8.0_221]
at java.net.Socket.connect(Socket.java:589) ~[na:1.8.0_221]
at sun.net.NetworkClient.doConnect(NetworkClient.java:175) ~[na:1.8.0_221]
at sun.net.www.http.HttpClient.openServer(HttpClient.java:463) ~[na:1.8.0_221]
at sun.net.www.http.HttpClient.openServer(HttpClient.java:558) ~[na:1.8.0_221]
at sun.net.www.http.HttpClient.<init>(HttpClient.java:242) ~[na:1.8.0_221]
at sun.net.www.http.HttpClient.New(HttpClient.java:339) ~[na:1.8.0_221]
at sun.net.www.http.HttpClient.New(HttpClient.java:357) ~[na:1.8.0_221]
at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1226) ~[na:1.8.0_221]
at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1162) ~[na:1.8.0_221]
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1056) ~[na:1.8.0_221]
at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:990) ~[na:1.8.0_221]
at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.http.javanet.NetHttpRequest.execute(NetHttpRequest.java:148) ~[gcs-connector-hadoop2-latest.jar:na]
at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.http.javanet.NetHttpRequest.execute(NetHttpRequest.java:84) ~[gcs-connector-hadoop2-latest.jar:na]
at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1012) ~[gcs-connector-hadoop2-latest.jar:na]
at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.util.CredentialFactory$ComputeCredentialWithRetry.executeRefreshToken(CredentialFactory.java:192) ~[gcs-connector-hadoop2-latest.jar:na]
at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.auth.oauth2.Credential.refreshToken(Credential.java:494) ~[gcs-connector-hadoop2-latest.jar:na]
at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.util.CredentialFactory.getCredentialFromMetadataServiceAccount(CredentialFactory.java:247) ~[gcs-connector-hadoop2-latest.jar:na]
... 28 common frames omitted
```

It seems the Hadoop credentials are not set correctly for the GCS FileSystem: despite a valid keyfile being configured, the connector falls back to the GCE metadata server, which is unreachable outside Google Cloud.
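For reference, a minimal sketch of the Hadoop configuration the gcs-connector (2.x) reads to pick service-account-keyfile authentication instead of falling back to the metadata server. The property names follow the gcs-connector documentation; the keyfile path is a placeholder, and this uses a plain map rather than a real Hadoop `Configuration` to stay self-contained:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class GcsAuthConfigSketch {

    /**
     * Builds the properties that gcs-connector-hadoop2 2.x consults when
     * deciding how to authenticate. If these are absent or not propagated
     * to the executor's Hadoop Configuration, the connector falls back to
     * the GCE metadata server (169.254.169.254), as seen in the stack trace.
     */
    public static Map<String, String> serviceAccountConf(String keyfilePath) {
        Map<String, String> conf = new LinkedHashMap<>();
        // Use service-account credentials rather than end-user or
        // metadata-server credentials.
        conf.put("google.cloud.auth.service.account.enable", "true");
        // Local filesystem path to the service account JSON key.
        conf.put("google.cloud.auth.service.account.json.keyfile", keyfilePath);
        return conf;
    }

    public static void main(String[] args) {
        // Placeholder path; substitute the actual keyfile location.
        serviceAccountConf("/path/to/service-account.json")
            .forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

If the sink sets these only on the driver-side configuration, the executor tasks (where `FileOutputCommitter.commitTask` runs) may still see a bare configuration, which would explain the metadata-server fallback.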

This message was sent by Atlassian Jira (v1001.0.0-SNAPSHOT#100154-sha1:c8b36cd)

Fernando Velasquez (Jira)

Feb 8, 2021, 3:50:32 PM
to cdap-...@googlegroups.com
Fernando Velasquez updated an issue
Change By: Fernando Velasquez

Bhooshan Mogal (Jira)

Feb 8, 2021, 3:52:49 PM
to cdap-...@googlegroups.com

Bhooshan Mogal (Jira)

Feb 8, 2021, 3:53:01 PM
to cdap-...@googlegroups.com

Fernando Velasquez (Jira)

Feb 10, 2021, 5:00:16 PM
to cdap-...@googlegroups.com
Fernando Velasquez updated an issue


I have tested setting the Service Account both as JSON and as a file path, with the same results. This service account has Owner privileges on the project where this pipeline was tested.