Cloud Pub/Sub to Pub/Sub Dataflow | not going to outputTopic


Nitin Kalra

May 19, 2022, 10:58:06 AM
to pubsub-discuss

Hi All,

I am running the Cloud Pub/Sub to Pub/Sub Dataflow template. The job is running, and I can see data freshness increasing as it picks up messages from the input subscription, but it is not publishing the messages to the outputTopic.

What could be the issue here? I don't see any errors in the logs either.

Please see the attached snapshots.

Capture2.JPG
Capture.JPG

Nitin Kalra

May 19, 2022, 3:06:39 PM
to pubsub-discuss
Anyone?

Jose Gutierrez Paliza

May 19, 2022, 4:50:26 PM
to pubsub-discuss
The common reasons this happens are:

 - a missing or incorrect output topic,
 - insufficient permissions,
 - insufficient Pub/Sub quota.

Nitin Kalra

May 19, 2022, 8:05:43 PM
to pubsub-discuss
- The topic is correct: I can click on the topic in the Dataflow job configuration, which opens in the right-side panel in Dataflow.
- I have granted the service account associated with the Dataflow job the publisher/viewer permissions.
- What Pub/Sub quota? The topic I am forwarding to is an existing, old topic.

Jose Gutierrez Paliza

May 20, 2022, 3:30:48 PM
to pubsub-discuss

What I mean by the Pub/Sub quota is that when using Dataflow you will sometimes need additional quota[1] for Pub/Sub.

This document[2] provides information about Pub/Sub quotas and limits.


[1]https://cloud.google.com/dataflow/quotas#additional-quotas 

[2]https://cloud.google.com/pubsub/quotas

Nitin Kalra

May 20, 2022, 3:47:53 PM
to pubsub-discuss
I don't think it's a quota issue either, because we are in a non-prod environment. I have turned off the other Dataflow jobs and am testing with only one message at a time.

Here is the terraform configuration that we are using:


 # This subscription's messages will be forwarded to the reply topic
resource "google_pubsub_subscription" "reply-sub" {
  project = var.project_id
  name    = var.reply_sub["${var.env}"]["name"]
  topic   = var.reply_name

  message_retention_duration = var.reply_sub["${var.env}"]["retention"]
  retain_acked_messages      = var.reply_sub["${var.env}"]["retain_acked_message"]
  ack_deadline_seconds       = var.reply_sub["${var.env}"]["ack_deadline_seconds"]

  retry_policy {
    minimum_backoff = "10s"
    maximum_backoff = "30s"
  }
}

# Dataflow job forwarding messages from the reply subscription to the reply topic
resource "google_dataflow_job" "reply-sub-to-reply-job" {
  project           = var.project_id
  name              = "reply_sub_to_reply_job"
  template_gcs_path = "gs://dataflow-templates/2022-01-10-00_RC00/Cloud_PubSub_to_Cloud_PubSub"
  temp_gcs_location = "gs://${var.project_id}-scratchpad/adapterTmp"
  network           = var.project_vpc["${var.env}"]
  subnetwork        = "regions/${var.region}/subnetworks/${var.notification-subnet["name"]}-${var.env}"
  machine_type      = var.df_machine_type
  region            = var.df_controller_region
  zone              = "${var.region}-a"
  ip_configuration  = "WORKER_IP_PRIVATE"
  max_workers       = var.file-maxworkers
  parameters = {
    inputSubscription = "projects/${var.project_id}/subscriptions/${google_pubsub_subscription.reply-sub.name}"
    outputTopic       = "projects/${var.project_id}/topics/reply"
  }
}
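
One thing worth noting about the config above: outputTopic is a hand-typed string, so a typo in the topic name only surfaces at runtime. If the reply topic can be managed in the same module (an assumption; it may be owned elsewhere), declaring it as a resource lets Terraform validate the reference and order resource creation. A minimal sketch:

```hcl
# Hypothetical sketch: manage the output topic in the same module so the
# Dataflow job can reference it instead of a hand-typed string.
resource "google_pubsub_topic" "reply" {
  project = var.project_id
  name    = "reply"
}

# Then, in google_dataflow_job.reply-sub-to-reply-job:
#   outputTopic = google_pubsub_topic.reply.id
# ".id" renders as "projects/<project_id>/topics/reply".
```

The `google_dataflow_job` resource also accepts a `service_account_email` argument; setting it explicitly makes it unambiguous which identity needs publish rights on the output topic.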

Jose Gutierrez Paliza

May 23, 2022, 5:55:21 PM
to pubsub-discuss

I followed this documentation[1] using the console and there was no issue following the guide, so this seems like an error in your Terraform code, or possibly a bug in the provider.

I suggest you post this as a GitHub issue[2] or post it on Stack Overflow with the code you use. You could also debug your Terraform run with detailed logs by enabling `TF_LOG`[3].


[1]https://cloud.google.com/dataflow/docs/guides/templates/provided-streaming#pubsub-to-pubsub 

[2]https://github.com/hashicorp/terraform-provider-google/issues 

[3]https://www.terraform.io/internals/debugging

Nitin Kalra

May 25, 2022, 4:08:03 PM
to pubsub-discuss
I believe it's an issue on the Dataflow side. I haven't seen any working examples on Google or Stack Overflow.

Here is a snapshot of my Dataflow job. I don't see any issue in the setup/pipeline options, and there are no errors or even warnings in the logs, but the job is still not sending events to the output topic.

Capture3.JPG

Nitin Kalra

May 30, 2022, 2:35:22 PM
to pubsub-discuss
Found one issue in the logs:

message: "Publishing to topic projects/<projectName>-12452f/topics/message.reply failed, will retry: PERMISSION_DENIED: Http(403) Forbidden === Source Location Trace: === apiserving/clients/cpp/util/status.cc:194"


But I am not able to identify which service account is failing here, because the service account I used to create the Dataflow job has all of the permissions below:


BigQuery Data Editor
Compute Network Viewer
Dataflow Worker
Pub/Sub Subscriber
Pub/Sub Viewer
Storage Object Admin

What else is missing for the Cloud_PubSub_to_Cloud_PubSub Dataflow job?

Nitin Kalra

May 30, 2022, 4:41:59 PM
to pubsub-discuss
So, I found the solution. I added roles/pubsub.publisher and it has started working. However, the job still intermittently stops publishing messages to the Pub/Sub topic.
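
For completeness, the grant that resolved the PERMISSION_DENIED error can also be expressed in the same Terraform module. This is a sketch, not the poster's actual code: the `var.dataflow_worker_sa` variable is hypothetical, and it assumes the workers run as a known service account (when `service_account_email` is not set on the job, Dataflow workers use the Compute Engine default service account):

```hcl
# Hypothetical sketch: grant the Dataflow worker service account publish
# rights on the output topic only, rather than a project-wide binding.
resource "google_pubsub_topic_iam_member" "reply_publisher" {
  project = var.project_id
  topic   = "reply"
  role    = "roles/pubsub.publisher"
  # var.dataflow_worker_sa is hypothetical: the email of the service
  # account the Dataflow workers run as.
  member  = "serviceAccount:${var.dataflow_worker_sa}"
}
```

Granting the role on the topic via Terraform (instead of a one-off console edit) also keeps the binding from being lost if IAM is later managed authoritatively elsewhere.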