I am having trouble creating an issue in the CDAP issue tracker, so I thought I'd post here:
I have a Data Fusion pipeline that moves data from Cloud SQL Server to BigQuery, and it does not update the target BigQuery table correctly when the source is empty. If the table in Cloud SQL has been truncated so that the source has 0 records, running the pipeline leaves the BQ table as it was instead of truncating it.
Looking at the pipeline log, I see why no BQ job ran. In the logs it says:
Importing into table 'XYZ' from 0 paths; path[0] is '(empty)'; awaitCompletion: true
Work-around
I added a Conditional between the source and the sink.
The condition is just:
token['CloudSQL MySQL']['output'] > 0
where "CloudSQL MySQL" is the label of the source plugin. I tested the attached pipeline by creating a MySQL table with no records and a BQ table with lots of records. When I ran the pipeline, the BQ table was truncated. I then added a record to the MySQL table and ran the pipeline again to check that the true branch worked, and it did; the BQ table had one record!
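For anyone unsure how the condition behaves, here is a rough Python model of it. This is only a sketch: Data Fusion evaluates the expression itself, and the dictionary shape, the function name `source_has_records`, and the sample values are assumptions made up for illustration.

```python
# Hypothetical stand-in for the values a pipeline run exposes to the
# Conditional. The nested-dict shape simply mirrors the expression
# token['CloudSQL MySQL']['output'] and is an assumption.
def source_has_records(token, stage_label="CloudSQL MySQL"):
    """Python equivalent of: token['CloudSQL MySQL']['output'] > 0"""
    return token[stage_label]["output"] > 0


# Two illustrative runs: an empty source table and a one-record table.
empty_run = {"CloudSQL MySQL": {"output": 0}}
one_record_run = {"CloudSQL MySQL": {"output": 1}}

for name, token in [("empty source", empty_run), ("one record", one_record_run)]:
    branch = "true" if source_has_records(token) else "false"
    print(f"{name}: the Conditional takes the {branch} branch")
```

The stage label has to match the label of the source plugin exactly, since it is used as a lookup key.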
Constraints
I would like to do this without using Replication.
Feature Request
Remove the need for the Conditional: have the BQ sink truncate the table when it detects that the upstream source has output zero records.
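To make the request concrete, this is roughly the decision I would like the sink to make. It is a sketch of the requested behavior only, not how the BigQuery sink is implemented today; `plan_sink_operations` and its parameters are hypothetical names.

```python
def plan_sink_operations(records_received, truncate_table=True):
    """Return the operations a sink could run for one pipeline run.

    Hypothetical model of the requested behavior, not the plugin's code.
    """
    if records_received > 0:
        # Normal case: data arrived, so a load job runs as usual.
        return ["load with truncate" if truncate_table else "load with append"]
    if truncate_table:
        # Requested case: the source was empty, so empty the target too.
        return ["truncate only"]
    # Append mode with no records: nothing to do.
    return []


print(plan_sink_operations(0))         # empty source, truncate requested
print(plan_sink_operations(1))         # one record, truncate requested
print(plan_sink_operations(0, False))  # empty source, append mode
```

The only new case is the middle one: zero records with truncation enabled, which currently results in no BQ job at all.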
Best,
Omar