DAG length limit & inconsistencies between different composer environments

169 views
Skip to first unread message

kanhu....@gmail.com

unread,
Mar 3, 2021, 10:53:42 AM3/3/21
to cloud-composer-discuss
Hi, 
We have a DAG which is approx. 90 KB 
which works well on 
Composer 1.12.5 Airflow 1.10.10  & 
Composer 1.12.3 Airflow 1.10.10.

However, when I tried to work the same DAG on 
Composer 1.12.4 Airflow 1.10.10 & 
Composer 1.13.3 Airflow 1.10.12

DAG fails to load with error ""Data too long for column 'source_code' at row 1"
while executing the SQL.

[SQL: INSERT INTO dag_code (fileloc_hash, fileloc, last_updated, source_code) VALUES (%s, %s, %s, %s)]

When debugging, I found the metadata DB schema difference for source_code field in the dag_code table.

It works where data type for source_code field is mediumtext. (Max length : 16 MB) 

Create Table dag_code,"CREATE TABLE `dag_code` ( `fileloc_hash` bigint(20) NOT NULL, `fileloc` varchar(2000) NOT NULL, `last_updated` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, `source_code` mediumtext CHARACTER SET ucs2 NOT NULL, PRIMARY KEY (`fileloc_hash`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8"  


It doesn't work where data type for source_code field is text. (Max length : 64KB) 

Create Table dag_code,"CREATE TABLE `dag_code` ( `fileloc_hash` bigint(20) NOT NULL, `fileloc` varchar(2000) NOT NULL, `source_code` text NOT NULL, `last_updated` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, PRIMARY KEY (`fileloc_hash`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8"


So, the question is why this inconsistency in schema?

The later version (Composer 1.13.3 Airflow 1.10.12) also has text data type. Why is the future versions are more restrictive. 

Also, limit of 64KB sounds too bad in 2021. 


Any help and guidance will help.  



Sudharsan

unread,
Mar 3, 2021, 11:41:40 AM3/3/21
to kanhu....@gmail.com, cloud-composer-discuss
Hi,

I was facing similar issue during last August during a production migration, where as the code worked with out any issues. Google team was not able figure out the issue initially. Later we upgraded the composer to next version which worked with out any issues.

One option we also tried out was splitting the day to multiple day as well as carving out some of the logic out of the dag also helped us.


--
You received this message because you are subscribed to the Google Groups "cloud-composer-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cloud-composer-di...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/cloud-composer-discuss/3bbd3b32-6523-4b12-8c84-3f7fa99b6f9bn%40googlegroups.com.

kanhu....@gmail.com

unread,
Mar 3, 2021, 12:23:40 PM3/3/21
to cloud-composer-discuss
Thanks a lot. Because a later version 1.13.3 didn't work, I am not sure if keeping DAGs longer than 64 KB will ever be future-proof unless Google decides to have metadata DB as Postgres.

We can trim our DAGs & that seems to be the only way to go about it.

If you don't mind, could you please let me know what version of composer and mySQL versions you had success. 
Also, if you can find out the DDL for dag_code table, that will be extremely helpful.

To get the DDL, 
You can go to Airflow UI --> Data Profiling --> Adhoc query & run the query "show create table dag_code". Export it to CSV using the button on top if you want a formatted output.  

Sudharsan

unread,
Mar 3, 2021, 10:40:29 PM3/3/21
to kanhu....@gmail.com, cloud-composer-discuss
Unfortunately I don't have access to the environment any more.. let me check and respond incase I have anything in my mailbox.

Goran Obradović

unread,
Mar 4, 2021, 7:16:33 AM3/4/21
to Sudharsan, kanhu....@gmail.com, cloud-composer-discuss
Hi,

Cloud Composer has increased the length of source_code column to 16M (mediumtext) in version 1.12.5[1], and this hasn't been changed ever since, so in all versions 1.12.5+ this problem shouldn't occur.

For example, in version 1.14.4:

mysql> use composer-1-14-4-airflow-1-10-14-b664fe4e
mysql> describe dag_code;
+--------------+---------------+------+-----+-------------------+-----------------------------+
| Field        | Type          | Null | Key | Default           | Extra             |
+--------------+---------------+------+-----+-------------------+-----------------------------+
| fileloc_hash | bigint(20)    | NO   | PRI | NULL              |                   |
| fileloc      | varchar(2000) | NO   |     | NULL              |                   |
| last_updated | timestamp     | NO   |     | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
| source_code  | mediumtext    | NO   |     | NULL              |                   |
+--------------+---------------+------+-----+-------------------+-----------------------------+


What is the name of your database, and did you create your environment from scratch or you upgraded it from some previous version? Perhaps there is some bug in migration scripts so the schema wasn't updated properly, or your environment somehow continued using a database with an old schema.


Regards,
Goran

Reply all
Reply to author
Forward
0 new messages