Question on large text column with high feature importance in AutoML

Jack Carlson

Nov 6, 2023, 8:59:36 PM
to cloud-automl-tables-discuss
Hi,
I am training a model where one column contains, on average, 3,000 characters of highly descriptive language about a vehicle for sale. My label column is the price, and this text column has by far the highest feature importance of all columns in the dataset.
My question: How does AutoML handle embedding of long-form natural language for tabular regression? Should I run this text through an embedding model and use the vectorized column in my training data, or should this be left to AutoML?
Thank you.
Sincerely,
Jack

Yang Yang

Nov 7, 2023, 6:00:34 PM
to ja...@meetreserve.com, cloud-automl-tables-discuss
Hi Jack,

Thanks for reaching out. AutoML Tabular handles text features as described in the documentation: basically, n-grams are used, which is sufficient in some cases. An array (the way to store an embedding) is likewise handled as described in the documentation. So whether to add a separate embedding for the text column depends largely on whether the extra information your embedding provides helps the model. You can try both approaches and compare the results.
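
To make the n-gram point concrete, here is a minimal sketch of what counting word n-grams in a description looks like. It is only an illustration, not AutoML's actual implementation: the function name `word_ngrams`, the tokenization rule, and the sample description are all hypothetical, and AutoML's internal text processing may differ.

```python
from collections import Counter
import re


def word_ngrams(text, n_max=2):
    """Count word n-grams for n = 1..n_max (hypothetical helper, illustration only)."""
    # Assumed tokenization: lowercase, keep runs of letters and digits.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter()
    for n in range(1, n_max + 1):
        # Slide a window of n tokens across the text.
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


# Made-up sample text standing in for a vehicle description.
description = "Low mileage sedan, one owner, low mileage for its age"
features = word_ngrams(description, n_max=2)
print(features["low mileage"])
print(features["one owner"])
```

Features like these capture local word order but not the overall meaning of a long description, which is the kind of extra information a separate embedding may or may not add for your data.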

Best,
Yang
