Target encoding in TensorFlow Transform

IamExperimenting Now

Apr 24, 2021, 8:28:10 AM

to t...@tensorflow.org

Hi Team,

We have done feature engineering with a categorical encoder Python module, specifically its target encoding method for a few features; it’s one of the methods we have applied. We have also tried different tf.feature_column methods, but unfortunately we weren’t able to build an efficient model using tf.feature_column-based feature engineering.

Just wondering: is there any way to implement target encoding in TensorFlow Transform? The reason is that we have decided to use TFX for our pipeline.

Any suggestions or relevant articles would be appreciated.

Thanks in advance,

Saravanan

Ihor Indyk

Apr 28, 2021, 9:14:12 PM

to TensorFlow Extended (TFX), iamexperi...@gmail.com

Hi Saravanan,

We don't have an out-of-the-box solution for target encoding; however, we do have the main ingredients.

A full pass over the training data is needed to compute the posterior probability of the target given a particular categorical value, as well as the prior probability of the target. In Transform this can only be done with analyzers, e.g. using tft.vocabulary with `store_frequency=True` over a concatenated target+categorical column.
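As a rough illustration of what that analyzer pass produces, here is a pure-Python stand-in (no TensorFlow involved) that builds the concatenated target+categorical token for each example and counts the tokens, emitting lines in the "frequency token" form that tft.vocabulary writes when `store_frequency=True`. The `|` separator, the helper names, and the toy data are hypothetical; in a real preprocessing_fn the concatenation would be done on string tensors and the counting by tft.vocabulary itself, whose ordering of equal-frequency tokens may differ from the alphabetical tie-break used here.

```python
from collections import Counter

# Hypothetical separator joining target and category into one token.
# It must not appear in the target values themselves.
SEP = "|"


def concat_target_category(targets, categories, sep=SEP):
    """Builds the concatenated target+categorical column, one token per example."""
    return [f"{t}{sep}{c}" for t, c in zip(targets, categories)]


def vocabulary_with_frequency(tokens):
    """Pure-Python stand-in for a vocabulary analyzer run with store_frequency=True.

    Returns lines of the form 'frequency token', most frequent first.
    Ties are broken alphabetically here only to make the output deterministic.
    """
    counts = Counter(tokens)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [f"{freq} {token}" for token, freq in ordered]


if __name__ == "__main__":
    # Toy data: a binary target and one categorical feature.
    targets = [1, 0, 1, 1, 0, 0]
    categories = ["red", "red", "blue", "blue", "blue", "green"]
    for line in vocabulary_with_frequency(concat_target_category(targets, categories)):
        print(line)
```

For this toy data, `1|blue` occurs twice and every other token once, so `2 1|blue` is the first line. Each line carries a joint count of (target, category), which is exactly what the posterior and prior are later derived from.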

To map each value to its target encoding, the resulting vocabulary file can be used in a mapper loosely similar to tft.apply_vocabulary, but with several custom lookups: the lookups will have to aggregate the probabilities of the categorical value and the target.
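The mapper side can be sketched the same way. Assuming a binary target written as the strings "0"/"1" and the same hypothetical `|` separator, the code below parses "frequency target|category" lines, aggregates them per category to get P(target=1 | category) and over all rows to get the prior P(target=1), and falls back to the prior for categories not seen during analysis. Python dicts stand in for the lookups; inside an actual preprocessing_fn these would presumably be TensorFlow lookup tables (for example tf.lookup.StaticHashTable) initialized from the vocabulary file. All function names here are hypothetical.

```python
from collections import defaultdict

SEP = "|"  # hypothetical separator; must match the one used to build the tokens


def parse_vocabulary(lines, sep=SEP):
    """Parses 'frequency target|category' lines into joint counts."""
    joint = {}
    for line in lines:
        freq, token = line.split(" ", 1)          # frequency, then the token
        target, category = token.split(sep, 1)    # target never contains sep
        joint[(target, category)] = float(freq)
    return joint


def build_target_encoding(joint, positive="1"):
    """Aggregates joint counts into P(target=positive | category) and the prior."""
    per_category = defaultdict(float)  # count(category)
    positives = defaultdict(float)     # count(target=positive, category)
    total = 0.0
    for (target, category), freq in joint.items():
        per_category[category] += freq
        total += freq
        if target == positive:
            positives[category] += freq
    prior = sum(positives.values()) / total
    posterior = {c: positives[c] / n for c, n in per_category.items()}
    return posterior, prior


def encode(categories, posterior, prior):
    """Maps each category to its encoding, using the prior for unseen values."""
    return [posterior.get(c, prior) for c in categories]


if __name__ == "__main__":
    vocab = ["2 1|blue", "1 0|blue", "1 0|green", "1 0|red", "1 1|red"]
    posterior, prior = build_target_encoding(parse_vocabulary(vocab))
    print(encode(["blue", "red", "green", "purple"], posterior, prior))
```

With those five vocabulary lines, blue maps to 2/3, red to 0.5, green to 0.0, and the unseen value purple falls back to the prior of 3/6 = 0.5. No smoothing is applied; blending rare categories toward the prior is a common refinement but is not shown here.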

Note that this approach is meant for a categorical column that doesn't have predetermined weights; a bit more manipulation may be needed if it does.

Ihor

IamExperimenting Now

May 19, 2021, 4:49:07 AM

to Ihor Indyk, TensorFlow Extended (TFX)

Thanks for your reply. Would it be possible for you to point us to a reference article or blog post?

Thanks

Ihor Indyk

May 27, 2021, 10:48:27 AM

to IamExperimenting Now, TensorFlow Extended (TFX)

I don't think we have anything specifically about target encoding, but these pages have code snippets that generate vocabularies:

[1] https://www.tensorflow.org/tfx/transform/get_started

[2] https://colab.sandbox.google.com/github/tensorflow/tfx/blob/master/docs/tutorials/transform/census.ipynb
