I am attempting to reduce the dimensionality of a categorical feature by extracting an embedding layer from a neural net and using it as an input feature in a separate XGBoost model.
An embedding layer's weight matrix has the dimensions (number of unique categories, chosen output size). How can it be concatenated with the continuous variables in the original training data, which has the dimensions (number of observations, number of features)? Since each row of the embedding array represents one category, I assume we can inner join the embedding array to the training observations on the category. But how can we be certain which category a given row of the embedding array represents?
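A minimal sketch of that join, written in plain Python so it runs without extra libraries. It assumes the layer maps input integer i to row i of its weight matrix, which is how Keras `Embedding` and PyTorch `nn.Embedding` behave. Under that assumption, the category behind each row is whichever category was encoded as i when the network's inputs were prepared, and the same encoding must be reused when building the XGBoost features. The category names, weight values, and the `add_embedding` helper are hypothetical. In Keras the trained matrix could be read with `model.get_layer(name).get_weights()[0]`, where `name` is the embedding layer's name.

```python
# Hypothetical integer encoding used when training the network.
# Assumption: category with code i was fed to the Embedding layer as index i,
# so row i of the weight matrix is that category's vector.
category_to_index = {"red": 0, "green": 1, "blue": 2}

# Stand-in for the trained weight matrix, shape (n_categories, output_dim) = (3, 2).
embedding_weights = [
    [0.1, -0.4],  # row 0 -> "red"
    [0.7, 0.2],   # row 1 -> "green"
    [-0.3, 0.9],  # row 2 -> "blue"
]

# Hypothetical training data: one categorical column plus continuous features.
observations = [
    {"color": "blue", "x1": 1.5, "x2": 10.0},
    {"color": "red", "x1": 0.2, "x2": 3.0},
    {"color": "blue", "x1": 2.1, "x2": 7.5},
]


def add_embedding(rows, column, mapping, weights):
    """Replace a categorical column with its embedding vector.

    Each row's category is mapped to its integer code, and that code is used
    as the row index into the weight matrix. An unseen category raises KeyError.
    """
    out = []
    for row in rows:
        vec = weights[mapping[row[column]]]
        continuous = [v for k, v in row.items() if k != column]
        out.append(continuous + list(vec))
    return out


features = add_embedding(observations, "color", category_to_index, embedding_weights)
for f in features:
    print(f)
```

Because the lookup goes through `category_to_index`, the question of "which row is which category" reduces to keeping that mapping from training time. A mismatched mapping would silently attach the wrong vectors.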
If the embedding output size is 10, will this constitute 10 different features when used as input data in a separate model?
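A short sketch of how an output size of 10 would show up on the tabular side, assuming the embedding vector is concatenated with the continuous features as ordinary numeric columns. The feature names, category codes, and random stand-in weights are hypothetical. A tree model such as XGBoost would then see 10 separate numeric columns and split on one of them at a time. Every observation with the same category carries identical values in those 10 columns.

```python
import random

n_categories, output_dim = 4, 10
random.seed(0)

# Stand-in for trained embedding weights, shape (n_categories, output_dim).
weights = [[random.uniform(-1, 1) for _ in range(output_dim)] for _ in range(n_categories)]

# Hypothetical continuous features plus one column name per embedding dimension.
continuous_names = ["age", "income"]
embedding_names = [f"cat_emb_{j}" for j in range(output_dim)]
feature_names = continuous_names + embedding_names

# Integer category code for each observation (same encoding as used in training).
codes = [2, 0, 3]
continuous = [[34.0, 52000.0], [51.0, 61000.0], [29.0, 38000.0]]

# Each row: continuous values followed by the 10 embedding values of its category.
X = [cont + weights[code] for cont, code in zip(continuous, codes)]

print(len(feature_names), [len(row) for row in X])
```

With 2 continuous features and an output size of 10, each row ends up with 12 columns, so the embedding does contribute 10 features to the second model.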
Does anyone know of a code example where embeddings are used as inputs to a separate model of a different type?
Thanks in advance!