In my recent CVPR'19 paper [], I constructed a synthetic dataset, similar to MNIST or Fashion-MNIST, with known manifold: FONTS, see Figure 1. The known manifold allows us to study the occurrence of adversarial examples in relation to the manifold. For example, the dataset allows us to show that adversarial examples indeed leave the manifold, as previously suggested in [], and that adversarial examples can be constrained to the manifold, similar to [].
To create a database of prototype images, I followed Deep Fonts: TrueType fonts from Google Fonts are loaded using PIL.ImageFont and the characters "A" to "J" are rendered at a fixed font size. The images are padded and subsequently scaled to $[0,1]$, see Listing 1. The images are stacked such that the database contains, for each font, all $10$ characters.
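Since Listing 1 itself is not reproduced here, the following is a minimal sketch of this rendering pipeline; the image size, font size, and padding are placeholder values, not the ones used in the paper:

```python
import numpy as np
from PIL import Image, ImageDraw, ImageFont

def render_font(font_path, image_size=28, font_size=20, padding=2):
    """Render the characters "A" to "J" for one TrueType font,
    padded and scaled to [0, 1]."""
    font = ImageFont.truetype(font_path, font_size)
    images = []
    for char in "ABCDEFGHIJ":
        # White character on black background, single channel.
        image = Image.new("L", (image_size, image_size), color=0)
        ImageDraw.Draw(image).text((padding, padding), char, fill=255, font=font)
        images.append(np.asarray(image, dtype=np.float32) / 255.0)
    # Stack all 10 characters of this font: shape (10, image_size, image_size).
    return np.stack(images)

# The database then holds one such block per font:
# database = np.stack([render_font(path) for path in font_paths])
```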
The decoder in Listing 2 can be used in various ways. In [], I used the decoder to search the latent space for misclassified examples, resulting in on-manifold adversarial examples in the spirit of []. The decoder, however, can also be used to project any image onto the manifold; this is useful for determining the distance of arbitrary images to the data manifold. I will discuss both use cases in upcoming posts.
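As an illustration of the projection use case, here is a minimal sketch using gradient descent in latent space, assuming a PyTorch decoder; the latent dimensionality, learning rate, and iteration count are placeholders:

```python
import torch

def project_onto_manifold(decoder, x, latent_dim=10, iterations=100, lr=0.1):
    """Project an image x onto the learned manifold by optimizing a
    latent code z such that decoder(z) is as close as possible to x.
    Returns the projection and its distance to x."""
    z = torch.zeros(1, latent_dim, requires_grad=True)
    optimizer = torch.optim.Adam([z], lr=lr)
    for _ in range(iterations):
        optimizer.zero_grad()
        loss = torch.nn.functional.mse_loss(decoder(z), x)
        loss.backward()
        optimizer.step()
    with torch.no_grad():
        projection = decoder(z)
        # L2 distance of x to its projection, i.e., to the manifold.
        distance = torch.norm((projection - x).flatten())
    return projection, distance
```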
Definitely watching this thread because this is exactly the use case I would like to apply as well. I was thinking that I might have to do it in two steps: train an NLP model on the text only to generate predictions, and then combine these with the tabular features. I have not yet seen an integrated example with fastai, but I would be surprised if no one in the community is working on an integrated multimodal approach.
Alternatively, there are already approaches using torch on Kaggle (the 5th place solution for the PetFinder.my Adoption Prediction competition; the solution code and an overview are posted on Kaggle) as well as at least one fastai approach (GitHub - EtienneT/fastai-petfinder: merging image, tabular and text data in a neural network with fastai, most likely fastai v1).
Otherwise, I think pytorch-widedeep (GitHub - jrzaurin/pytorch-widedeep: a flexible package for multimodal deep learning to combine tabular data with text and images using Wide and Deep models in PyTorch) should be a good start. For a sense of what the fusion step looks like, see the sketch below.
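This is a toy PyTorch sketch of the fusion idea, not the fastai or pytorch-widedeep API; all names and dimensions are made up:

```python
import torch
import torch.nn as nn

class MultimodalNet(nn.Module):
    """Toy fusion model: encode text and tabular features separately,
    then concatenate and classify."""
    def __init__(self, vocab_size=10000, embed_dim=64, num_tabular=16,
                 hidden_dim=128, num_classes=5):
        super().__init__()
        # Bag-of-embeddings text encoder; a pretrained LM would go here instead.
        self.embedding = nn.EmbeddingBag(vocab_size, embed_dim)
        self.tabular = nn.Sequential(
            nn.Linear(num_tabular, hidden_dim), nn.ReLU())
        self.head = nn.Sequential(
            nn.Linear(embed_dim + hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, num_classes))

    def forward(self, tokens, tabular_features):
        # tokens: LongTensor (batch, seq_len); tabular_features: (batch, num_tabular)
        text = self.embedding(tokens)
        tab = self.tabular(tabular_features)
        return self.head(torch.cat([text, tab], dim=1))
```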
We consider the use of automated supervised learning systems for data tables that not only contain numeric/categorical columns, but one or more text fields as well. Here we assemble 18 multimodal data tables that each contain some text fields and...
The paper compares a bunch of strategies for modeling such data and finds that the one used in AutoGluon can give really high accuracy (it even reaches 1st or 2nd place on the historical leaderboards of numerous ML competitions from Kaggle & MachineHack).
In a previous post, we discussed an enhanced method of passing additional data to the ULMFiT algorithm for unstructured text classification. This blog post explains the findings of this more-detailed evaluation, where we have compared how different...
This dataset is not just a random collection of texts. It encompasses a wide array of fonts with varying sizes and weights, including Google Fonts such as Truculenta, Roboto, Playfair Display, Madimi One, and Kode Mono. The diversity in font types ensures that the machine learning models trained with this dataset can adapt to various text presentations, significantly improving the OCR process for digitizing old books.
At the heart of this project lies a convolutional autoencoder, a deep-learning model well suited to denoising. Traditional methods, such as thresholding, often fall short when faced with the delicate task of preserving character separation in lighter fonts. These methods tend to merge characters or fail to detect them entirely, leading to significant information loss and errors in the digitized text. The trained model I've developed, however, extracts and preserves the text even for lightly printed characters. This capability is vividly demonstrated in the side-by-side comparisons of the outputs from the traditional threshold method and the convolutional autoencoder model.
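For reference, here is a minimal sketch of such a convolutional autoencoder in PyTorch; the layer sizes are illustrative and not taken from the project:

```python
import torch.nn as nn

class DenoisingAutoencoder(nn.Module):
    """Small convolutional autoencoder for cleaning scanned text images.
    Trained with a reconstruction loss (e.g., MSE) against clean targets."""
    def __init__(self):
        super().__init__()
        # Downsample 28x28 -> 14x14 -> 7x7 while widening the channels.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU())
        # Mirror the encoder; Sigmoid keeps the output in [0, 1].
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, kernel_size=3, stride=2,
                               padding=1, output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, kernel_size=3, stride=2,
                               padding=1, output_padding=1), nn.Sigmoid())

    def forward(self, x):
        # Input: noisy scan; output: cleaned image of the same size.
        return self.decoder(self.encoder(x))
```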