Tokenizer in TensorFlow.js


Mayur Bhole

Aug 6, 2018, 3:05:55 AM
to TensorFlow.js Discussion
Hi, I am using TensorFlow.js for text classification. We can't use word embeddings in our case and would like to use a bag-of-words approach instead. Is there a tokenizer in TensorFlow.js for this purpose? Is it possible to use the Keras tokenizer?

Edoh Kodjo

Aug 6, 2018, 4:10:10 AM
to Mayur Bhole, TensorFlow.js Discussion
Hi,

This Stack Overflow answer can help you get started: https://stackoverflow.com/questions/51663068/tensorflow-js-tokenizer


--
You received this message because you are subscribed to the Google Groups "TensorFlow.js Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tfjs+uns...@tensorflow.org.
Visit this group at https://groups.google.com/a/tensorflow.org/group/tfjs/.
To view this discussion on the web visit https://groups.google.com/a/tensorflow.org/d/msgid/tfjs/0b226055-0899-4f48-8540-0027ac65bcdf%40tensorflow.org.

Mayur Bhole

Aug 6, 2018, 5:27:14 AM
to TensorFlow.js Discussion, mayurb...@gmail.com
Thanks, Edoh Kodjo, for your quick reply. I am also using the 'natural' package and have written my own text-to-vector generator. I wanted to check whether the Keras tokenizer is available in JS.

Edoh Kodjo

Aug 6, 2018, 5:31:10 AM
to Mayur Bhole, TensorFlow.js Discussion
As far as I know, it is not available yet.

Stan Bileschi

Aug 6, 2018, 7:38:03 AM
to TensorFlow.js Discussion, mayurb...@gmail.com
Hi Mayur,

As I'm sure you understand, the Python version of the tokenizer can conceptually be broken down into three phases: first, breaking a string into a sequence of substrings; second, determining the common tokens; and third, converting the substring sequence into an integer sequence using the known list of tokens.

I'm working on developing tooling to support these types of preprocessing actions in JS, but I'm curious, which, if any, of these three steps do you need for your application?

Thanks,
Stan
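
The three phases described above can be sketched in TypeScript roughly as follows. This is only an illustration under stated assumptions: the function names (splitText, buildVocabulary, textToSequence), the splitting rule, and the choice to drop unknown tokens are all hypothetical, and none of this is the Keras Tokenizer or a TensorFlow.js API.

```typescript
// Sketch of the three tokenizer phases. All names are hypothetical.
type Vocabulary = Record<string, number>;

// Phase 1: break a string into a sequence of lowercase substrings.
// Assumed rule: split on anything that is not a letter, digit, or apostrophe.
function splitText(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9']+/)
    .filter((token) => token.length > 0);
}

// Phase 2: determine the common tokens. The most frequent token gets
// index 1 (index 0 is left unused here); ties keep first-seen order.
function buildVocabulary(texts: string[], maxTokens: number): Vocabulary {
  const counts: Record<string, number> = Object.create(null);
  const order: string[] = [];
  for (const text of texts) {
    for (const token of splitText(text)) {
      if (counts[token] === undefined) {
        counts[token] = 0;
        order.push(token);
      }
      counts[token] += 1;
    }
  }
  const ranked = order
    .map((token, firstSeen) => ({ token, firstSeen, count: counts[token] }))
    .sort((a, b) => b.count - a.count || a.firstSeen - b.firstSeen)
    .slice(0, maxTokens);
  const vocabulary: Vocabulary = Object.create(null);
  ranked.forEach((entry, rank) => {
    vocabulary[entry.token] = rank + 1;
  });
  return vocabulary;
}

// Phase 3: convert a string into an integer sequence using the known
// tokens. Tokens outside the vocabulary are dropped.
function textToSequence(text: string, vocabulary: Vocabulary): number[] {
  return splitText(text)
    .map((token) => vocabulary[token])
    .filter((index) => index !== undefined);
}

const exampleTexts = ["the cat sat on the mat", "the dog sat"];
const vocabulary = buildVocabulary(exampleTexts, 4);
const sequence = textToSequence("the dog sat on the cat", vocabulary);
// sequence → [1, 2, 4, 1, 3]  ("dog" is not among the 4 most common tokens)
```

Keeping the phases as separate functions makes it easy to use only the step an application needs, which is the question raised in the message above.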

Edoh Kodjo

Aug 6, 2018, 10:16:13 AM
to Stan Bileschi, TensorFlow.js Discussion, mayurb...@gmail.com
Hi Stan,

Could you please share your GitHub project? I would like to contribute and help add natural language processing to TensorFlow.js.

Thanks


Stan Bileschi

Aug 6, 2018, 10:22:11 AM
to Edoh Kodjo, TensorFlow.js Discussion, mayurb...@gmail.com
Hi Edoh, 

Thanks for your interest! You can get an idea of what we're looking at from this branch on my private fork. Right now I'm playing around with designs that would work in both a Python and a JS context. I don't think the plan and direction are clear enough at this time to accept external contributions, but I can notify you when we have buy-in from the owners of the Python Keras that the plan is good to go.

Cheers,
Stan

--
Stan Bileschi Ph.D.  |  SWE  | bile...@google.com | 617-230-8081

Edoh Kodjo

Aug 6, 2018, 10:24:19 AM
to Stan Bileschi, TensorFlow.js Discussion, mayurb...@gmail.com
Okay, that sounds good.

Mayur Bhole

Aug 10, 2018, 1:21:03 PM
to TensorFlow.js Discussion, mayurb...@gmail.com
Hi Stan,

I was looking for two functions from keras-preprocessing's text.py: texts_to_matrix() and fit_on_texts(). For now, I have implemented these functions for my own needs. However, please let me know when you start accepting external contributions; I will be happy to contribute.
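
For anyone with the same need, a bag-of-words helper along the lines of those two functions might look like the TypeScript sketch below. It is not the implementation mentioned in this message and not the Keras code: the class name BagOfWordsTokenizer, the method names fitOnTexts and textsToMatrix, the tokenization rule, and the restriction to "binary" and "count" modes are all assumptions made for illustration.

```typescript
// Hypothetical bag-of-words helper, loosely modeled on fit_on_texts() and
// texts_to_matrix(). Not the Keras implementation, not a TensorFlow.js API.
type MatrixMode = "binary" | "count";

class BagOfWordsTokenizer {
  // Maps each known word to a column index, starting at 1.
  private wordIndex: Record<string, number> = Object.create(null);
  private vocabularySize = 0;

  // Assumed tokenization rule: lowercase, split on non-alphanumerics.
  private tokenize(text: string): string[] {
    return text
      .toLowerCase()
      .split(/[^a-z0-9]+/)
      .filter((token) => token.length > 0);
  }

  // Counterpart of fit_on_texts(): count words, then index them by frequency.
  fitOnTexts(texts: string[]): void {
    const counts: Record<string, number> = Object.create(null);
    const firstSeen: Record<string, number> = Object.create(null);
    const words: string[] = [];
    for (const text of texts) {
      for (const token of this.tokenize(text)) {
        if (counts[token] === undefined) {
          counts[token] = 0;
          firstSeen[token] = words.length;
          words.push(token);
        }
        counts[token] += 1;
      }
    }
    // Most frequent first; ties keep first-seen order.
    words.sort((a, b) => counts[b] - counts[a] || firstSeen[a] - firstSeen[b]);
    this.wordIndex = Object.create(null);
    words.forEach((word, rank) => {
      this.wordIndex[word] = rank + 1;
    });
    this.vocabularySize = words.length;
  }

  // Counterpart of texts_to_matrix(): one row per text, one column per word.
  // Column 0 is unused so that column numbers match word indices.
  textsToMatrix(texts: string[], mode: MatrixMode = "binary"): number[][] {
    return texts.map((text) => {
      const row: number[] = [];
      for (let i = 0; i <= this.vocabularySize; i++) {
        row.push(0);
      }
      for (const token of this.tokenize(text)) {
        const column = this.wordIndex[token];
        if (column === undefined) continue; // word not seen during fitting
        row[column] = mode === "count" ? row[column] + 1 : 1;
      }
      return row;
    });
  }
}

const tokenizer = new BagOfWordsTokenizer();
tokenizer.fitOnTexts(["good movie", "bad movie", "good good plot"]);
const matrix = tokenizer.textsToMatrix(["good good movie", "unknown plot"], "count");
// matrix → [[0, 2, 1, 0, 0], [0, 0, 0, 0, 1]]
```

The resulting number[][] could then be handed to tf.tensor2d to build the input tensor for a classifier.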
