I'm wondering if anyone has considered integrating the Huggingface Transformer library with TFX.
So far I have written a custom component that operates in eager mode in order to make use of the tokenizer (batch_encode_plus()), then generates a DistilBERT embedding.
But to tokenize a phrase, the library operates on raw text input (and uses numpy ops, not TF ops). This is a barrier to using the tokenizer in TF Transform, and consequently to composing a transform graph that can be used in TF Serving. Has anyone explored shimming Tensor ops here? Or is there another way to have the tokenizer work within the graph? I'm curious whether there are workarounds or successful integrations to speak of, or whether there are other issues with using Huggingface Transformers with TFX in production that we should be aware of!
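For concreteness, here is a minimal sketch of that eager-mode path (assuming the transformers 2.x-era API; the model names and the embed() helper are placeholders, not production code):

import tensorflow as tf
from transformers import DistilBertTokenizer, TFDistilBertModel

tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased")
model = TFDistilBertModel.from_pretrained("distilbert-base-uncased")

def embed(texts, max_seq_length=128):
    # batch_encode_plus runs as Python/numpy code, not as TF graph ops,
    # which is exactly why it cannot be traced into a TF Transform graph.
    encoded = tokenizer.batch_encode_plus(
        texts,
        max_length=max_seq_length,
        pad_to_max_length=True,
        return_tensors="tf",
    )
    outputs = model(encoded["input_ids"], attention_mask=encoded["attention_mask"])
    return outputs[0]  # last hidden state, used as the DistilBERT embedding

Because the tokenization happens outside the TensorFlow graph, none of it ends up in the transform graph that TF Serving would need.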
Many thanks,
Joshua Pham
In case it's relevant: I read that TF Text depends on TF 2.0, but the best option I have is to use the TF 2.0 API from within TF 1.15.2, since we haven't migrated to TF 2.0 yet. This is why I'm using TF Text v1.15.1 (the 2.0.0 release notes say the major version numbers track with TF).
It looks like there was some fix around the `CaseFoldUTF8` op in release v1.15.0: https://github.com/tensorflow/text/releases/tag/v1.15.0. I'm curious if this error I'm seeing is related? Or am I good to use TF Text 2.0+ versions with older TF 1.x?
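In case it helps anyone following along, one way to sanity-check the installed versions and whether the TF Text custom ops actually got registered (a rough sketch; the op-registry module below is TF 1.x-specific and moved in later releases):

import tensorflow as tf
import tensorflow_text as text  # importing TF Text is what registers its custom ops

print("TF:", tf.__version__, "TF Text:", text.__version__)

# TF 1.x-era way to list registered ops; this API changed in TF 2.x.
from tensorflow.python.framework import op_def_registry
registered = op_def_registry.get_registered_ops()
print("CaseFoldUTF8" in registered, "RegexSplitWithOffsets" in registered)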
@Robert: The Colab notebook referenced in the Part 1 blog post is all kindsa broken. Can you take a look? Maybe I'm doing something wrong? Although I'm just clicking "Run All", so I'm not sure how I could mess that up. :) Also, when is Part 2 expected? I'd love to see the deep-dive!
On Mar 31, 2020, at 12:35 PM, Chris Fregly <ch...@fregly.com> wrote:
@Joshua: I'm very interested in seeing what you come up with. Can you share when you're done? Would love to hack on it when it's ready!
Super cool.
On Mar 31, 2020, at 12:27 PM, 'Joshua Pham' via TensorFlow Extended (TFX) <t...@tensorflow.org> wrote:
Thank you Robert! The blog post certainly has a lot that we can look at using. As a progress update, I've integrated that BertTokenizer implementation into our Transform component to output tensors for input ids and masks, and am now attempting to get it working with a forward pass through Huggingface DistilBERT, using their library, in a Trainer component. Fingers crossed that it plays nicely.
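For anyone following along, a rough sketch of what an in-graph BertTokenizer inside a Transform preprocessing_fn can look like (a simplification, not our exact code; VOCAB_FILE_PATH, the "text" feature name, and MAX_SEQ_LEN are placeholders, and the exact RaggedTensor API depends on the TF / TF Text versions installed):

import tensorflow as tf
import tensorflow_text as tf_text

MAX_SEQ_LEN = 128              # placeholder
VOCAB_FILE_PATH = "vocab.txt"  # placeholder: the BERT wordpiece vocab file

def preprocessing_fn(inputs):
    tokenizer = tf_text.BertTokenizer(VOCAB_FILE_PATH, lower_case=True)
    text = tf.reshape(inputs["text"], [-1])
    # tokenize() returns a RaggedTensor of shape [batch, words, wordpieces]
    # holding wordpiece ids; merge the inner dims and truncate.
    tokens = tokenizer.tokenize(text).merge_dims(1, 2)[:, :MAX_SEQ_LEN]
    input_word_ids = tokens.to_tensor(default_value=0)
    pad_len = MAX_SEQ_LEN - tf.shape(input_word_ids)[1]
    input_word_ids = tf.pad(input_word_ids, [[0, 0], [0, pad_len]])
    input_word_ids = tf.reshape(input_word_ids, [-1, MAX_SEQ_LEN])
    input_mask = tf.cast(input_word_ids > 0, tf.int64)
    return {"input_word_ids": input_word_ids, "input_mask": input_mask}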
Thanks Hannes, I've been sticking with TF Text v1.15.1. I verified that TF Text is installed on the Dataflow workers, thinking that it might have just been an issue where it was only installed on the Kubeflow pods. But I'm still getting the same error... using a different model now, so it doesn't use CaseFoldUTF8. But I'm getting "Op type not registered 'RegexSplitWithOffsets'". I think I will create an issue in TF Transform...
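In case it's useful to anyone else hitting "Op type not registered": the TF Text ops are only registered once tensorflow_text is imported in the process that loads the graph, and the matching wheel also has to be shipped to the Dataflow workers. A rough sketch of how that can be wired up through the Beam pipeline args (project, bucket, and region below are placeholders):

import tensorflow_text  # noqa: F401  (registers CaseFoldUTF8, RegexSplitWithOffsets, ...)

beam_pipeline_args = [
    "--runner=DataflowRunner",
    "--project=my-gcp-project",            # placeholder
    "--temp_location=gs://my-bucket/tmp",  # placeholder
    "--region=us-central1",                # placeholder
    "--setup_file=./setup.py",  # setup.py lists tensorflow-text in install_requires
]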
On Apr 1, 2020, at 8:23 AM, Hannes Hapke <hannes...@gmail.com> wrote:
Hi Joshua,

I think the op "RegexSplitWithOffsets" is only available in tf.text 2.1 and higher.

- Hannes
On Apr 10, 2020, at 4:12 PM, Robert Crowe <rober...@google.com> wrote:
I'm able to get past StatisticsGen now, but it gets stuck on Evaluator.
On Fri, Apr 10, 2020 at 4:06 PM Robert Crowe <rober...@google.com> wrote:
Sorry again, I was focused on the pip install issues which were blocking previously, not just this one but the taxi ones too. I've updated the pip install in the BERT notebook.
On Fri, Apr 10, 2020 at 3:31 PM Hannes Hapke <hannes...@gmail.com> wrote:
Hi Chris,
A bunch of people are still looking into the issue. We suspect a problem with the underlying TFX code.
I will post an update once we have a fix. The notebook with the same pipeline but with the estimator implementation works fine and the models are running in prod.
- Hannes
On Fri, Apr 10, 2020 at 3:20 PM Chris Fregly <ch...@fregly.com> wrote:
Hey Robert! These links are for the taxi cab example. I was hoping for a fix on the original BERT / IMDB notebook. It's still busted. Here is the url again: https://colab.research.google.com/github/tensorflow/workshops/blob/master/blog/TFX_Pipeline_for_Bert_Preprocessing.ipynb#scrollTo=IBYoEPhBeQUi
I can't seem to get past this in the notebook... any idea? Excited to use this notebook!!
[Attached screenshot: PastedGraphic-1.png, showing the notebook error]
bert_layer = load_bert_layer()
encoder_inputs = dict(
    input_word_ids=tf.reshape(input_word_ids, (-1, max_seq_length)),
    input_mask=tf.reshape(input_mask, (-1, max_seq_length)),
    input_type_ids=tf.reshape(input_type_ids, (-1, max_seq_length)),
)
outputs = bert_layer(encoder_inputs)
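For context, load_bert_layer() in snippets like this is typically a thin wrapper around hub.KerasLayer; a hedged sketch, where the exact TF Hub handle and the trainable flag are assumptions rather than anything stated in this thread:

import tensorflow_hub as hub

def load_bert_layer(
    model_url="https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/3",
):
    # The dict-style inputs used above assume a v3+ style hub export.
    return hub.KerasLayer(model_url, trainable=True)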