Is the Text2textCopyableTokens problem for summarization? If so, what to put for 'extra_label'?

Santosh Gupta

Jul 16, 2019, 8:19:00 PM
to tensor2tensor
I noticed that there's a file in the data_generators folder named "pointer_generator_word.py" which has this description:

"""Data generator for pointer-generator for word transformer."""

Pointer-generator networks are usually used for summarization. If this problem is meant for summarization, what format does the data need to be in?

'generate_samples' is not defined in that file. However, the problem inherits from 'Text2textTmpdirTokens', which in turn inherits from 'Text2textTmpdir', whose 'generate_samples' function returns the result of 'text2text_txt_iterator'. That iterator does 'yield {"inputs": inputs, "targets": targets, "extra_label": [extra_label]}'.

I am confused by the 'extra_label' part.


The inputs come from 'txt_line_iterator', which seems to yield the raw text. But I am confused by the targets, which come from 'txt_and_label_iterator' and yield both 'targets' and 'extra_label'.
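
To make the shape of the question concrete, here is a minimal sketch in plain Python of an iterator that yields dicts with the three keys quoted above. It does not use tensor2tensor. The helper names and the target line format ("<integer label> <target text>") are assumptions made only for illustration; the real format expected by the library is exactly what is being asked about.

```python
import re

# ASSUMPTION: each target line looks like "<integer label> <target text>".
# This format is hypothetical and not confirmed anywhere in this thread.
_LABELED_LINE = re.compile(r"(\d+) (.*)")


def line_iterator(lines):
    """Yield each line with surrounding whitespace stripped."""
    for line in lines:
        yield line.strip()


def line_and_label_iterator(lines):
    """Yield (text, label) pairs from lines shaped like '<label> <text>'."""
    for line in lines:
        match = _LABELED_LINE.match(line.strip())
        if match is None:
            raise ValueError("expected '<label> <text>', got: %r" % line)
        label, text = match.groups()
        yield text, int(label)


def sample_iterator(source_lines, target_lines):
    """Yield sample dicts with the keys 'inputs', 'targets', 'extra_label'."""
    pairs = zip(line_iterator(source_lines),
                line_and_label_iterator(target_lines))
    for inputs, (targets, extra_label) in pairs:
        yield {"inputs": inputs,
               "targets": targets,
               "extra_label": [extra_label]}


# Tiny in-memory stand-ins for the source and target text files.
sources = ["the cat sat on the mat .\n", "it rained all day .\n"]
targets = ["0 cat on mat\n", "1 rain\n"]
samples = list(sample_iterator(sources, targets))
```

Under that assumed format, every target line would have to carry a label in front of the summary text, which is the part that is unclear for a summarization dataset.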

Lukasz Kaiser

Jul 18, 2019, 12:26:17 PM
to Santosh Gupta, tensor2tensor
I must admit I'm not sure what problem this file is about. Does anyone know?

Lukasz