Is the Text2textCopyableTokens problem for summarization? If so, what to put for 'extra_label'?


Santosh Gupta

Jul 16, 2019, 8:19:00 PM
to tensor2tensor
I noticed that there's a file in the data_generators folder named "" which has this description:

"""Data generator for pointer-generator for word transformer."""

Pointer-generator models are usually used for summarization. If this problem is for summarization, what format does the data need to be in?

'generate_samples' is not defined in Text2textCopyableTokens itself. But it inherits from 'Text2textTmpdirTokens', which in turn takes 'generate_samples' from 'Text2textTmpdir', and that function returns the result of 'text2text_txt_iterator'. The iterator yields '{"inputs": inputs, "targets": targets, "extra_label": [extra_label]}'.

I am confused by the 'extra_label' part.

The inputs come from 'txt_line_iterator', which seems to yield the raw text lines. What confuses me is the targets: they come from 'txt_and_label_iterator', which yields both 'targets' and 'extra_label'.
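
To make the question concrete, here is a minimal, self-contained sketch of the data flow described above. It does not import tensor2tensor, and it is not the library's code. The function names ending in '_sketch' are hypothetical stand-ins, and the target line format (text, a tab, then an integer label) is purely an assumption for illustration, since the real format is exactly what is unclear here.

```python
import io


def line_iterator_sketch(lines):
    """Yield stripped lines from any iterable of strings (stand-in for a text file)."""
    for line in lines:
        yield line.strip()


def text_and_label_iterator_sketch(lines, sep="\t"):
    """Yield (text, label) pairs.

    ASSUMPTION: each line looks like 'text<sep>label' with an integer label.
    The real tensor2tensor format may differ.
    """
    for line in line_iterator_sketch(lines):
        text, label = line.rsplit(sep, 1)
        yield text, int(label)


def samples_sketch(source_lines, target_lines):
    """Pair each input line with its target text and label, as dicts."""
    pairs = zip(line_iterator_sketch(source_lines),
                text_and_label_iterator_sketch(target_lines))
    for inputs, (targets, extra_label) in pairs:
        yield {"inputs": inputs, "targets": targets, "extra_label": [extra_label]}


# Hypothetical toy data: two source documents and two labeled targets.
source = io.StringIO("a long article\nanother article\n")
target = io.StringIO("short summary\t0\nother summary\t1\n")
samples = list(samples_sketch(source, target))
```

If the real iterator behaves like this, each sample would carry one input line, one target line, and a one-element label list. What the label means for a copy or pointer model is still the open question.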

Lukasz Kaiser

Jul 18, 2019, 12:26:17 PM
to Santosh Gupta, tensor2tensor
I must admit I'm not sure what problem this file is about. Does anyone know?
