I noticed that the `data_generators` folder contains a file named `pointer_generator_word.py` with this docstring:
`"""Data generator for pointer-generator for word transformer."""`
Pointer-generator models are usually used for summarization. If this generator is meant for summarization, what format does the data need to be in?
The file does not define `generate_samples` itself. Instead, its problem class inherits from `Text2textTmpdirTokens`, which in turn inherits `generate_samples` from `Text2textTmpdir`. That `generate_samples` returns the generator produced by `text2text_txt_iterator`, which yields `{"inputs": inputs, "targets": targets, "extra_label": [extra_label]}`.
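To check my reading of the inheritance chain, here is a minimal sketch with hypothetical stub classes (not the real tensor2tensor classes) that only mirror the relationships described above. It assumes, as I described, that neither the problem class nor `Text2textTmpdirTokens` overrides `generate_samples`, so Python's method resolution order falls through to the `Text2textTmpdir` version.

```python
# Hypothetical stand-ins for the tensor2tensor classes; only the inheritance
# structure described in the question is reproduced here.
class Text2textTmpdirStub:
    def generate_samples(self, data_dir, tmp_dir, dataset_split):
        # Stands in for returning the generator from text2text_txt_iterator(...)
        return iter([{"inputs": "raw input", "targets": "raw target", "extra_label": [-1]}])


class Text2textTmpdirTokensStub(Text2textTmpdirStub):
    pass  # assumed: no generate_samples override


class PointerGeneratorProblemStub(Text2textTmpdirTokensStub):
    pass  # assumed: the problem in pointer_generator_word.py defines none either


problem = PointerGeneratorProblemStub()
# Method lookup walks the MRO from the subclass up to the base class.
print([cls.__name__ for cls in type(problem).__mro__])
# The inherited method is the very same function object as the base-class one.
print(PointerGeneratorProblemStub.generate_samples is Text2textTmpdirStub.generate_samples)
```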
I am confused by the `extra_label` part.
The inputs come from `txt_line_iterator`, which seems to yield the raw text, one line per example. It is the targets I am confused about: they come from `txt_and_label_iterator`, which yields both the target text and an `extra_label`.
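For reference, this is how I understand the pairing of the two iterators, written as a simplified, self-contained sketch that works on in-memory lists instead of files. The `_sketch` functions are hypothetical stand-ins, not tensor2tensor code. In particular, the target-line format (`text<TAB>label`) and the default label of `-1` for lines without a label are my own assumptions for illustration; the actual parsing lives in `txt_and_label_iterator` and may differ.

```python
from typing import Iterable, Iterator, Tuple


def txt_line_iterator_sketch(lines: Iterable[str]) -> Iterator[str]:
    # One stripped line of raw text per example.
    for line in lines:
        yield line.strip()


def txt_and_label_iterator_sketch(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    # ASSUMPTION: a target line is "text<TAB>label"; lines without a tab get
    # a hypothetical default label of -1. Not taken from the library.
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) == 2:
            yield parts[0].strip(), int(parts[1])
        else:
            yield line.strip(), -1


def text2text_txt_iterator_sketch(source_lines, target_lines):
    # Line i of the inputs is paired with line i of the targets.
    for inputs, (targets, extra_label) in zip(
            txt_line_iterator_sketch(source_lines),
            txt_and_label_iterator_sketch(target_lines)):
        yield {"inputs": inputs, "targets": targets, "extra_label": [extra_label]}


source = ["the cat sat on the mat .\n", "a dog barked loudly .\n"]
target = ["cat sat .\t1\n", "dog barked .\n"]
samples = list(text2text_txt_iterator_sketch(source, target))
for sample in samples:
    print(sample)
```

Under these assumptions, the inputs side is plain text and only the targets side carries the extra integer, which is exactly the asymmetry I am asking about.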