use of templates from Google paper and original SGD dataset?


Michael White

Apr 27, 2021, 9:49:28 PM
to gem-benchmark
Hi again folks

Since the GEM page for the schema-guided dialogue (SGD) dataset refers to the Kale & Rastogi paper on template-guided text generation, I was assuming that using the templates from that paper was fair game.  However, in trying to use those templates, I found that the GEM SGD dataset is again missing information needed to apply them, though that information is recoverable from the original SGD dataset.  In particular, in addition to the service for a system turn, it is sometimes necessary to supply the service call method to determine what a system input actually means.  For example, the underspecified act NOTIFY_FAILURE (which has no arguments) not only means something different depending on whether the service is banks or music, but also depending on whether the current service call method (essentially the user intent) is to transfer money or to check a balance.  This can be seen in the templates for the banks service: https://github.com/google-research/schema-guided-dialogue/blob/main/generation/utterance_templates/Banks_2.tsv

This additional piece of information can be looked up in the original SGD dataset by using the dialog and turn IDs that are in the GEM dataset.  Is that allowed?  It seems that it should be, as the NLG inputs are not really fully specified otherwise.  Alternatively, would it make sense to issue another update to this dataset?  I can share a script for extracting the service call method if that would be helpful.
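For readers who want to see roughly what such a lookup involves, here is a minimal sketch. It is not the script offered above, only an illustration. It assumes the original SGD release layout (`train`, `dev` and `test` directories holding `dialogues_*.json` files, with an optional `service_call` entry in system-turn frames) and assumes that turns are identified by their 0-based position in the dialogue; the function name is made up for this example.

```python
import json
from pathlib import Path


def extract_service_call_methods(sgd_root):
    """Map partition -> dialogue id -> turn index (as a string) -> method name.

    Assumption: the turn index is the 0-based position of the turn in the
    dialogue's "turns" list. Only turns whose frames contain a
    "service_call" entry are recorded.
    """
    result = {}
    for partition in ("train", "dev", "test"):
        by_dialogue = {}
        for path in sorted(Path(sgd_root, partition).glob("dialogues_*.json")):
            with open(path, encoding="utf-8") as f:
                dialogues = json.load(f)
            for dialogue in dialogues:
                methods = {}
                for turn_idx, turn in enumerate(dialogue["turns"]):
                    for frame in turn.get("frames", []):
                        call = frame.get("service_call")
                        if call:
                            methods[str(turn_idx)] = call["method"]
                if methods:
                    by_dialogue[dialogue["dialogue_id"]] = methods
        result[partition] = by_dialogue
    return result
```

Calling `extract_service_call_methods("path/to/sgd")` and dumping the result with `json.dump` would give one nested mapping per partition, which can then be joined to the GEM examples by dialog and turn ID.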

Thanks
Mike

Sebastian Gehrmann

May 3, 2021, 10:24:33 AM
to Michael White, gem-benchmark
Hi Mike, 

Thank you for all your suggestions! We place very few constraints on what data may be used, as long as its use is disclosed (and you are not training on the test set); there is a field in the submission form that you can use to describe the process.

However, I agree that maybe the missing information warrants another update to the data loader. I can look into making the update, but feel free to use your script in the meantime and please share it here in case it is useful for others. 

Best
Sebastian


Michael White

May 3, 2021, 2:33:18 PM
to gem-benchmark
Hi Sebastian

Thanks for the clarification and updates!  

Attached is the script.  It takes a path to the original SGD dataset as input and outputs a JSON file that lists the service call methods by partition, dialog ID and turn, e.g. starting with

{"train": {"93_00000": {"3": "FindAttractions", "9": "SearchHotel", "21": "ReserveHotel"}, ...

indicating that the first dialog in the train set has three turns with the service call methods shown.  
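As a rough illustration of how a file in this shape could be consumed, the sketch below looks up the method for a given partition, dialog ID and turn. The function names are hypothetical, and how the dialog and turn IDs are read from a GEM example is left to the caller, since that depends on the data loader version.

```python
import json


def load_methods(path):
    """Load the partition -> dialog id -> turn -> method mapping from JSON."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def service_call_method(methods, partition, dialog_id, turn_id):
    """Return the service call method for a turn, or None if none is listed.

    Turn keys are strings in the JSON file, so integer turn IDs are
    converted before the lookup.
    """
    return methods.get(partition, {}).get(dialog_id, {}).get(str(turn_id))


# The entry quoted in the message above, used here as sample data.
sample = {
    "train": {
        "93_00000": {"3": "FindAttractions", "9": "SearchHotel", "21": "ReserveHotel"}
    }
}

print(service_call_method(sample, "train", "93_00000", 9))
print(service_call_method(sample, "train", "93_00000", 4))
```

The first call finds the listed method for turn 9; the second returns `None` because turn 4 has no entry, so callers need to decide how to handle turns without a listed method.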

With this additional info, it's possible to use the templates from the Kale & Rastogi paper.

Best
Mike
extract_service_call_methods.py