> It seems to be the same as a transformer with depth, but the parameters are "shared" across all the plain transformer layers.
> Sorry, this seems like a novice question, but correct me if I'm wrong.
No - you are perfectly right, this is the main idea! There are a few
more tweaks and the ACT part, but the main part is a depth-shared
transformer.
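For concreteness, here is a minimal sketch of that idea without ACT. It is written in PyTorch rather than tensor2tensor just for brevity, and the class and parameter names are only illustrative:

    import torch
    import torch.nn as nn

    class DepthSharedTransformer(nn.Module):
        """One transformer layer whose weights are reused at every
        depth step - the core of the Universal Transformer."""
        def __init__(self, d_model=512, nhead=8, steps=6):
            super().__init__()
            # A single layer; its parameters are shared across all steps.
            self.layer = nn.TransformerEncoderLayer(d_model, nhead,
                                                    batch_first=True)
            # Learned timestep ("coordinate") embedding, one per step.
            self.step_emb = nn.Embedding(steps, d_model)
            self.steps = steps

        def forward(self, x):  # x: (batch, seq, d_model)
            for t in range(self.steps):
                # Same weights applied at every step; the full UT would
                # also add ACT here to halt per-position computation.
                x = x + self.step_emb.weight[t]
                x = self.layer(x)
            return x

With a separate layer per step instead of the shared one, this would reduce to the standard stacked transformer body; tying the weights across depth is the difference.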
> On Thursday, June 20, 2019 at 3:34:09 PM UTC+7, Sergio Ryan wrote:
>> So as far as I understand, UT replaces the "pre-determined depth" of the Transformer (i.e. 6, the number of layers in the transformer body suggested in the paper Attention is All You Need) with recurrent connections. When ACT is not used to dynamically reduce the number of time steps, what is the difference between connecting a plain transformer multiple times, as in the paper, and the recurrent connection, other than the coordinate embedding and the transition function?
>> From the Google AI blog:
>>> However, now the number of times this transformation is applied to each symbol (i.e. the number of recurrent steps) can either be manually set ahead of time (e.g. to some fixed number or to the input length), or it can be decided dynamically by the Universal Transformer itself. To achieve the latter, we added an adaptive computation mechanism...
>> I probably got it wrong. Can somebody clear things up for me?
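As for the ACT part quoted above, the per-position halting loop roughly looks like the sketch below. This is a simplified reading of the adaptive computation time mechanism, not the exact tensor2tensor code; halt_proj (an nn.Linear(d_model, 1)) and the other names are hypothetical:

    def act_loop(layer, halt_proj, x, max_steps=6, threshold=0.99):
        # x: (batch, seq, d_model); halt_proj: nn.Linear(d_model, 1)
        batch, seq, _ = x.shape
        halting_prob = x.new_zeros(batch, seq)  # accumulated halt prob.
        remainders = x.new_zeros(batch, seq)
        state = x
        for _ in range(max_steps):
            p = torch.sigmoid(halt_proj(state)).squeeze(-1)  # (batch, seq)
            still_running = (halting_prob < 1.0).float()
            # Positions whose cumulative probability would cross the
            # threshold halt now and use the remainder as their weight.
            new_halted = ((halting_prob + p * still_running)
                          > threshold).float() * still_running
            still_running = ((halting_prob + p * still_running)
                             <= threshold).float() * still_running
            halting_prob = halting_prob + p * still_running
            remainders = remainders + new_halted * (1.0 - halting_prob)
            halting_prob = halting_prob + new_halted * remainders
            update_w = (p * still_running
                        + new_halted * remainders).unsqueeze(-1)
            # Halted positions copy their state forward unchanged.
            state = layer(state) * update_w + state * (1.0 - update_w)
        return state

Each position accumulates a halting probability over the steps; once it crosses the threshold, that position stops being updated, which is how the number of recurrent steps becomes dynamic per symbol.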