Any thoughts on this?
a high-level representation of the board via convolutions
Do you have examples where the reality being modeled is not sequential? When you relate distant positions across the board, are you thinking of some kind of sequential scanning?
Or is the chess move sequence itself the object that would allow a natural-language machine-learning framework to be applicable? I have not finished surfing, but I have seen that recurrent neural networks, perhaps viewing chess games as histories of moves, could bring something. But I am stuck at the "right away" part. More as I read more....
--
You received this message because you are subscribed to a topic in the Google Groups "LCZero" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/lczero/mxXNV41-DfQ/unsubscribe.
To unsubscribe from this group and all its topics, send an email to lczero+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/lczero/85d898ca-f0a6-4b01-893a-65a90b024ecb%40googlegroups.com.
I'd suggest starting with the paper Attention is All You Need.
More relevant to this: AlphaStar and OpenAI's bots already use attention.
On Wed, Jun 3, 2020 at 8:48 PM DBg <dariou...@gmail.com> wrote:
Do you have examples where the reality being modeled is not sequential? When you relate distant positions across the board, are you thinking of some kind of sequential scanning?
Or is the chess move sequence itself the object that would allow a natural-language machine-learning framework to be applicable? I have not finished surfing, but I have seen that recurrent neural networks, perhaps viewing chess games as histories of moves, could bring something. But I am stuck at the "right away" part. More as I read more....
Recent trends of incorporating attention mechanisms in vision have led researchers to reconsider the supremacy of convolutional layers as a primary building block. Beyond helping CNNs to handle long-range dependencies, Ramachandran et al. (2019) showed that attention can completely replace convolution and achieve state-of-the-art performance on vision tasks. This raises the question: do learned attention layers operate similarly to convolutional layers? This work provides evidence that attention layers can perform convolution and, indeed, they often learn to do so in practice. Specifically, we prove that a multi-head self-attention layer with a sufficient number of heads is at least as expressive as any convolutional layer. Our numerical experiments then show that self-attention layers attend to pixel-grid patterns similarly to CNN layers, corroborating our analysis. Our code is publicly available.
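The abstract's expressivity claim can be illustrated with a small numpy sketch of the construction: one attention head per kernel offset, each attending (with one-hot weights) to the pixel at that relative shift, so that summing the heads' projected outputs reproduces a convolution. All variable names and sizes below are mine for illustration, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, Cin, Cout, K = 4, 4, 3, 2, 3
pad = K // 2
X = rng.normal(size=(H, W, Cin))
kernel = rng.normal(size=(K, K, Cin, Cout))

# Reference: direct KxK convolution with zero padding.
Xp = np.pad(X, ((pad, pad), (pad, pad), (0, 0)))
conv = np.zeros((H, W, Cout))
for i in range(H):
    for j in range(W):
        conv[i, j] = np.einsum('klc,klco->o', Xp[i:i + K, j:j + K], kernel)

# "Attention" version over the padded pixel grid flattened to tokens.
Hp, Wp = H + 2 * pad, W + 2 * pad
tokens = Xp.reshape(Hp * Wp, Cin)
attn_out = np.zeros((H * W, Cout))
for di in range(K):
    for dj in range(K):
        A = np.zeros((H * W, Hp * Wp))           # one-hot attention weights
        for i in range(H):
            for j in range(W):
                A[i * W + j, (i + di) * Wp + (j + dj)] = 1.0
        attn_out += A @ tokens @ kernel[di, dj]  # per-head output projection

# The summed head outputs equal the convolution exactly.
assert np.allclose(attn_out.reshape(H, W, Cout), conv)
```

In the paper the one-hot attention patterns arise from learned relative positional encodings rather than being hard-wired as here; this sketch only checks the expressivity direction of the argument.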
I am not sure that I need to answer the implementation questions to see that, if I am right, it would not be a replacement for the current chess architecture....
Fixed-time sequences can't be exchanged for a reordering of the pixels in an image without some adjustments to the input definition (but no transformation of information at the input level).
Sorry, they can definitely be replaced by a fixed-size sequence of pixels, so that is not a problem. The paper is actually about that, for images. Since the lc0 input is organized as board planes for each piece, the procedure of the paper is entirely applicable.
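Concretely, turning an lc0-style plane stack into the fixed-size token sequence a transformer expects is just a reshape. The plane count below is an assumption for illustration, not the actual lc0 input spec.

```python
import numpy as np

# Hypothetical lc0-style input: an 8x8 board encoded as C feature planes
# (planes-first layout; the exact plane count is an assumption here).
C = 12
planes = np.random.default_rng(0).normal(size=(C, 8, 8))

# A transformer would instead see 64 tokens, one per square, each of
# dimension C; this is a transpose-and-reshape of the same data.
tokens = planes.transpose(1, 2, 0).reshape(64, C)
assert tokens.shape == (64, C)
```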
The two most crucial requirements for a self-attention layer to express a convolution are:
- having multiple heads to attend to every pixel of a convolutional layer's receptive field,
- using relative positional encoding to ensure translation equivariance.
A key property of the self-attention model described above is that it is equivariant to reordering, that is, it gives the same output independently of how the input pixels are shuffled. This is problematic for cases where we expect the order of the input to matter.
As suggested, for communication efficiency, with equations?
We introduce a novel two-dimensional relative self-attention mechanism that proves competitive in replacing convolutions as a stand-alone computational primitive for image classification. We find in control experiments that the best results are obtained when combining both convolutions and self-attention.
("Competitive" translates, in the body of the paper, to slightly worse than pure convolution.)
While this work primarily focuses on content-based interactions to establish their virtue for vision
tasks, in the future, we hope to unify convolution and self-attention to best combine their unique
advantages.
Convolutional networks have been the paradigm of choice in many computer vision applications. The convolution operation however has a significant weakness in that it only operates on a local neighborhood, thus missing global information. Self-attention, on the other hand, has emerged as a recent advance to capture long range interactions, but has mostly been applied to sequence modeling and generative modeling tasks. In this paper, we consider the use of self-attention for discriminative visual tasks as an alternative to convolutions. We introduce a novel two-dimensional relative self-attention mechanism that proves competitive in replacing convolutions as a stand-alone computational primitive for image classification. We find in control experiments that the best results are obtained when combining both convolutions and self-attention. We therefore propose to augment convolutional operators with this self-attention mechanism by concatenating convolutional feature maps with a set of feature maps produced via self-attention. Extensive experiments show that Attention Augmentation leads to consistent improvements in image classification on ImageNet and object detection on COCO across many different models and scales, including ResNets and a state-of-the-art mobile constrained network, while keeping the number of parameters similar. In particular, our method achieves a $1.3\%$ top-1 accuracy improvement on ImageNet classification over a ResNet50 baseline and outperforms other attention mechanisms for images such as Squeeze-and-Excitation. It also achieves an improvement of 1.4 mAP in COCO Object Detection on top of a RetinaNet baseline.
(Code: gan3sh500/attention-augmented-conv)
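The "augmentation" in this abstract is a channel-wise concatenation of convolutional and self-attention feature maps. A minimal sketch of that combination, with the "convolution" simplified to a 1x1 conv (a per-pixel linear map) and all names and sizes my own:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
H, W, Cin, Cconv, dv = 4, 4, 3, 5, 2
X = rng.normal(size=(H * W, Cin))               # pixels flattened to tokens

Wc = rng.normal(size=(Cin, Cconv))              # 1x1 "conv" weights
Wq, Wk, Wv = (rng.normal(size=(Cin, dv)) for _ in range(3))

conv_maps = X @ Wc                              # (N, Cconv)
Q, K, V = X @ Wq, X @ Wk, X @ Wv
attn_maps = softmax(Q @ K.T / np.sqrt(dv)) @ V  # (N, dv)

# Attention augmentation: concatenate the two stems along channels.
augmented = np.concatenate([conv_maps, attn_maps], axis=-1)
assert augmented.shape == (H * W, Cconv + dv)
```

The real model uses proper spatial convolutions and the paper's 2D relative self-attention; this only shows the parallel-stems-plus-concatenation wiring.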
Convolutions are a fundamental building block of modern computer vision systems. Recent approaches have argued for going beyond convolutions in order to capture long-range dependencies. These efforts focus on augmenting convolutional models with content-based interactions, such as self-attention and non-local means, to achieve gains on a number of vision tasks. The natural question that arises is whether attention can be a stand-alone primitive for vision models instead of serving as just an augmentation on top of convolutions. In developing and testing a pure self-attention vision model, we verify that self-attention can indeed be an effective stand-alone layer. A simple procedure of replacing all instances of spatial convolutions with a form of self-attention applied to ResNet model produces a fully self-attentional model that outperforms the baseline on ImageNet classification with 12% fewer FLOPS and 29% fewer parameters. On COCO object detection, a pure self-attention model matches the mAP of a baseline RetinaNet while having 39% fewer FLOPS and 34% fewer parameters. Detailed ablation studies demonstrate that self-attention is especially impactful when used in later layers. These results establish that stand-alone self-attention is an important addition to the vision practitioner's toolbox.
(Code: JoeRoussy/adaptive-attention-in-cv)
Also, here it seems that the convolutional stem runs in parallel with the attention one; in the previous paper, I think, they were serial. I have not looked into the details of how the attention layer is partitioned over the input vector (made multi-piece, i.e. heads).
Both seem to point to the need for both ingredients. I may still have some misconceptions; please point them out to me, whoever reads this. But I am mostly reporting what those papers claim, with very little interpretation of their results, which seem clear.