Hello Tunaberk,
I wasn't able to understand your question directly as I do not speak Turkish, but here is a response based on what I understood from the translation. Please let me know if I didn't address all of your questions:
Essentially, what the authors are doing is reformulating a regular MLP. Instead of flattening the input image of size [L, W] into a vector X of size [L*W], they keep the original shape as it is. In a regular MLP, we would multiply a weight matrix W of size [O, L*W], where O is the dimension of the output, by the flattened input vector to get an output vector H of size [O]. Let's assume the output has the same size as the flattened input, i.e. [L*W]; the weight matrix in this case is then of size [L*W, L*W].
In this scenario, element H_i of the output hidden vector is computed as follows (excluding bias terms):

$$H_i = \sum_{k=1}^{L \cdot W} W_{i,k} X_k$$
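To make the shapes concrete, here is a tiny NumPy sketch of this flattened version (my own illustration, not the book's code; I use Wd for the image width so it doesn't clash with the weight matrix W, and the sizes are made up just for the example):

```python
import numpy as np

rng = np.random.default_rng(0)
L, Wd = 4, 5                            # image height and width
X = rng.normal(size=(L, Wd))            # input image of shape [L, Wd]
W = rng.normal(size=(L * Wd, L * Wd))   # fully connected weights, output size O = L*Wd

x = X.reshape(-1)                       # flatten [L, Wd] -> [L*Wd]
H = W @ x                               # H_i = sum_k W[i, k] * x[k]
print(H.shape)                          # (20,)
```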
Now, they change both the input and the output to the shape [L, W] instead of [L*W]. To (hopefully) make this easier to understand, I will split it into two steps. First, let's assume we did not flatten the image and instead kept it in its original shape of [L, W]. The output hidden vector H, which is still of size [L*W], can then be expressed as follows:
$$H_i = \sum_{k=1}^{L} \sum_{l=1}^{W} W_{i,k,l} X_{k,l}$$
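Here is the same computation written with the 3-dimensional weight tensor, again just my own NumPy sketch; the np.einsum call spells out the sum over (k, l), and the check at the end confirms nothing has changed relative to the flattened version:

```python
import numpy as np

rng = np.random.default_rng(0)
L, Wd = 4, 5
X = rng.normal(size=(L, Wd))
W = rng.normal(size=(L * Wd, L * Wd))

# Reshape the weights to [L*Wd, L, Wd]: each output element i is still
# connected to every input position (k, l); the computation is unchanged.
W3 = W.reshape(L * Wd, L, Wd)
H = np.einsum('ikl,kl->i', W3, X)         # H_i = sum_{k,l} W3[i, k, l] * X[k, l]
print(np.allclose(H, W @ X.reshape(-1)))  # True: identical to the flattened version
```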
Notice here that the weights become a three-dimensional tensor. In the first example, each input element can be identified by a single index (k), and likewise each element of the output hidden vector by (i); the weight connecting these two elements is simply denoted W_{i,k}. However, when the input is kept two-dimensional, we need two indices (k, l) to identify a position in it, so the weight connecting the input element at position (k, l) to the output element at position (i) is denoted W_{i,k,l}. Next, instead of treating the output as a vector, we can also give it the same shape as the input. The output hidden representation matrix H is then of size [L, W] and can be expressed as follows:
$$H_{i,j} = \sum_{k=1}^{L} \sum_{l=1}^{W} W_{i,j,k,l} X_{k,l}$$
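And the same idea once the output is also kept as an [L, W] matrix, so the weights become a 4-dimensional tensor (still just an illustrative sketch with made-up sizes):

```python
import numpy as np

rng = np.random.default_rng(0)
L, Wd = 4, 5
X = rng.normal(size=(L, Wd))
W = rng.normal(size=(L * Wd, L * Wd))

# Reshape the output side as well: H becomes an [L, Wd] matrix and the
# weights become a 4-D tensor of shape [L, Wd, L, Wd].
W4 = W.reshape(L, Wd, L, Wd)
H = np.einsum('ijkl,kl->ij', W4, X)       # H[i, j] = sum_{k,l} W4[i, j, k, l] * X[k, l]
print(np.allclose(H.reshape(-1), W @ X.reshape(-1)))  # True: still the same computation
```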
In the next step, they simply reindex k as i+a and l as j+b, so that the input position is measured as an offset (a, b) from the output position (i, j). Substituting this into the sum above gives a formula of the form

$$H_{i,j} = \sum_{a} \sum_{b} W_{i,j,\,i+a,\,j+b} \, X_{i+a,\,j+b},$$

which matches the next formula in the chapter (there the reindexed weights are collected into a new tensor indexed by (i, j, a, b)).
Note that all of this is still a fully connected approach, i.e. every input element is connected to every output element. We are just reshaping the inputs, outputs, and weights without affecting the underlying computation. In the next steps of that chapter, they modify this representation slightly to enable translation invariance and thus convolutions (see the sketch below).
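To preview where this is heading, here is my own sketch of the weight-sharing idea (not the chapter's code): if the weights are forced to depend only on the offsets (a, b) rather than on the output position (i, j), the sum over (a, b) of V[a, b] * X[i+a, j+b] becomes a cross-correlation, which is what deep learning libraries implement as a convolution layer:

```python
import numpy as np

rng = np.random.default_rng(0)
L, Wd = 4, 5
X = rng.normal(size=(L, Wd))

# Weight sharing: one small kernel V[a, b] is reused at every output
# position (i, j), instead of a separate weight for every (output, input) pair.
K = 3
V = rng.normal(size=(K, K))

H = np.zeros((L - K + 1, Wd - K + 1))       # "valid" output so i+a and j+b stay in bounds
for i in range(H.shape[0]):
    for j in range(H.shape[1]):
        H[i, j] = np.sum(V * X[i:i + K, j:j + K])   # sum_{a,b} V[a, b] * X[i+a, j+b]
print(H.shape)                               # (2, 3)
```

Unlike the fully connected tensor, the shared kernel V has only K*K parameters, which is exactly the saving the chapter builds toward.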
Please let me know if anything is unclear.
Best,