Hi Stephane
As you know, I am trying to see if there is any way to optimize the speed of the code. After some time looking at it, I would like to report some observations and to know your comments about this.
I attach a patch file (you can find 4 patches inside) that contains the modification of the code I did for some initial trials. It is not a final code at all, so please, pull these patches in a separate folder and do not pay too much attention to the program style (I think it is clear enough to be understood easily but if it is not, please tell me)
It is a little difficult to explain what I did in a short email, but I am going to try to do it. Basically, the final goal is to know if there is a way to traverse the cells quicker, and to find out what is making the cell traverse slow in some situations.
As you will see in the patch, I have created new structures to make the cell traverse quicker. I know perfectly that this does not mean that the method I propose is better, but it is a good test for comparison between cell_traverse functions and the best one could do
The basic idea is to create a structure (MatrixCells) that contains the following important elements:
FttCell ** cellpointer : it is an array of pointers. Each element contains a pointer to a cell in domain.
gint ** lists: is a matrix that contains the number of elements in the array cellpointer of a given traverse list. Note that depending of the celltraverse operation, a different list is defined. For example list[0][i] contains the elements of a FTT_CELL_TRAVERSE_LEAFS in cellpointer. I add -1 to mark the list end (this structure may be optimized with a dynamic array)
gint * neighbors: It is an array with FTT_CELL_NEIGHBOR*TOTALCELLS elements where I save the positions of neighboring cells in the cellpointer structure
gdouble * facevalues: It is an array with FTT_CELL_NEIGHBOR*TOTALCELLS elements where I directly save a given value at the faces of a given cell
This structure is complemented with another one for the variables (variableaux) that basically contains an array of real values. The ith element corresponds to the value of the variable in the cell located at the same position than cellpointer[i]. One could imagine to just make a list of pointers to the value of a GfsVariable in the ith cell.
The initialization of the structures is code at the beginning of mac_projection. Once the structures are initialized I have tried to compare the velocity of a given operation.
The speed up in very simple operations is impressive. For the example I attach below, the scale_divergence takes (just apply the patches and do gerris2D skew.gfs):
scale standard 0.26 s
scale opt 0.01 s
However, I do not always get such a good improvement. In addition, I did not really understand why I was getting such a large difference.
The gfs_normal_divergence function is much more illustrative of the speed optimization problems. I just realized that it penalizes a lot to use GFS_STATE (cell)->f[face.d].un or gfs_domain_face_fraction. but only when I use my optimized structure. Indeed, even if I save the values in the FttCell structure, it takes some appreciable time to access to them. This was the reason why I created direct structure to access to the values I need directly (obviously at expenses of memory). The results I got are the following:
gfs_normal_divergence (standard) 0.35 s
gfs_normal_divergence_opt 0.01 s(completely optimized)
gfs_normal_divergence_opt 2.04 s(!!! very funny result obtained with the optimized traverse and using a stored value in the FttCell structure)
gfs_normal_divergence_opt 0.28 s(optimized traverse and I use gfs_domain_face_fraction)
Something similar happens with GFS_STATE (cell)->f[face.d].un
Now the questions:
1) Do you know exactly why this happens?
2) Based on these observations, do you think there is a way to optimize the ftt_cell_traverse without using my structure, but making the compiler recognize the memory that is going to be used next?
3) Do you think we could store the MatrixCells structure for each box and to use it to traverse the cells? if so, do you think it is worthy to pre-compute and save all the information we expect to be used?
4) Indeed, related to 3 and as you know already, I have verified that what it would make a huge different in the computational time would be to store the matrix coefficients required to compute the gradients. This is somehow related to the suggestion of pre-computing required quantities but it does not necessary require to change the cell_traverse. I would appreciate if you could give me some information about the different information I get from gfslinearproblem. I guess I only need the GfsStencil but to be honest, I would be glad if you could give me some information about how this information is structured
best
Daniel
--
www.danielfuster.com