Tips to process an area with many transitions and features

Nildson Rodrigues

Oct 3, 2024, 10:24:48 AM
to Dinamica EGO
Hi,

I am trying to run Weights of Evidence Correlation in one of my PhD study areas, which has 43 transitions and 60 features. I can process the model for an area of 9 million hectares (Mha) in 2 days, but now I need to process it for an area of 32 Mha. The model has been running for 10 days and has not yet finished the first transition.

For the 9 Mha area I was using the balanced configuration with 224 processors, and I am using the same setup now. Does anyone know how I could decrease the time needed to generate the results for the 32 Mha area?

I was thinking of adding the following options (a sketch of the full command follows below):

-memory-allocation-policy 4
-disable-map-swapping
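
For reference, this is roughly how I would pass them (the DinamicaConsole launcher name and the script name are placeholders for my setup):

    DinamicaConsole -processors 224 -memory-allocation-policy 4 -disable-map-swapping my_model.egoml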

Thanks in advance,

Hermann Oliveira Rodrigues

Oct 3, 2024, 1:46:34 PM
to Dinamica EGO on behalf of nildso...@gmail.com
Hi,

Which Dinamica version are you using? Which hardware are you running your model on (CPU model, number of cores, amount of memory, and type of hard drive)? And what are the size and format of your dataset: number of lines and columns, number of layers, and file format (tiff, img)?

Using "-memory-allocation-policy 4" forces dinamica to create temporary maps in memory and keep the input maps given to LoadMap/LoadCategoricalMap in disk. The "-disable-map-swapping" flag does not have any effect anymore in more recent versions of Dinamica.

Usually "-memory-allocation-policy 4" does increase performance but it only works if you have enough memory to avoid creating temporary files on disk.

Best,

hermann

----------------------------------------------------------------------
Hermann Rodrigues
hermann....@gmail.com
her...@csr.ufmg.br
Centro de Sensoriamento Remoto / UFMG
https://csr.ufmg.br | https://dinamicaego.com


Nildson Rodrigues

Oct 3, 2024, 2:07:24 PM
to Dinamica EGO
Hi Hermann,

I'm using a server from my institution. Unfortunately, I don't know the CPU model, but the machine has 256 logical cores (I am using 224), 2 TB of storage, and 512 GB of RAM. I also don't know the type of hard drive (I think it is an SSD). About my dataset:

Number of lines: 22748
Number of columns: 18967
Number of layers: 60 (size: 1.8 GB)
File format: tiff

Thanks in advance,
Nildson

Nildson Rodrigues

Oct 3, 2024, 2:09:16 PM
to Dinamica EGO
Also, I am using Dinamica EGO version 7 (the last release before version 8).

Hermann Oliveira Rodrigues

Oct 8, 2024, 2:53:13 PM
to Dinamica EGO on behalf of nildso...@gmail.com
Hi,

There are basically two aspects of the problem that you need to tackle. One is easy: just change the memory management policy to 4 (aggressive) and that might solve some of your problems. You can also use the prefer-memory policy, given that you seem to have enough memory to hold all the maps in memory.
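
As a quick sanity check on the memory side (back-of-envelope figures; actual usage depends on how Dinamica represents cells internally):

    22748 lines x 18967 columns ≈ 431.5 million cells per layer
    431.5 million cells x 4 bytes (assuming 32-bit values) ≈ 1.7 GB per layer
    1.7 GB x 60 layers ≈ 104 GB, well within 512 GB of RAM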

The other aspect of the problem is more complicated. Depending on how the computer cores are organized, accessing certain portions of memory by a thread running on one core might require going through another core, and this is usually slow. This is usually the case when the computer cores are organized as a set of NUMA nodes. To work around this problem, it is usually recommended to use a number of workers (if you are using the command line, the corresponding flag is called -processors) smaller than the number of cores available in the system and, in some extreme cases, equal to the number of cores in a single NUMA node.
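
If the server runs Linux, you can check how the cores are grouped into NUMA nodes with lscpu, for example:

    lscpu | grep -i numa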

I believe this might help improve your calculation speed significantly.

Best,

hermann

----------------------------------------------------------------------
Hermann Rodrigues
hermann....@gmail.com
her...@csr.ufmg.br
Twitter: @horodrigues | @dinamica_ego
Centro de Sensoriamento Remoto / UFMG
https://csr.ufmg.br | https://dinamicaego.com

Nildson Rodrigues

Oct 8, 2024, 7:44:28 PM
to Dinamica EGO

Thanks Hermann.

In my computer, I have the following information about NUMA node:

NUMA node(s): 2
NUMA node0 CPU(s): 0-63,128-191
NUMA node1 CPU(s): 64-127,192-255

In this case, can I only use 64 processors for a single task?
Best,
Nildson

Hermann Oliveira Rodrigues

Oct 9, 2024, 11:58:53 AM
to Dinamica EGO on behalf of nildso...@gmail.com
Sadly, it is not a question of what we can do, but of what we should do. You can use as many cores as you want, but because they are arranged as NUMA nodes and because Dinamica is not NUMA-aware (it assumes that all cores are equal for all tasks, which is important here), you might end up with poor performance: the memory for a task is allocated close to one core, while the cores actually doing the computation on that memory are all located in a different NUMA node. Typically, the OS simply migrates threads/workers to the cores closer to where the memory was allocated, but when lots of threads/workers take part in the calculation, core migration becomes less feasible and performance degrades.

So, usually, it helps to reduce the number of cores to prevent this problem. But this is not a silver bullet. It heavily depends on the model and on the layout of the data being used.

So my rule of thumb is: run a portion of your model with all the available cores and see how it goes; then reduce the number of cores and see whether that improves or degrades performance. Once you have a verdict, run the entire model with the resulting number of cores.
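
Concretely, a benchmark ladder could look like the sketch below (DinamicaConsole and the script name are placeholders; take the worker counts from your own NUMA listing):

    DinamicaConsole -processors 224 my_model.egoml   # current setup, as a baseline
    DinamicaConsole -processors 128 my_model.egoml   # roughly one NUMA node of logical CPUs
    DinamicaConsole -processors 64 my_model.egoml    # physical cores of one node, if the second CPU range is SMT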

Hope that helps!

Best,

hermann

----------------------------------------------------------------------
Hermann Rodrigues
hermann....@gmail.com
her...@csr.ufmg.br
Twitter: @horodrigues | @dinamica_ego
Centro de Sensoriamento Remoto / UFMG
https://csr.ufmg.br | https://dinamicaego.com

Nildson Rodrigues

Nov 14, 2024, 7:59:22 PM
to Dinamica EGO
Dear Hermann,

Thank you for the reply. I applied your suggestions and obtained good results. However, since I still have many calibrations to run, I am trying to find other ways to reduce the time needed to produce the results of Step 4 (Weights of Evidence Correlation). One alternative I thought of was deleting from the weights file every row whose weight is 0. Here is an example:

From* | To* | Variable* | Range_Lower_Limit* | Weight
100 | 102 | distance/distance_to_102 | 0 | 3.83166277058803
100 | 102 | distance/distance_to_102 | 30 | 2.25307765794551
100 | 102 | distance/distance_to_102 | 2851 | 0
100 | 102 | distance/distance_to_102 | 4771 | 0

Do you recommend this?

Thank you in advance; I remain available for any clarification.
Best regards,
Nildson