Dear Samuel,
Thank you for the analysis and reply.
First point:
I understand that the number of atoms in my system is less than 1000, so to see the performance benefit I need to increase the number of atoms beyond 1000.
Second point:
I found a generate.py file at src/dbcsr_lib/cuda/libcusmm/generate.py. Is that the same file you are referring to?
I have this same file in both source trees I compiled, the one built with GPU support and the one built without it, so can you tell me which file I have to modify?
What are these triples, and how did you obtain those values? Could you please clarify this?
Thanks,
Narsimha.