Hello Johnny!
I could not find a way to reproduce the issue so far.
Given my RAM constraints, I was able to build and run the application with a 1000-iteration loop.
If my investigation is correct, ArrayFire uses a lazy evaluation model, meaning operations are queued rather than executed immediately.
In a multithreaded context like yours, without synchronization, memory may not be freed promptly because the computation may still be in progress when the thread terminates.
Would you consider calling af::sync(); as the last line of your MatMul function? I cannot give a firm answer, since I could not reproduce the issue on my machine, even when building the same application with 1000 iterations.
```
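#include <arrayfire.h>

auto MatMul() -> void {
    // Host data; ArrayFire reads it in column-major order.
    float h_A[] = {1, 2, 3, 4, 5, 6, 7, 8, 9};
    float h_B[] = {1, 2, 3};
    const af::array A(3, 3, h_A);
    const af::array B(3, 1, h_B);
    af::array X1 = af::matmul(A, B);
    // Block until all queued work on the current device has finished.
    af::sync();
}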
```
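In case it helps to picture what I mean by "queued rather than executed immediately", here is a toy model in plain C++. It does not use ArrayFire and is not how ArrayFire is implemented internally. `LazyQueue`, `enqueue`, `pending` and `deferred_value` are names I made up for the illustration, and its `sync` only plays the same role as af::sync() would: nothing runs until it is called.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>

// Toy stand-in for a device work queue (hypothetical, NOT ArrayFire's code).
class LazyQueue {
public:
    // Record the operation without running it (deferred execution).
    void enqueue(std::function<void()> op) { pending_.push(std::move(op)); }

    // Run every operation queued so far, in order; this is the explicit
    // synchronization point of the toy model.
    void sync() {
        while (!pending_.empty()) {
            pending_.front()();
            pending_.pop();
        }
    }

    // Number of operations that have been queued but not yet executed.
    std::size_t pending() const { return pending_.size(); }

private:
    std::queue<std::function<void()>> pending_;
};

// Queues one operation and only returns after sync() has executed it.
// Without the sync() call the function would return 0, because the
// queued work would never have run.
int deferred_value() {
    LazyQueue q;
    int result = 0;
    q.enqueue([&result] { result = 42; });
    q.sync();
    return result;
}
```

The point of the sketch is only the ordering: work that has been enqueued is still pending when the calling function returns, unless something waits for it first.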