Hello everyone, I would like to participate in this year's challenge, and I think capping models at 10B parameters is a good idea, since it avoids favoring participants with greater resources.
However, I'm having trouble running the llama3 baseline on macOS, where I don't have an NVIDIA GPU. My machine is an Apple M1 Max with 32 GB of RAM.
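For context, PyTorch does see the Apple GPU, but through the MPS backend rather than CUDA. A minimal check using the standard `torch.backends.mps` API confirms what my machine reports:

```python
import torch

# On Apple Silicon there is no CUDA device; PyTorch exposes the
# integrated GPU through the MPS (Metal Performance Shaders) backend.
print("CUDA available:", torch.cuda.is_available())      # False on an M1 Max
print("MPS built:", torch.backends.mps.is_built())
print("MPS available:", torch.backends.mps.is_available())

# Pick the best device this machine actually supports.
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
print("Using device:", device)
```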
Regarding quantization, I set the use_quantization variable to false, since I don't have a CUDA GPU. On my first execution attempt I got a NotImplementedError from PyTorch:
The operator 'aten::isin.Tensor_Tensor_out' is not currently implemented for the MPS device.
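For completeness, this is roughly how I am loading the model with quantization disabled. It is only a sketch: the use_quantization flag comes from the baseline config, and the model id below is a placeholder for whatever checkpoint the baseline actually uses.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

use_quantization = False  # bitsandbytes 4/8-bit quantization needs a CUDA GPU

# Placeholder id -- the real checkpoint comes from the baseline config.
model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # ~16 GB of weights for an 8B model; float32 would not fit in 32 GB
)
model.to("mps")

# Generation is where the process appears to stall.
inputs = tokenizer("Hello", return_tensors="pt").to("mps")
out = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```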
Then I exported the environment variable PYTORCH_ENABLE_MPS_FALLBACK=1 and ran the baseline again. However, after more than an hour I still had no feedback from the generation process; the progress bar was stuck at 0.
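In case it matters, I also tried setting the variable from inside the script instead of the shell. As far as I understand, it has to be set before torch is imported, otherwise the fallback is silently ignored. A minimal sketch that reproduces the failing operator:

```python
import os

# Must be set before torch is imported, otherwise the MPS backend
# initializes without the CPU fallback.
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

import torch  # noqa: E402  (deliberately imported after setting the env var)

x = torch.tensor([1, 2, 3], device="mps")
y = torch.tensor([2, 4], device="mps")

# aten::isin has no MPS kernel; with the fallback enabled this runs on
# the CPU instead of raising the NotImplementedError above.
print(torch.isin(x, y))  # tensor([False,  True, False], device='mps:0')
```

My understanding is that with the fallback enabled, every unsupported op round-trips through the CPU, so generation may just be extremely slow rather than truly hung.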
Does anyone have any tips on how to run the baseline in this setup?