llama3 on macOS


Marcelo Machado

Jul 8, 2024, 3:57:25 PM
to lm-kbc2024
Hello everyone, I would like to participate in the challenge this year, and I think it is a good idea to limit models to at most 10B parameters so as not to favor participants with more resources.

However, I'm trying to run the llama3 baseline on macOS, where I don't have a CUDA GPU. My machine has an Apple M1 Max and 32 GB of RAM.

Regarding the quantization process, I set the variable (use_quantization) to false since I don't have a CUDA GPU. On my first execution attempt I got a NotImplementedError from PyTorch:
The operator 'aten::isin.Tensor_Tensor_out' is not currently implemented for the MPS device
Then I exported the variable PYTORCH_ENABLE_MPS_FALLBACK with value 1 and ran the baseline again, but after more than an hour I still had no feedback on the generation process. In fact, the progress bar is stuck at 0.
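
For reference, this is roughly how I am loading the model, just a minimal sketch under my own assumptions (the model id and the fp16 dtype are my choices; the baseline script may do this differently):

import os
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"  # must be set before torch is imported

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # assumed; the baseline config may differ
device = "mps" if torch.backends.mps.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # fp16 so the 8B weights fit into 32 GB of unified memory
).to(device)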

Does anyone have any tips on how to run the baseline in this setting?

Jan Kalo

Jul 9, 2024, 9:28:48 AM
to lm-kbc2024
Dear Marcelo,

thank you for your interest in the competition.
Unfortunately, none of us has much experience running LLMs on macOS. I tried it several times, but I always ran into errors like these.
I think there should be some workarounds and fixes, but I guess you have already googled for them.
Maybe post an issue on the Hugging Face Transformers GitHub page. I suppose the people there are a bit more familiar with these issues than we are.

Good luck!

Jan-Christoph

Simon Razniewski

Jul 9, 2024, 9:41:32 AM
to lm-kbc2024
Another way to circumvent local issues could be to use a cloud-based service like Google Colab (in fact, this was also Jan's idea :))
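
As a rough sketch of what that could look like, the baseline model should fit on a free Colab GPU when loaded with 4-bit quantization (the model id and settings below are assumptions, not the exact baseline configuration):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # assumed; use whatever the baseline specifies
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # places the quantized weights on the Colab GPU
)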

Simon

Simon Razniewski

Jul 9, 2024, 10:23:19 AM
to Marcelo Machado, lm-kbc2024
As long as you follow the spirit of the competition ("parameter limit"), where your code runs is of no concern, so cloud services are completely fine.

Cheers,
Simon

On Tue, 9 Jul 2024 at 16:16, Marcelo Machado <marcelo.o...@gmail.com> wrote:
Thank you for answering my question.
I found a way to avoid the warning by installing a newer version of PyTorch (not yet published on PyPI); however, I am still stuck at 0 while trying to generate predictions. Anyway, I will try running it in a different setting.

But while we're on the subject, I'd like to know whether my solution has to use the implementation used to run the baselines. I mean, do I really need to run the model on my own machine, or can I build my system by accessing the model through Hugging Face's REST API (the Serverless Inference API)? My results would still be reproducible.



Best regards,

Marcelo de Oliveira Costa Machado



Simon Razniewski

Jul 9, 2024, 10:35:47 AM
to Marcelo Machado, lm-kbc2024
That's OK. We trust in your integrity, i.e., that you won't plug in something else :)
(and we could in principle verify it by plugging in the model we used for the baseline runs...)

Cheers,
Simon

On Tue, 9 Jul 2024 at 16:31, Marcelo Machado <marcelo.o...@gmail.com> wrote:
OK! But I can access an instance of Llama3-8b deployed in a private cloud. Could this be used as well? I ask because it would restrict access to authorized users only.


Best regards,

Marcelo de Oliveira Costa Machado

Marcelo Machado

Jul 10, 2024, 1:45:54 AM
to Simon Razniewski, lm-kbc2024
Thank you for answering my question.
I found a way to avoid the warning by installing a newer version of PyTorch (not yet published on PyPI); however, I am still stuck at 0 while trying to generate predictions. Anyway, I will try running it in a different setting.

But while we're on the subject, I'd like to know whether my solution has to use the implementation used to run the baselines. I mean, do I really need to run the model on my own machine, or can I build my system by accessing the model through Hugging Face's REST API (the Serverless Inference API)? My results would still be reproducible.
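
To make the idea concrete, here is a minimal sketch of the kind of call I have in mind against the Serverless Inference API (the model id, token placeholder, and generation parameters are only illustrative assumptions):

import requests

# Hypothetical endpoint and token placeholder; requires an HF token with Llama 3 access.
API_URL = "https://api-inference.huggingface.co/models/meta-llama/Meta-Llama-3-8B-Instruct"
HEADERS = {"Authorization": "Bearer hf_..."}

def query(prompt: str) -> str:
    payload = {
        "inputs": prompt,
        "parameters": {"max_new_tokens": 128, "temperature": 0.1},
    }
    response = requests.post(API_URL, headers=HEADERS, json=payload)
    response.raise_for_status()
    # The text-generation endpoint returns a list of {"generated_text": ...} objects.
    return response.json()[0]["generated_text"]

print(query("Which rivers flow through Berlin?"))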



Best regards,

Marcelo de Oliveira Costa Machado



Marcelo Machado

Jul 10, 2024, 1:45:58 AM
to Simon Razniewski, lm-kbc2024
OK! But I can access an instance of Llama3-8b deployed in a private cloud. Could this be used as well? I ask because it would restrict access to authorized users only.


Best regards,

Marcelo de Oliveira Costa Machado

Marcelo Machado

Jul 10, 2024, 1:46:03 AM
to Simon Razniewski, lm-kbc2024
That is nice and will save me a lot of time. 

Thank you!

Best regards,

Marcelo de Oliveira Costa Machado
