I’ve run a few of the well-known models on my Apple M2 powered Mac mini, and I can get better than real-time performance.
An M2 Pro with 16GB of RAM seems to perform about like a mid-range Nvidia GPU, but it likely costs less and certainly uses far less power.
I would say that running a capable LLM on affordable local hardware is already a solved problem.
What I’ve not yet worked out is how my “hello world” robot would work. My test case: I say “Robbie, pick up the green cube and place it in the cup,” and the robot does as told.
The missing link (for me) is the connection between the output of the LLM and a conventional motion planning system like MoveIt.
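One way to bridge that gap is to have the LLM emit a structured command (JSON, say) that a conventional planner can consume. Here is a minimal sketch of that idea; the prompt wording, the action/object/target schema, and the `execute` stub are my own inventions, not part of MoveIt or any LLM API, and the actual pick/place goals are left as a comment.

```python
import json

# Hypothetical system prompt asking the model to translate a spoken
# command into JSON. The schema (action/object/target) is made up for
# this sketch, not taken from any real API.
SYSTEM_PROMPT = (
    "Translate the user's command into JSON with keys "
    '"action", "object", and "target". Reply with JSON only.'
)

def parse_llm_reply(reply: str) -> dict:
    """Validate the model's JSON reply before handing it to the planner."""
    intent = json.loads(reply)
    for key in ("action", "object", "target"):
        if key not in intent:
            raise ValueError(f"missing key: {key}")
    return intent

def execute(intent: dict) -> str:
    # In a real system this is where you would look up the object's pose
    # (e.g. from a vision pipeline) and send pick/place goals to MoveIt.
    # Here it is just a stub reporting what it would do.
    return f"pick {intent['object']} -> place in {intent['target']}"

# What a well-behaved model might return for
# "Robbie, pick up the green cube and place it in the cup."
reply = '{"action": "pick_and_place", "object": "green cube", "target": "cup"}'
print(execute(parse_llm_reply(reply)))
```

The point of the JSON layer is that the LLM only does language understanding; everything after `parse_llm_reply` is deterministic code you can test without a model in the loop.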
As for how best to run an LLM on the M2-based Mini, search for “llama.cpp”. This software will run most models on common hardware, and 3-billion-parameter models are good enough for the kind of conversation you might have with a domestic robot.
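A quick back-of-envelope calculation shows why a 3-billion-parameter model fits comfortably in 16GB: quantized weights are small. These numbers count weights only (KV cache and runtime overhead come on top), but they give the right order of magnitude.

```python
# Rough weight-memory footprint of a 3-billion-parameter model at the
# quantization levels commonly used with llama.cpp. Weights only; the
# KV cache and runtime overhead add a bit more on top.
PARAMS = 3e9

def weight_gb(bits_per_param: float) -> float:
    # bits -> bytes (/8), bytes -> GB (/1e9)
    return PARAMS * bits_per_param / 8 / 1e9

for label, bits in [("fp16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: ~{weight_gb(bits):.1f} GB")
# fp16: ~6.0 GB, 8-bit: ~3.0 GB, 4-bit: ~1.5 GB
```

At 4-bit quantization the weights are around 1.5 GB, which leaves plenty of the Mini’s unified memory for the OS and everything else.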