If you just want to run LLMs, I would suggest putting your money into a desktop and then remoting into it from an inexpensive laptop.
Depending on what you want to do, you will be bottlenecked by VRAM, RAM, or both. GPUs run highly parallel workloads like LLM inference and diffusion (image generation) much faster than CPUs, but the model size is generally best limited to the amount of VRAM on your GPU(s).
On the other hand, you can run larger models in system RAM, even without any GPU, but each token will take much longer to process because the work is CPU-bound. You can also split a model across RAM and VRAM, but performance will suffer from having to move data between them.
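As a rough way to check whether a model will fit in VRAM or RAM, you can multiply the parameter count by the bytes each parameter takes at a given precision. This is a back-of-the-envelope sketch, not a precise measure: it assumes weights dominate memory use, ignores the KV cache, activations and runtime overhead (approximated here by a hypothetical 20% headroom), and the precision labels are illustrative.

```python
# Back-of-the-envelope estimate (assumption): weight memory is roughly
# parameter count x bytes per parameter. KV cache, activations and runtime
# overhead are NOT modeled; a fractional "headroom" stands in for them.

BYTES_PER_PARAM = {
    "fp16": 2.0,  # 16-bit floats
    "int8": 1.0,  # 8-bit quantization
    "4bit": 0.5,  # 4-bit quantization
}


def weight_memory_gb(params_billions: float, precision: str) -> float:
    """Approximate size of the weights in GB (1 GB = 1e9 bytes)."""
    # billions of params x bytes/param == gigabytes
    return params_billions * BYTES_PER_PARAM[precision]


def fits(params_billions: float, precision: str, memory_gb: float,
         headroom: float = 0.2) -> bool:
    """True if the weights plus the assumed headroom fit in memory_gb."""
    needed = weight_memory_gb(params_billions, precision) * (1 + headroom)
    return needed <= memory_gb


if __name__ == "__main__":
    print(weight_memory_gb(7, "fp16"))  # 14.0
    print(fits(7, "4bit", 8))           # True: ~4.2 GB fits an 8 GB GPU
    print(fits(70, "4bit", 24))         # False: ~42 GB won't fit a 24 GB GPU
    print(fits(70, "4bit", 64))         # True: but it would fit in 64 GB of RAM
```

The last two lines illustrate the trade-off above: a model too big for a single consumer GPU can still fit in system RAM, just with much slower CPU-bound generation.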
My advice: rent some GPU servers from a service like Runpod to test which RAM and GPU configurations you can afford / want / need across different models. It’s cheap and would be essentially the same experience as running the models locally on a desktop or laptop. Then, if you still want a local machine, you’ll know the specs you need.
—Matt