Hi,
I would like to share the latest bigdl-llm, a library for running large language models (LLMs) on your local laptop using INT4 quantization with very low latency (it works with any Hugging Face Transformers model). It is built on top of the excellent work of llama.cpp, GPTQ, bitsandbytes, etc.; see the demos on a 12th Gen Intel Core CPU below.
[Demo GIFs: phoenix-inst-chat-7b | vicuna-13b-v1.1 | starcoder-15b]
Thanks,
-Jason