Hi,
I have opened a bug report here:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=289813
Just to get a few more pointers, I'd like to ask whether you can successfully
run inference with "koboldcpp" and "llama.cpp" on an AMD GPU without
lock-ups?
To give it a quick try, you can check out, build, and benchmark as follows:
as root:
--------
pkg install gmake vulkan-loader opencl mesa-devel python
(attention: this installs 'mesa-devel' and remaps your current libGL and related
libraries. After testing I suggest removing 'mesa-devel' again, as it gave me
problems under Plasma 6)
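To revert that afterwards, as root:
pkg delete mesa-devel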
as user:
--------
vulkaninfo
(looks good?)
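The output is long; a quick sanity check I use is just grepping for the device,
your AMD GPU should show up here:
vulkaninfo | grep -i deviceName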
clinfo
(looks good, too?)
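Same idea for OpenCL, the GPU should be listed:
clinfo | grep -i 'device name'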
mkdir -p ~/work/src
cd ~/work/src
fetch -o MN-12B-Mag-Mell-R1.IQ4_XS.gguf 'https://huggingface.co/mradermacher/MN-12B-Mag-Mell-R1-GGUF/resolve/main/MN-12B-Mag-Mell-R1.IQ4_XS.gguf?download=true'
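To confirm the download completed (the file should be several GB):
ls -lh MN-12B-Mag-Mell-R1.IQ4_XS.gguf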
# koboldCpp
cd ~/work/src
git clone --depth 1 https://github.com/LostRuins/koboldcpp
cd koboldcpp
gmake -j16 LLAMA_CLBLAST=1 LLAMA_OPENBLAS=1 LLAMA_VULKAN=1 LDFLAGS="-L/usr/local/lib"
python koboldcpp.py --usevulkan --gpulayers 999 --benchmark --model ../MN-12B-Mag-Mell-R1.IQ4_XS.gguf
(run it a few times; your GPU may eventually lock up)
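To repeat the run automatically, a minimal sh loop (assuming the paths above;
it stops on the first failing exit, though note that on a hard lock-up the
process may simply hang rather than exit):
for i in 1 2 3 4 5; do
    python koboldcpp.py --usevulkan --gpulayers 999 --benchmark --model ../MN-12B-Mag-Mell-R1.IQ4_XS.gguf || break
done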
# llama.cpp
cd ~/work/src
git clone --depth 1 https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B .build -DGGML_VULKAN=1 -DGGML_OPENCL=1
cmake --build .build --parallel 16
.build/bin/llama-bench -m ../MN-12B-Mag-Mell-R1.IQ4_XS.gguf -ngl 100 -fa 0,1
(run it a few times; your GPU may eventually lock up)
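If it does lock up, the kernel log may show what happened (assuming you are on
the drm-kmod amdgpu driver); I'd capture it for the bug report:
dmesg | grep -i amdgpu | tail -n 20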
Thanks for trying and for your feedback...
Regards,
Nils