LLM and WebGPU. What is the proper way to set it up?

Robert Lukoshko

Nov 1, 2023, 11:06:44 AM
to Chromium Extensions
Hey! 
I built an open-source Chrome extension that adds fuzzy search and ChatGPT-style chat with the current page.
Currently, it has to connect to a locally running process that serves the Llama 2 model.
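For context, the connection to that local process looks roughly like this today. This is only a sketch: the port, path, and JSON fields are assumptions for illustration, not a real server's API.

    // Hypothetical call to a locally running Llama 2 server.
    // Endpoint, port, and payload/response fields are all assumed.
    // (The extension also needs host permission for http://localhost/*
    // in manifest.json, or the server must send CORS headers.)
    async function askLocalLlama(prompt) {
      const res = await fetch("http://localhost:8080/completion", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ prompt, n_predict: 256 }),
      });
      if (!res.ok) throw new Error("Local model server error: " + res.status);
      const data = await res.json();
      return data.content; // response field name assumed
    }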
Over the last few days I have been researching how to set up an offscreen document and web workers so the extension can run a Llama 7B model with WebGPU.
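Here is a minimal sketch of the offscreen part as I understand it, assuming Chrome 109+ with the "offscreen" permission declared in manifest.json; "offscreen.html" and "llm-worker.js" are hypothetical files in the extension.

    // In the MV3 service worker: create the offscreen document once.
    async function ensureOffscreenDocument() {
      if (await chrome.offscreen.hasDocument()) return;
      await chrome.offscreen.createDocument({
        url: "offscreen.html",
        reasons: ["WORKERS"], // the document exists only to spawn a worker
        justification: "Run LLM inference on WebGPU outside the service worker",
      });
    }

    // In offscreen.html's script: spawn the dedicated worker that will
    // load the model and talk to WebGPU.
    const worker = new Worker("llm-worker.js");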

But unfortunately, I don't have much expertise in this area, so I don't fully understand how to set up an architecture that satisfies these criteria (a rough sketch of my current understanding follows the list):
1) The LLM weights (roughly 4 GB) are downloaded from the internet only once and then persisted on the user's machine/in the browser
2) Workers/offscreen scripts can run the LLM, with access to WebGPU, and perform the computation efficiently
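Here is how I imagine the two criteria could be addressed. Again a sketch under assumptions: the weights come as shards from a hypothetical URL and are persisted with the Cache API (OPFS would be an alternative), and the worker feature-detects WebGPU, since WebGPU support inside workers has been version-dependent in Chrome.

    const MODEL_CACHE = "llm-weights-v1"; // cache name is illustrative

    // Criterion 1: download each weight shard once, then serve it from
    // the Cache API on later runs. A ~4 GB model needs storage quota, so
    // requesting navigator.storage.persist() first is probably wise.
    async function fetchShardOnce(url) {
      const cache = await caches.open(MODEL_CACHE);
      let res = await cache.match(url);
      if (!res) {
        res = await fetch(url);            // network hit only on first run
        await cache.put(url, res.clone()); // persisted for later sessions
      }
      return res.arrayBuffer();
    }

    // Criterion 2: inside the dedicated worker, acquire the WebGPU device
    // and hand it to whatever inference runtime is used (e.g. WebLLM/MLC).
    async function initGpu() {
      if (!navigator.gpu) throw new Error("WebGPU not exposed in this context");
      const adapter = await navigator.gpu.requestAdapter();
      if (!adapter) throw new Error("No suitable GPU adapter found");
      return adapter.requestDevice();
    }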

I really want to push forward the direction of open-source personal LLM models for everyone, and in the future also to fine-tune them.

So I ask for help! 
If anyone has a good example or knowledge of how to make this work, or any resources on setting up the architecture properly and efficiently, please share!

Thanks for your time
Robert