tl;dr: There might be a window of opportunity for posits to demonstrate better performance in small-memory settings, now that people are beginning to run large language models like GPT4All on PCs, laptops, and old home servers. Has anyone done work on this?
It's becoming common to train LLMs on GPUs, then requantize them and run them GPU-less -- except that while 8GB of RAM is a reasonable amount to expect in a home laptop, it's far too small to run even a 4.2GB 7B-parameter language model: if it doesn't just crash, it swaps, and therefore runs painfully slowly, generating about 6 words per minute. (By comparison, 16GB will run the same model quite pleasantly.)
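To make the memory arithmetic concrete, here's a rough back-of-the-envelope sketch (the parameter count and bit widths are illustrative, and it only counts the weights, not the KV cache or activations):

```python
# Rough memory-footprint arithmetic for a ~7B-parameter model
# at various per-weight widths (weights only).
PARAMS = 7_000_000_000  # illustrative 7B parameter count

for name, bits in [("float32", 32), ("float16", 16),
                   ("int8 / posit8", 8), ("4-bit quant", 4)]:
    gib = PARAMS * bits / 8 / 2**30
    print(f"{name:>15}: {gib:5.1f} GiB")

# Roughly: 32-bit ~26 GiB, 16-bit ~13 GiB, 8-bit ~6.5 GiB, 4-bit ~3.3 GiB --
# which is why a ~4.2GB quantized 7B model is already tight on an 8GB laptop
# once the OS, browser, and runtime overhead are counted.
```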
Would it be possible (I wonder) to requantize an existing open-source model and run it using one of the existing software posit implementations, possibly hooked into one of the ML frameworks designed for that purpose? Is there any reason to expect that to do better than the quantization schemes in common use today?
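As a very rough sketch of what "requantize into posits" might look like at the tensor level, something like the following could compare the round-trip error of 8-bit posits against plain int8. This assumes the SoftPosit-Python bindings (the `softposit` package, with a `posit8` type convertible to and from float) and ignores all the real engineering -- per-channel scales, fused kernels, the actual inference runtime -- so treat it as pseudocode if the API differs:

```python
# Hedged sketch: round-trip error of 8-bit posits vs. naive int8
# for a block of LLM-like weights. Assumes SoftPosit-Python
# ("pip install softposit") exposes a posit8 type convertible to/from float.
import numpy as np
import softposit as sp

rng = np.random.default_rng(0)
# LLM weights are roughly zero-centered and small in magnitude.
weights = rng.normal(0.0, 0.02, size=4096).astype(np.float32)

# --- posit8 round trip ---
# Scale so the bulk of the weights land near 1.0, where posits are densest,
# then undo the scale afterwards (a per-tensor scale, much like int8 quant).
pscale = float(np.abs(weights).max())
posit_rt = np.array(
    [float(sp.posit8(float(w / pscale))) * pscale for w in weights],
    dtype=np.float32,
)

# --- naive symmetric int8 round trip with a per-tensor scale ---
iscale = float(np.abs(weights).max()) / 127.0
int8_rt = (np.round(weights / iscale).clip(-127, 127) * iscale).astype(np.float32)

def rms(err):
    return float(np.sqrt(np.mean(err ** 2)))

print("posit8 RMS error:", rms(posit_rt - weights))
print("int8   RMS error:", rms(int8_rt - weights))
```

If an experiment like this showed posits holding accuracy at a width where int8 (or 4-bit schemes) start to hurt, that would be the motivation for wiring a software posit library into an actual inference framework; if not, the idea probably dies here.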
Soooo ... I lack deep knowledge of the technologies involved, but does anyone here have the expertise to opine on whether this is actually interesting, or to suggest what I'd need to learn to be of any use here? I'm a programmer who's taken a few courses in modern AI techniques, almost all dealing with the underlying matrix math and basic neural networks.