tl;dr: There might be a window of opportunity for posits to demonstrate better performance in small-memory settings, now that people are beginning to run large language models like GPT4All on PCs, laptops, and old home servers. Has anyone done work on this?
It's becoming common to train LLMs on GPUs, then requantize them and run them GPU-less -- except that while 8GB of RAM is a reasonable amount to expect in a home laptop, it's far too small to run even a 4.2GB 7B-parameter language model: if it doesn't just crash, it swaps, and therefore runs painfully slowly, generating about 6 words per minute. (By comparison, 16GB will run the same model quite pleasantly.)
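To make the memory arithmetic concrete, here's a rough back-of-the-envelope sketch (the parameter count and bit widths are illustrative, and it only counts the weights, not the KV cache or activations):

```python
# Rough memory-footprint arithmetic for a ~7B-parameter model
# at various per-weight widths (weights only).
PARAMS = 7_000_000_000  # illustrative 7B parameter count

for name, bits in [("float32", 32), ("float16", 16),
                   ("int8 / posit8", 8), ("4-bit quant", 4)]:
    gib = PARAMS * bits / 8 / 2**30
    print(f"{name:>15}: {gib:5.1f} GiB")

# Roughly: 32-bit ~26 GiB, 16-bit ~13 GiB, 8-bit ~6.5 GiB, 4-bit ~3.3 GiB --
# which is why a ~4.2GB quantized 7B model is already tight on an 8GB laptop
# once the OS, browser, and runtime overhead are counted.
```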
Would it be possible (I wonder) to requantize an existing open-source model and run it using one of the existing software posit implementations, possibly hooked into one of the ML frameworks designed for that purpose? Is there any reason to expect that to do better than the quantization schemes in common use today?
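As a very rough sketch of what "requantize into posits" might look like at the tensor level, something like the following could compare the round-trip error of 8-bit posits against plain int8. This assumes the SoftPosit-Python bindings (the `softposit` package, with a `posit8` type convertible to and from float) and ignores all the real engineering -- per-channel scales, fused kernels, the actual inference runtime -- so treat it as pseudocode if the API differs:

```python
# Hedged sketch: round-trip error of 8-bit posits vs. naive int8
# for a block of LLM-like weights. Assumes SoftPosit-Python
# ("pip install softposit") exposes a posit8 type convertible to/from float.
import numpy as np
import softposit as sp

rng = np.random.default_rng(0)
# LLM weights are roughly zero-centered and small in magnitude.
weights = rng.normal(0.0, 0.02, size=4096).astype(np.float32)

# --- posit8 round trip ---
# Scale so the bulk of the weights land near 1.0, where posits are densest,
# then undo the scale afterwards (a per-tensor scale, much like int8 quant).
pscale = float(np.abs(weights).max())
posit_rt = np.array(
    [float(sp.posit8(float(w / pscale))) * pscale for w in weights],
    dtype=np.float32,
)

# --- naive symmetric int8 round trip with a per-tensor scale ---
iscale = float(np.abs(weights).max()) / 127.0
int8_rt = (np.round(weights / iscale).clip(-127, 127) * iscale).astype(np.float32)

def rms(err):
    return float(np.sqrt(np.mean(err ** 2)))

print("posit8 RMS error:", rms(posit_rt - weights))
print("int8   RMS error:", rms(int8_rt - weights))
```

If an experiment like this showed posits holding accuracy at a width where int8 (or 4-bit schemes) start to hurt, that would be the motivation for wiring a software posit library into an actual inference framework; if not, the idea probably dies here.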
Soooo ... I lack deep knowledge of the technologies involved, but does anyone here have the expertise to opine on whether this is actually interesting, or to suggest what I'd need to learn to be of any use here? I'm a programmer who's taken a few courses in modern AI techniques, almost all dealing with the underlying matrix math and basic neural networks.