I want to update my information: the 8k figure was a guess, and at best a conservative one, of the code size needed. It *may* be just enough to fit the codec and an interface to the OS.

I think the WarpSpeed format would be better: I get 16k of code space, it only has to be active when used, it could have a read-out of the compression ratio, and it might even allow some fast-load code. The Magic Desk format would do okay by giving 32k, but it would take extra work to switch banks.

The estimate was based on a cc65 test coder with heavy assembler/zp usage, a minimized crt0.s file, and simplified text display code (my cbmsimpio library for cc65). I think that removing callmain itself would save a little, because removing only the call in the crt0.s file doesn't cut any bytes from the codec's size. The reason I keep asking for more optimization techniques is so I can fit the codec in an 8k cartridge; I didn't know about the variety of C64 cartridges at the time. Besides that, if I cut the executable and/or bss data usage by 1k, I have 1k more room for the compression.
BTW, I don't have a working codec yet. When I do, I'll need to be able to interface with the KERNAL at startup and override the default LOAD/SAVE routines.