Speed or space

David Newall

unread,

Feb 8, 2022, 9:32:43 AM2/8/22

to

Still working on utf8show and have classic space vs. time tradeoff.

I can store the UnicodeEncoding map in a Dictionary, which is blindingly
fast, or in a sparse array, which is more compact but slower to access.
Every glyph painted needs a lookup in the map.

---Bytes Used-- --Savings-- -Time(ms)-- S:D ms
Map Source Dict Sparse Bytes % Dict Sparse Ratio
AdobeGlyphList(1) 222472 180798 41686 19% 161 2968 18.45
FreeSerif.gn2(2) 442320 356006 86314 20% 604 2856 4.73
UnifontMedium.gn2(3) 3357832 2173806 1184026 35% 156 949 6.08

(1) AdobeGlyphList from ghostscript v9.50 on Ubuntu 20.04.3.

(2) Generated by fontforge from FreeSerif.sfd extracted from
freefont-src-20120503.tar.gz from ftp.gnu.org/gnu/freefont.

(3) Generated by fontforge from unifont-14.0.01.ttf extracted
from unifont-14.0.01.tar.gz from ftp.gnu.org/gnu/unifont/.

Tests were performed using Ghostscript 9.5 on Ubuntu 20.04.3 with
Intel(R) Core(TM) i7-1165G7 @ 2.80GHz. Glyphs for UNICODE values 0
through 150,000 were retrieved from each map five times.

I don't know why accessing the sparse Adobe map is so much slower than
the other two. Average retrieval time is under 4ns; for UnifontMedium
it's under 1.3ns.

I'm going to implement it one way only. Which should it be?

John Reiser

unread,

Feb 9, 2022, 12:59:24 PM2/9/22

to

On 2/8/22 06:32, David Newall wrote:
> Still working on utf8show and have classic space vs. time tradeoff.

[[snip]]

> I'm going to implement it one way only. Which should it be?

Small space is more important than blindingly-fast. Think of containers
in the cloud, where stingy RAM allocations abound and demand paging
is horrible or impossible.

Carlos

unread,

Feb 10, 2022, 9:28:13 AM2/10/22

to

On Wed, 9 Feb 2022 01:32:35 +1100
David Newall <dav...@davidnewall.com> wrote:

> Still working on utf8show and have classic space vs. time tradeoff.
>
> I can store the UnicodeEncoding map in a Dictionary, which is
> blindingly fast, or in a sparse array, which is more compact but
> slower to access. Every glyph painted needs a lookup in the map.

I don't know what a sparse array is in this context. Is it like the
input array to unicodefont? In that case maybe it could be optimized,
by, idk, putting code points in entries 0, 2, 4, 6, ... to speed up
lookup, and the corresponding arrays with glyph names (or just the main
glyph name) in entries 1, 3, 5, 7, ..., or something like that.

>
> ---Bytes Used-- --Savings-- -Time(ms)-- S:D ms
> Map Source Dict Sparse Bytes % Dict Sparse Ratio
> AdobeGlyphList(1) 222472 180798 41686 19% 161 2968 18.45
> FreeSerif.gn2(2) 442320 356006 86314 20% 604 2856 4.73
> UnifontMedium.gn2(3) 3357832 2173806 1184026 35% 156 949 6.08
>
> (1) AdobeGlyphList from ghostscript v9.50 on Ubuntu 20.04.3.
>
> (2) Generated by fontforge from FreeSerif.sfd extracted from
> freefont-src-20120503.tar.gz from ftp.gnu.org/gnu/freefont.
>
> (3) Generated by fontforge from unifont-14.0.01.ttf extracted
> from unifont-14.0.01.tar.gz from ftp.gnu.org/gnu/unifont/.
>
> Tests were performed using Ghostscript 9.5 on Ubuntu 20.04.3 with
> Intel(R) Core(TM) i7-1165G7 @ 2.80GHz. Glyphs for UNICODE values 0
> through 150,000 were retrieved from each map five times.
>
> I don't know why accessing the sparse Adobe map is so much slower than
> the other two. Average retrieval time is under 4ns; for UnifontMedium
> it's under 1.3ns.
>
> I'm going to implement it one way only. Which should it be?

Contrary to John's opinion, I don't think space matters much,
especially if it's only 3 MB. So I'd choose speed. But then, I don't
know anything about Postscript printers or other low capacity devices.

C.

David Newall

unread,

Feb 15, 2022, 10:30:12 PM2/15/22

to Carlos

On 11/2/22 01:28, Carlos wrote:
> I don't know what a sparse array is in this context.

Thanks for your feedback and thoughts.

I deliberately chose a term that suggests the implementation without
pinning it down so that I could ask about speed vs space without getting
bogged down over how it was implemented.

Here's how I implemented it:

A "sparse" array contains values indexed by integers, and efficiently
permits missing values. Each element in the array is a sub-array
containing the index of the first value followed by one or more values.

For example: [[8 v8 v9 v10] [14 v14 v15] [37 v37]] is a sparse array
containing values at index 8, 9, 10, 14, 15 and 37.

The sparseget procedure retrieves an element from a sparse array. If
the array contains a value at that index, the value and true is pushed
on the stack, otherwise false is pushed.

I compared normal versus packed arrays and found that packing saved
negligible space while substantially increasing access time, so sparse
arrays are not packed.

In terms of performance versus a dictionary (for UnicodeEncoding map)
accessing a sparse array takes around 5 times longer, which sounds awful
but (on my computer) that's sub-1ns access versus 4ns, so I don't think
the speed is an issue.

One inexplicable result is that the sparse array generated from the
built-in AdobeGlyphList (Ghostscript 9.50 on Ubuntu 20.04 with Intel
i7-1165G7 @ 2.80Mhz) takes 25 times as long as the equivalent
dictionary. I don't understand why so much slower compared with maps
from fonts. Clutching at straws, I initially thought it might be
because AdobeGlyphList is built-in and maps from (non-builtin) fonts
aren't, but that can't be the case. For benchmarks, I load the
dictionary and sparse arrays from external files so, although derived
from the built-in dictionary, they are unlikely to be stored in common.
I also tried changing the names (added "xx" to them) and the result
stands. It's a mystery to me.

I'm inclined to go for space over speed but am willing to listen to
reason if people feel strongly that speed is more important.