Why is SDCH used only in searches?


shimrit

Jun 28, 2011, 2:18:55 PM
to SDCH
Hi,

It seems that SDCH has a lot of advantages over other compression
schemes, like gzip.
So why does Google use it only for searches? Why not for other Google
traffic, like Google Sites etc.?

And another question: efficient compression might be even more
important on cellular devices (to save bandwidth, CPU and battery). Is
it used in Android? If not, why not?

Thanks,
Shimrit

Jim R

Jun 30, 2011, 12:44:17 AM
to SDCH
I can't comment on why a given vendor uses SDCH for traffic, but I can
comment more generally on SDCH.

SDCH is not a generic replacement for gzip, and in fact it works in
conjunction with gzip in its most common encoding format. SDCH makes
use of a "shared dictionary" that is specifically selected to contain
strings that are likely to be found in a transmission. As a result, a
dictionary is generally "tuned" for (created for?) a specific set of
content. There is no such thing as a "universal" dictionary, as it
would have soooo many strings (in the dictionary), that the
specification of a string would require a ton of bytes (on average, as
many bytes as the string!)... and then there would be no compression
achieved. Compression is achieved only IF the dictionary is short
(so a selection from the dictionary is compact) and yet contains many of
the long strings that commonly appear in the content.
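
To make the shared-dictionary idea concrete, here is a minimal sketch. It does not use SDCH's own encoding (VCDIFF); it uses zlib's preset-dictionary feature purely as a readily available analogy. The dictionary contents and the sample page are made up for illustration.

```python
import zlib

# Hypothetical shared dictionary: byte strings we expect to recur in responses.
SHARED_DICT = (b'<html><head><title>Search results</title></head>'
               b'<body><div class="result"><a href="http://')

def compress(payload, zdict=None):
    """Deflate payload, optionally primed with a preset (shared) dictionary."""
    if zdict:
        comp = zlib.compressobj(level=9, zdict=zdict)
    else:
        comp = zlib.compressobj(level=9)
    return comp.compress(payload) + comp.flush()

def decompress(blob, zdict=None):
    """Inflate blob; the receiver must already hold the same dictionary."""
    decomp = zlib.decompressobj(zdict=zdict) if zdict else zlib.decompressobj()
    return decomp.decompress(blob) + decomp.flush()

# A made-up page that happens to start with strings found in the dictionary.
page = SHARED_DICT + b'example.com/">Example</a></div></body></html>'
plain = compress(page)
primed = compress(page, SHARED_DICT)
print(len(page), len(plain), len(primed))
```

The primed stream is smaller only because this page overlaps heavily with the dictionary; content that shares nothing with the dictionary would gain nothing from it.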

It may well be that SDCH is more broadly applicable, but in each case,
some training data has to be used to generate a dictionary, and then
the effectiveness of that dictionary needs to be tested against a
second corpus to be sure that "sufficient" compression is achieved.
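
A rough sketch of that train-then-test loop follows. Everything in it is an assumption for illustration: `build_dictionary` is a naive stand-in (it just concatenates training samples, which is not how real dictionary generators choose strings), and zlib's preset dictionary again stands in for SDCH's encoding.

```python
import zlib

def build_dictionary(training_docs, max_size=32 * 1024):
    """Naive stand-in for training: concatenate samples, keep the last max_size bytes."""
    return b"".join(training_docs)[-max_size:]

def deflate_size(payload, zdict=None):
    """Compressed size in bytes, with or without a preset dictionary."""
    if zdict:
        comp = zlib.compressobj(level=9, zdict=zdict)
    else:
        comp = zlib.compressobj(level=9)
    return len(comp.compress(payload) + comp.flush())

def held_out_savings(dictionary, held_out_docs):
    """Fraction of compressed bytes saved on documents NOT used for training."""
    plain = sum(deflate_size(doc) for doc in held_out_docs)
    primed = sum(deflate_size(doc, dictionary) for doc in held_out_docs)
    return 1.0 - primed / plain

# Made-up corpora that share a template, split into training and held-out sets.
template = b'<div class="result"><a href="http://example.com/page/%d">Result %d</a></div>'
training = [template % (i, i) for i in range(5)]
held_out = [template % (i, i) for i in range(100, 105)]
print(held_out_savings(build_dictionary(training), held_out))
```

What counts as "sufficient" savings is a judgment call for the application; the sketch only measures the number.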

Several vendors are reportedly experimenting with the use of SDCH, so
you may see broader use in the future.

Again, I can't comment on why a specific phone, such as Android, might
use or not use SDCH, but I can comment on elements that are good to
understand in a tradeoff of usage.

BANDWIDTH TRADE-OFF: dictionary cost now vs content size reduction later.

SDCH requires transmission of a dictionary to a client. There is a
bandwidth trade off between the size of the dictionary download, and
the amount of compression (that reduces future download sizes). If a
client gets enough byte savings, then SDCH would (on this basis) be a
no-brainer. It is (by this criterion) critical to compare how long
the dictionary can be used, and how much it saves, against its
download (upfront) cost.
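
As a back-of-the-envelope sketch of that comparison (the function names and all the numbers are hypothetical, chosen only to show the arithmetic):

```python
import math

def break_even_responses(dictionary_bytes, bytes_saved_per_response):
    """Number of responses needed before the dictionary download pays for itself."""
    if bytes_saved_per_response <= 0:
        return math.inf  # the dictionary never pays off
    return math.ceil(dictionary_bytes / bytes_saved_per_response)

def net_savings(dictionary_bytes, bytes_saved_per_response, responses_served):
    """Bytes saved (negative means bytes lost) over the dictionary's useful life."""
    return bytes_saved_per_response * responses_served - dictionary_bytes

# Hypothetical: a 100,000-byte dictionary that saves 4,000 bytes per response.
print(break_even_responses(100_000, 4_000))  # 25
print(net_savings(100_000, 4_000, 10))       # -60000: replaced too early, a net loss
print(net_savings(100_000, 4_000, 200))      # 700000
```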

LATENCY TRADE-OFF: faster downloads vs bandwidth cost (if the
bandwidth trade-off is a losing proposition)

From a pure latency perspective, it is possible to use marginal
bandwidth for a dictionary download (when the user isn't waiting), and
then achieve a latency win by reducing serialization latency (due to
byte count reduction) when the user has requested and is waiting for
content. Such a trade-off is much clearer when marginal background
bandwidth is free (cheap?), but with cellular providers, bandwidth is
almost always tight, and almost always expensive. Broadband users might
appreciate this trade-off... but cellular providers might not see the
value side as being that significant. On the other hand, some
cellular connections may be significantly bandwidth limited, and
reducing size (and hence serialization latency) may be a BIG benefit
(at least for the user).
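
To put rough numbers on serialization latency: it is just payload bits divided by link rate. The response sizes and link speeds below are made up for illustration.

```python
def serialization_latency_ms(num_bytes, link_bits_per_second):
    """Time to push num_bytes onto a link of the given rate, in milliseconds."""
    return num_bytes * 8 * 1000 / link_bits_per_second

# Hypothetical: a 40,000-byte response shrinks to 30,000 bytes.
for label, rate in (("1 Mbit/s link", 1_000_000), ("100 Mbit/s link", 100_000_000)):
    before = serialization_latency_ms(40_000, rate)
    after = serialization_latency_ms(30_000, rate)
    print(label, before - after, "ms saved")
```

With these numbers, the same 10,000-byte reduction saves about 80 ms on the slow link and under 1 ms on the fast one, which is why the benefit depends so heavily on the connection.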

MEMORY COST (static RAM, or "disk cache" costs)

From the perspective of memory use in a (browser?) client, SDCH increases
the amount of memory that is needed, client side. In the case of the
Chrome implementation (for example), SDCH dictionaries are actually
cached in memory so that disk latency plays no role in content
decompression. (This is also backed by a disk cache, but that is of no
consequence to latency). If a client has a lot of dynamic RAM, then
there is little problem with holding these dictionaries in memory. In
a cellular device, where memory is tight, the cost of holding the
dictionaries in memory, or even holding them in "disk cache," may be
substantial.

Then again, "disk access" latency needs to be contrasted with "network
latency," if the design decisions of Chrome are revisited. An
application *could* store dictionaries on disk (perhaps flash memory
in a cellular device?). On a high bandwidth, high-power desktop, a
read from disk may be a relatively expensive operation in terms of
latency. On a narrow bandwidth network, disk access time (to load a
dictionary on demand) may prove small compared with large resulting
reductions in serialization latency.
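
The disk-versus-network comparison can be sketched the same way. The 10 ms disk read time and the byte savings are assumptions for illustration, not measurements of any device.

```python
def serialization_saving_ms(bytes_saved, link_bits_per_second):
    """Latency saved by sending bytes_saved fewer bytes over the link."""
    return bytes_saved * 8 * 1000 / link_bits_per_second

def load_from_disk_pays_off(disk_read_ms, bytes_saved, link_bits_per_second):
    """True if loading the dictionary on demand costs less than it saves."""
    return serialization_saving_ms(bytes_saved, link_bits_per_second) > disk_read_ms

# Hypothetical 10 ms dictionary read, 10,000 bytes saved per response.
print(load_from_disk_pays_off(10.0, 10_000, 100_000_000))  # False on a fast link
print(load_from_disk_pays_off(10.0, 10_000, 1_000_000))    # True on a narrow link
```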

BOTTOM LINE: You probably need to experiment with an application,
perchance with a device, and perchance with a network setting. You
need to see how much redundancy appears between generic downloads,
which can be pulled out into a shared dictionary. The dictionary size
(download cost) has to be contrasted with its effectiveness. The cost
of holding the dictionary in the client needs to be considered (as
does the cost of compression and decompression at each end of the
channel).

If I were a betting man, I'd wager that there will be a number of
places where SDCH proves quite valuable. As you noted, at least one
vendor is using it for searches.

Jim

Shimrit Tzur

Jul 3, 2011, 2:37:37 PM
to sd...@googlegroups.com
Thank you very much for the detailed answer!
Shimrit
