
question: <div> of all NBSP ZWNJ, why?


Eli the Bearded

Mar 25, 2021, 7:39:02 PM
I read HTML email that comes to me with the help of lynx. I've noticed
some messages have one or more <DIV>s filled with alternating
non-breaking white space / zero-width non-joiner. What's the point?

Example from a message today, still quoted-printable encoded, and
including the <P> before and after the <DIV> (and tracking pixel mucked
with):

<p style=3D"max-height: 0; font-size: 0; l=
ine-height: 0; margin: 0; overflow: hidden;">Assembly instructions, example=
code + more!</p><div style=3D"display: none; width: 0px; height: 0px; max-=
height: 0px; overflow: hidden;">&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=
=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=
=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=
=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=
=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=
=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=
=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=
=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=
=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=
=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=
=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=
=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=
=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=
=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=
=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=
=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=
=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=
=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=
=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=
=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=
=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=
=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=
=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=
=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=
=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=
=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=
=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=
=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=
=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=
=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=
=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=
=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=
=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=
=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=
=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C&nbsp;=E2=80=
=8C&nbsp;=E2=80=8C&nbsp;=E2=80=8C</div><p style=3D"max-height: 0; font-size=
: 0; line-height: 0; margin: 0; overflow: hidden;"><img alt=3D"" border=3D"=
0" src=3D"https://ned.soundestlink.com/transactional/track/abcdef0123456789=
123b4186?signature=3Dabcdef0123456789e3d61b0a713e17907cf67babcdef0123456789=
57737ef591" width=3D"1" height=3D"1" /></p>

Note that =E2=80=8C is the octet-by-octet quoted-printable version of
the U+200C codepoint UTF-8 encoded.
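
For readers who want to check that claim, here is a minimal Python sketch using the standard `quopri` and `unicodedata` modules. The `encoded` and `wrapped` byte strings are short hand-made samples modeled on the excerpt above, not the full message.

```python
import quopri
import unicodedata

# One NBSP entity followed by the quoted-printable octets from the example.
encoded = b"&nbsp;=E2=80=8C"

# quopri.decodestring turns each =XX escape back into a raw octet.
raw = quopri.decodestring(encoded)
text = raw.decode("utf-8")

# Inspect the last character of the decoded text.
last = text[-1]
codepoint = f"U+{ord(last):04X}"
name = unicodedata.name(last)
print(codepoint, name)

# A trailing "=" is a soft line break, so a sequence split across
# two lines (as in the excerpt) decodes to the same three octets.
wrapped = b"=E2=80=\n=8C"
rejoined = quopri.decodestring(wrapped)
```

The `&nbsp;` entity passes through untouched because quoted-printable decoding only deals with octets; resolving the entity is left to the HTML parser.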

(Lynx does not honor "display: none" and these weird blocks of
whitespace show up in various post-lynx operations, like quoted plain text
replies.)
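
One possible workaround is to filter the text after lynx has produced it. The sketch below assumes the filler reaches the plain text as literal U+00A0 and U+200C characters, which depends on the display character set in use; `strip_filler` is a hypothetical helper name, not part of lynx or any mail tool.

```python
import re

NBSP = "\u00a0"  # non-breaking space
ZWNJ = "\u200c"  # zero-width non-joiner

# Two or more consecutive NBSP+ZWNJ pairs; a lone NBSP is left alone
# because it may be legitimate content.
FILLER = re.compile(f"(?:{NBSP}{ZWNJ}){{2,}}")


def strip_filler(text: str) -> str:
    """Remove runs of alternating NBSP/ZWNJ filler from rendered text."""
    return FILLER.sub("", text)


sample = "Assembly instructions" + (NBSP + ZWNJ) * 5 + " and more"
cleaned = strip_filler(sample)
print(cleaned)
```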

ZWNJ is intended to suppress ligature output when it might otherwise be
used by a naive typesetter. I'm unclear what effect is expected when
joining whitespace to whitespace.

Elijah
------
using &zwnj; would be shorter than the QP version

David E. Ross

Mar 26, 2021, 2:36:46 AM
Unfortunately, E-mail applications often generate very poor HTML. My
most recent analysis was done almost two years ago. At that time, I
found that HTML-formatted messages contain an average of 7.3 HTML syntax
errors per KB of file size. I conducted similar analyses in 2008, 2010,
2012, and 2015 and found little improvement. Actually, my 2019 analysis
showed a greater number of errors than the 2015 analysis. In my
analyses, I did not address how the HTML errors created by a particular
E-mail application affect a different E-mail application.

My analyses also addressed bloat, which is the increase in the size of
an HTML-formatted message to convey the same textual content as a
plain-text message. Bloat can result from such things as (1) what you
are seeing, (2) 2-part messages that contain both plain-text and
HTML-formatting, and (3) unnecessary but non-erroneous HTML markups.
Bloat factor is a measure of bloat, computed by dividing the size of the
HTML-formatted message by the size of the plain-text message that has
the same content. In 2019, I found the average bloat factor for
HTML-formatted messages was 16.0; that is, a message was on average 16.0
times the size of the equivalent plain-text content. This was a larger
bloat factor than in any of the
four prior analyses. That is, newer E-mail applications are producing
worse bloat than older applications.
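
The bloat factor as defined above is a plain ratio of sizes. A minimal Python sketch follows; the function name and the byte counts are made up for illustration and are not taken from the report.

```python
def bloat_factor(html_size: int, plain_size: int) -> float:
    """Size of the HTML-formatted message divided by the size of the
    plain-text message that carries the same content."""
    if plain_size <= 0:
        raise ValueError("plain-text size must be positive")
    return html_size / plain_size


# Hypothetical sizes in bytes: a 48,000-byte HTML message whose
# content fits in 3,000 bytes of plain text.
example = bloat_factor(48_000, 3_000)
print(example)  # 16.0
```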

A more detailed report of my 2019 analysis is at
<http://www.rossde.com/internet/ASCIIvsHTML.html>.

--

David E. Ross
<http://www.rossde.com/>.

The only reason we have so many laws is that not enough people will do
the right thing. (© 1997 by David Ross)

Phillip Helbig (undress to reply)

Mar 26, 2021, 5:21:36 AM
In article <eli$21032...@qaz.wtf>, Eli the Bearded
<*@eli.users.panix.com> writes:

> I read HTML email that comes to me with the help of lynx. I've noticed
> some messages have one or more <DIV>s filled with alternating
> non-breaking white space / zero-width non-joiner. What's the point?

Short answer: the people sending the message don't understand the
difference between content and presentation.

> using &zwnj; would be shorter than the QP version

Yes, but brevity is the soul of wit, not of modern email.

Jukka K. Korpela

Mar 26, 2021, 8:02:59 AM
Eli the Bearded wrote:

> I read HTML email that comes to me with the help of lynx. I've noticed
> some messages have one or more <DIV>s filled with alternating
> non-breaking white space / zero-width non-joiner. What's the point?

Without the styling, my guess would be: creation of an element with no
visible content but some forced width, with the assumption that without
intervening ZWNJ, a rendering engine might treat a sequence of NBSP as
collapsible white space (an odd assumption, but some renderers do odd
things).

With the styling that makes the element non-rendered, zero width, zero
height, I guess that guess was wrong. Or maybe they intentionally want
non-CSS rendering to differ from CSS-enabled rendering.


Lewis

Mar 26, 2021, 8:21:32 AM
In message <s3jvdr$1ueq$1...@gioia.aioe.org> David E. Ross <not_me@not_there.invalid> wrote:
> That is, newer E-mail applications are producing worse bloat than
> older applications.

IME most HTML email is not created within an email application at all;
it is created elsewhere and then simply spammed out through email.

It's a shame that email lists allow HTML messages at all, as so many are
so badly formatted and email clients tend to lack the tools to deal with
bad HTML.

--
A golem of Margo? A Margolem.

Eli the Bearded

Mar 26, 2021, 2:33:18 PM
In comp.infosystems.www.authoring.html,
Phillip Helbig (undress to reply) <hel...@asclothestro.multivax.de> wrote:

The rare nested comments format of From: line...

> Eli the Bearded <*@eli.users.panix.com> writes:
>> I read HTML email that comes to me with the help of lynx. I've noticed
>> some messages have one or more <DIV>s filled with alternating
>> non-breaking white space / zero-width non-joiner. What's the point?
> Short answer: the people sending the message don't understand the
> difference between content and presentation.

Yeah, well, if they understood (or cared) I expect they'd have a
text/plain version, too, so I could ignore the text/html one.

>> using &zwnj; would be shorter than the QP version
> Yes, but brevity is the soul of wit, not of modern email.

It is amusing, in a sad way, that both Twitter and Mastodon send me
email notifications that are each more than 40 (and close to 50) times
larger than the maximum message length in their services.

Thanks everyone for the speculation about what's going on. It's sounding
like this is not some well-known trick but just naive blundering.

Elijah
------
or cargo-cult blundering