Single vs. Double Byte Characters - "en" vs. "em"?


Warren Smith

Oct 15, 2015, 11:04:44 AM
to hon...@googlegroups.com

I found the following entry in the Weblio dictionary while looking for something else:

 

----

そして、半角スペースを印字して、全角の「時」を印字する。

 

Then an en space is printed to print out em 'hour'. - 特許庁

----

 

 I am unfamiliar with the words "en" and "em". Are these in common usage for "single byte" and "double byte"?

 
Perhaps my Googling skills are inadequate, but I can find nothing on the web justifying this usage. I do find the following definition:
 
----
en: a unit of measurement equal to half an em and approximately the average width of typeset characters, used especially for estimating the total amount of space a text will require.
-----
 
This however seems to be referring to the space required for printing/displaying the character, not the data width in encoding of the character (which I normally associate with 半角 and 全角). Is the translation incorrect, or do 半角 and 全角 also refer to en and em?
 
Warren
 

Dan Lucas

Oct 15, 2015, 11:17:16 AM
to hon...@googlegroups.com
Warren - I have never seen "en" and "em" used as quasi-synonyms for single or double byte.
 
I think the first time I encountered these words was back in the late 1980s, when I read Donald Knuth's discussion of the difference between the en-dash and em-dash (and hyphen, and minus sign) in the first few pages of the TeX Book. Those two dashes constitute the most common usage, probably the only common usage.
 
I kind of admire the translator's chutzpah in adopting a typographical unit from a totally different writing system to try and convey the 半角 and 全角 of Japanese orthography but I don't think "en" and "em" work. For one thing, as you found, they're just too obscure to make sense to most people.
 
Regards
Dan Lucas
 
--
Tel: 44-1239-460-789
Fax: 44-1239-460-840
Terms and conditions for all work as per standard ITI T&C
--
You received this message because you are subscribed to the Google Groups "Honyaku E<>J translation list" group.
To unsubscribe from this group and stop receiving emails from it, send an email to honyaku+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
 

Richard VanHouten

Oct 15, 2015, 11:17:35 AM
to hon...@googlegroups.com
I'd say "half-width" and "full-width", as indeed en and em are referring to printing width, and have nothing to do with the number of bytes used to store the characters. I do wonder if 時 is a henkan for 字: "And then, a half width space is printed, followed by a full width character." This would make sense if an odd number of half-width characters had been printed since the last full width one, to align things.

David J. Littleboy

Oct 15, 2015, 11:37:39 AM
to hon...@googlegroups.com

>From: Richard VanHouten
>
>I'd say "half-width" and "full-width",

So far, agreed. Strongly. 半角 and 全角 should _only_ be translated as
half-width and full-width.

> as indeed en and em are referring to printing width, and have nothing to
> do with the number of bytes used to store the characters.

Ah, but, I think, 半角 and 全角 also "are referring to printing width, and have
nothing to do with the number of bytes". I think it's just a cutesy, inane,
and inanely stupid computer pun that 半角 and 全角 take 1 byte and 2 bytes to
store, respectively. (And, presumably, that's not even true in Unicode.)
There's no a priori logical need that 半角 characters should be limited to 128
characters. Cyrillic and Greek and the like end up being 全角, which is
silly.
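
The Unicode point can be checked with a short Python sketch. This is only an illustration: the sample characters and the helper name `describe` are assumptions chosen here, not anything from the thread. It pairs the UTF-8 byte count of a character with the class reported by the standard library's `unicodedata.east_asian_width` (`Na` narrow, `H` half-width, `W` wide, `F` full-width, `A` ambiguous).

```python
import unicodedata


def describe(ch):
    """Return (UTF-8 byte count, East Asian Width class) for one character."""
    return len(ch.encode("utf-8")), unicodedata.east_asian_width(ch)


# Illustrative samples (chosen for this sketch):
samples = [
    "A",       # ASCII letter
    "\uff71",  # half-width katakana A
    "\u6642",  # kanji "hour" (the character in the Weblio example)
    "\u03a9",  # Greek capital omega
    "\uff21",  # full-width Latin capital A
]

for ch in samples:
    n_bytes, width = describe(ch)
    print(f"U+{ord(ch):04X} {ch}: {n_bytes} UTF-8 byte(s), width class {width}")
```

In UTF-8 the half-width katakana and the full-width kanji both take three bytes, so under this encoding the byte count does not separate 半角 from 全角.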

Whatever, like others here, I've only heard "en" and "em" as dash lengths.
(Rant: Knuth got sucked down the typesetting rabbit hole and computer
science got set back 30 years.)

--
David J. Littleboy
Tokyo, Japan

Mark Spahn

Oct 15, 2015, 11:38:04 AM
to hon...@googlegroups.com

- - - - - - - -- -

 

Yes, this makes perfect sense.  For hyphen-like characters, I found this helpful:

http://www.chicagomanualofstyle.org/qanda/data/faq/topics/HyphensEnDashesEmDashes/faq0002.html 

 

The en dash is the kind of hyphen you use when one of the connected components is a multi-word proper noun or is itself a hyphenated noun.  Example:

 

In the audience<hyphen>participation song contest, the pro<en dash>Elton<space>John supporters were outvoted by the pro<en dash>Newton<hyphen>John supporters.

 

Another example:

I love South America’s favorite singer<em dash>Bolivia<space>Newton<hyphen>John.

 

-- Mark Spahn (West Seneca, NY)

 

 

Stephen Suloway

Oct 15, 2015, 11:42:44 AM
to hon...@googlegroups.com
Em and en are printer's measures used mainly during the two centuries or so when byte-type increments corresponded to pieces of wood or metal — as I learned when my college newspaper office included a linotype machine complete with its open well of molten lead (and a bunch of Underwood typewriters). Thanks for reawakening the memories.

I’d say single- and double-byte are conceptually similar to the old words but technically different.

Weblio indeed supplies many suspect usage citations, which may or may not be useful clues, as well as the more reliable definitions from identified sources.

Regards,
Stephen

~ ~ ~ ~ ~ ~ ~ ~ ~
Stephen Suloway


> Warren Smith wrote:
>
> I found the following entry in the Weblio dictionary while looking for something else:
> ----
> そして、半角スペースを印字して、全角の「時」を印字する。
> Then an en space is printed to print out em 'hour'. - 特許庁
> ----
> I am unfamiliar with the words "en" and "em". Are these in common usage for "single byte" and "double byte"?
> …….. do 半角 and 全角 also refer to en and em?

Warren Smith

Oct 15, 2015, 11:56:41 AM
to hon...@googlegroups.com
 Thank you, Dan and Richard.
 
This is very interesting. While for many years, I have translated these words, 半角 and 全角 , as single-byte and double-byte (as they have usually shown up when referring to the data storage structures, not data display structures), would it actually have been better to translate these as "half-width" and "full-width," given that the data storage as "single bytes" has just been incidental to the fact that half-width characters are usually encoded with only a single byte of data?
 
On the other hand, come to think of it, for US readers, would "half-width" even be comprehensible? Because in the US the standard (ASCII) characters are those narrow ones, these would seem to us as "full width," while kanji would impress us as being "double width," while in Japan the standard characters are double-byte kanji and kana, so to them the US characters would be "half width."
 
Given that the world of data processing is still US-centric, I think that going with "single-byte" and "double-byte" (rather than switching to Japan-centric expressions of "half-width" and "full-width") is only prudent to prevent confusion among US readers.
 
What do we think?
 
Warren
 
 

Dan Lucas

Oct 15, 2015, 12:05:23 PM
to hon...@googlegroups.com
Warren - I get your drift, but are "single-byte" and "double-byte" really any more comprehensible to the (admittedly mythical) man in the street than "half-width" and "full-width"? If you could be sure that the readership consists of technically inclined people who understand Unicode planes, code points, glyphs and the rest of it, maybe the former two terms make sense. But otherwise...
 
Myself, I haven't actually had to translate these terms very often and I can't remember what I have used in the past. I think in most cases I'd opt for "half-width" and "full-width", simply because there is at least some intuitive correspondence between the meaning of the words and the visual representation of the characters they describe. That's not the case with "single-byte" and "double-byte".
 
Regards
Dan Lucas
 
 
 

Hào Anh Lê 黎英豪

Oct 15, 2015, 12:09:52 PM
to hon...@googlegroups.com
"En" refers to the letter n and "em" refers to the letter m in a proportional (that is, non-monospaced) font. (A monospaced font, also called a fixed-pitch, fixed-width, or non-proportional font, such as Courier, is a font whose letters and characters each occupy the same amount of horizontal space. This contrasts with variable-width fonts, where the letters and spacings have different widths.)





Hào Anh Lê

--

Warren Smith

Oct 15, 2015, 12:16:06 PM
to hon...@googlegroups.com

Dan wrote:

 

I get your drift, but are "single-byte" and "double-byte" really any more comprehensible to the (admittedly mythical) man in the street than "half-width" and "full-width"?

 

---

 

Well... fortunately the man in the street wouldn't be reading most of the stuff I translate.

 

Here is an interesting quote from a Microsoft web page: A double-byte character set (DBCS), also known as an "expanded 8-bit character set", is an extended single-byte character set (SBCS), implemented as a code page. DBCSs were originally developed to extend the SBCS design to handle languages such as Japanese and Chinese.

 

The DBCS was made obsolete, by the way, by Unicode.
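
For anyone who never worked with a DBCS, here is a minimal sketch of the byte counts under one legacy Japanese encoding, Shift_JIS, using Python's built-in `shift_jis` codec. The helper name `sjis_len` and the sample characters are assumptions made for illustration, and Shift_JIS stands in here only as one example of such an encoding.

```python
def sjis_len(ch):
    """Number of bytes one character occupies when encoded as Shift_JIS."""
    return len(ch.encode("shift_jis"))


# Illustrative samples (chosen for this sketch):
samples = [
    "A",       # ASCII letter
    "\uff71",  # half-width katakana A
    "\u6642",  # kanji "hour"
    "\u03a9",  # Greek capital omega
    "\u042f",  # Cyrillic capital ya
]

for ch in samples:
    print(f"U+{ord(ch):04X} {ch}: {sjis_len(ch)} byte(s) in Shift_JIS")
```

Here the ASCII letter and the half-width katakana take one byte each, while the kanji, the Greek letter, and the Cyrillic letter take two, which is the correspondence behind the "single-byte" and "double-byte" wording.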

 

This means that old-timers (like yours truly) would be very comfortable with double-byte referring to the full-width kanji, etc., but perhaps not the younger tech. folks who weren't around for DBCS.

 

That being said, I still tend to use the ASCII set for most of my programming purposes (probably because I am a dinosaur) <grin>.

 

Warren

Warren Smith

Oct 15, 2015, 12:27:00 PM
to hon...@googlegroups.com
David and Richard wrote:
>
>I'd say "half-width" and "full-width",

So far, agreed. Strongly. 半角 and 全角 should _only_ be translated as
half-width and full-width.

-------------

OK. Having just reviewed a few pertinent pages of "Unicode Demystified," I
agree, and will repent my double-byted ways and refer to these as
"half-width" and "full-width" henceforth...

https://books.google.com/books?id=wn5sXG8bEAcC&pg=PA393&lpg=PA393&dq=%22full-width+characters%22+kanji&source=bl&ots=J1aBt_UiXx&sig=KahnajpVyPbXAa2iJNGwP5lKmbI&hl=en&sa=X&ved=0CDsQ6AEwBWoVChMI-fvRqfDEyAIVBpmACh1rqw9V#v=onepage&q=%22full-width%20characters%22%20kanji&f=false

Warren

Herman

Oct 15, 2015, 2:06:30 PM
to hon...@googlegroups.com
On 15/10/15 08:56, Warren Smith wrote:
> Thank you, Dan and Richard.
> This is very interesting. While for many years, I have translated these
> words, 半角 and 全角 , as single-byte and double-byte (as they have
> usually shown up when referring to the data storage structures, not data
> display structures), would it actually have been better to translate
> these as "half-width" and "full-width," given that the data storage as
> "single bytes" has just been incidental to the fact that half-width
> characters are usually encoded with only a single byte of data?

The relationship is not incidental in that in the older systems in use
at the time when this nomenclature arose, which employed fixed width
glyphs, a column or space of a particular width was used to represent
each byte of character data, so single-byte characters would indeed be
half the width of double-byte ones. However, the terms 半角/全角 are in
themselves not specific as to the number of bytes involved, and
depending on the encoding scheme, they may not correspond to a
single/double byte distinction, or, when discussing typographic aspects,
these terms don't have anything to do with bytes at all. Thus, outside
of some limited peculiar contexts, half-width and full-width would be
the more appropriate translation.
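
The fixed-width column idea can be sketched in Python as follows. This is a simplification under stated assumptions, not a description of any particular system: the function name `display_columns` is hypothetical, characters whose East Asian Width class is wide (`W`) or full-width (`F`) are counted as two columns, and everything else (including ambiguous-width and combining characters) is counted as one.

```python
import unicodedata


def display_columns(text):
    """Count fixed-width columns: two for wide/full-width characters, one otherwise."""
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        for ch in text
    )


# A half-width space followed by the full-width kanji from the Weblio example.
print(display_columns(" \u6642"))
# Three ASCII letters.
print(display_columns("ABC"))
```

Counted this way, the half-width space plus the full-width 時 occupies three columns, the same as three ASCII letters, with no reference to how many bytes either string takes in storage.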

Herman Kahn

Jens Wilkinson

Oct 16, 2015, 10:36:09 AM
to hon...@googlegroups.com


> On 2015/10/16, at 1:05, Dan Lucas <dan....@carninglipartners.com> wrote:
>
> Warren - I get your drift, but are "single-byte" and "double-byte" really any more comprehensible to the (admittedly mythical) man in the street than "half-width" and "full-width"? If you could be sure that the readership consists of technically inclined people who understand Unicode planes, code points, glyphs and the rest of it, maybe the former two terms make sense. But otherwise...
>
The problem, though, is that for any rational Westerner, the letters they use are not half-width, they are full-width. The Japanese characters might be, in that perspective, double-width.

Jens Wilkinson

Jens Wilkinson

Oct 16, 2015, 10:42:07 AM
to hon...@googlegroups.com

> On 2015/10/16, at 3:06, Herman <sl...@lmi.net> wrote:
>
>> Thus, outside of some limited peculiar contexts, half-width and full-width would be the more appropriate translation.
>
>
Wouldn't it be more reasonable to use normal-width and double-width? Normal English is not written in half-width characters; they are completely normal.

Jens Wilkinson