
Yucky logic with redundancy, suggestions for alternatives?


Alf P. Steinbach

Dec 11, 2017, 11:47:17 AM
The logic in the code below bothers me, it feels unclean due to the
double testing of whether a character is newline or not.

(Notation: here `ref_<T>` denotes `T&`, `raw_array_of_<n,T>` is `T[n]`,
and the corresponding `array_of_<n,T>` is `std::array<T,n>`. The
definitions are trivial and are available e.g. in my /stdlib/ library,
header `<stdlib/extension/type_builders.hpp>`.)


const raw_array_<wchar_t> flowed_about_text = LR"(
Let 𝘜 be the encoding last saved as default, or, if that encoding isn’t Unicode,➲
let 𝘜 be UTF-8 with BOM.

When a buffer is activated and has not already been checked:

if the document is empty and its encoding isn’t Unicode, then
its encoding is set to 𝘜.

Ideally the “when a buffer…” should have been “when file a is opened or➲
a new document is created”, but➲
apparently Notepad++ does not inform a plugin of its creation of new➲
documents. Also, ideally the forced encoding should have been the one➲
currently selected as default in Notepad++, but apparently Notepad++ does not➲
make the dynamic configuration info available to a plugin.

Author’s mail address: alf.p.ste...@gmail.com)";

template< class Char, U_size n >
auto unflowed( ref_<raw_array_of_<n, const Char>> flowed_text )
    -> array_of_<n, Char>
{
    array_of_<n, Char> result;
    bool last_was_continuation_char = true;     // Skip newline at start.
    ptr_<wchar_t> p_out = &result[0];
    for( const wchar_t ch: flowed_text )
    {
        if( last_was_continuation_char and ch != L'\n' )
        {
            *p_out++ = L'➲';
        }
        switch( ch )
        {
            case L'➲':
            {
                last_was_continuation_char = true;
                continue;       // Copy this '➲' next time if appropriate.
            }
            case L'\n':
            {
                if( last_was_continuation_char ) {}
                else
                {
                    *p_out++ = L'\n';
                }
                break;
            }
            default:
            {
                *p_out++ = ch;
                break;
            }
        }
        last_was_continuation_char = false;
    }
    *p_out = L'\0';
    return result;
}

auto const about_text = unflowed( flowed_about_text );


I know of one alternative, to completely dispense with the logic and
just use Very Long Lines in the string literal. But that also feels wrong.


Cheers!,

- Alf

Richard

Dec 11, 2017, 12:11:06 PM
[Please do not mail me a copy of your followup]

I see lots of weird formatting when I view this in my newsreader (trn),
but it's probably because you're using UTF-8 for characters that are
perfectly representable in ASCII and my newsreader is old.

"Alf P. Steinbach" <alf.p.stein...@gmail.com> spake the secret code
<p0mcpt$a5d$1...@dont-email.me> thusly:

>The logic in the code below bothers me, it feels unclean due to the
>double testing of whether a character is newline or not.

Are you referring to the test against the character literal in the if
statement and further down in the switch as the duplicate testing?

> template< class Char, U_size n >
> auto unflowed( ref_<raw_array_of_<n, const Char>> flowed_text )
> -> array_of_<n, Char>
> {
> array_of_<n, Char> result;
> bool last_was_continuation_char = true; // Skip newline at
>start.
> ptr_<wchar_t> p_out = &result[0];
> for( const wchar_t ch: flowed_text )
> {
> if( last_was_continuation_char and ch != L'\n' )
> {
> *p_out++ = L'\xe2\x9e\xb2';
> }

How can you have a character literal with three bytes in it? If you
want a unicode character literal, isn't it more portable to specify
the character code via \u or \U?

...or is this your news software transliterating something?

I have no idea what your source character encoding is that you're
using, so perhaps this is valid code in its original form.

IMO, it is better to use \u/\U for unicode characters outside of
ASCII, particularly when posting across forums.

> switch( ch )
> {
> case L'➲':
> {
> last_was_continuation_char = true;
> continue; // Copy this '➲' next time if
>appropriate.
> }
> case L'\n':
> {
> if( last_was_continuation_char ) {}
> else
> {
> *p_out++ = L'\n';
> }
> break;
> }
> default:
> {
> *p_out++ = ch;
> break;
> }
> }
> last_was_continuation_char = false;
> }
> *p_out = L'\0';
> return result;
> }
>
> auto const about_text = unflowed( flowed_about_text );

It seems you either have the state as local inline variables as you've
done it here, or you encapsulate the state in some kind of class that
does part of the work. But I don't see how encapsulating it as a class
makes any real difference because this class is already doing
low-level character-by-character work and I don't see any way to
extract out a simpler responsibility.
--
"The Direct3D Graphics Pipeline" free book <http://tinyurl.com/d3d-pipeline>
The Terminals Wiki <http://terminals-wiki.org>
The Computer Graphics Museum <http://computergraphicsmuseum.org>
Legalize Adulthood! (my blog) <http://legalizeadulthood.wordpress.com>

Alf P. Steinbach

Dec 11, 2017, 12:30:13 PM
On 12/11/2017 6:10 PM, Richard wrote:
> [Please do not mail me a copy of your followup]
>
> I see lots of weird formatting when I view this in my newsreader (trn),
> but it's probably because you're using UTF-8 for characters that are
> perfectly representable in ASCII and my newsreader is old.
>
> "Alf P. Steinbach" <alf.p.stein...@gmail.com> spake the secret code
> <p0mcpt$a5d$1...@dont-email.me> thusly:
>
>> The logic in the code below bothers me, it feels unclean due to the
>> double testing of whether a character is newline or not.
>
> Are you referring to the test against the character literal in the if
> statement and further down in the switch as the duplicate testing?

Yes.


>
>> template< class Char, U_size n >
>> auto unflowed( ref_<raw_array_of_<n, const Char>> flowed_text )
>> -> array_of_<n, Char>
>> {
>> array_of_<n, Char> result;
>> bool last_was_continuation_char = true; // Skip newline at
>> start.
>> ptr_<wchar_t> p_out = &result[0];
>> for( const wchar_t ch: flowed_text )
>> {
>> if( last_was_continuation_char and ch != L'\n' )
>> {
>> *p_out++ = L'\xe2\x9e\xb2';
>> }
>
> How can you have a character literal with three bytes in it?

What you see is the UTF-8 encoding of the Unicode character 'CIRCLED HEAVY
WHITE RIGHTWARDS ARROW' (U+27B2), which I thought would work nicely
visually as a line continuation character (I tried various arrows first).


> If you
> want a unicode character literal, isn't it more portable to specify
> the character code via \u or \U?

The source code is UTF-8, so it's no problem just using Unicode
characters directly.


> ...or is this your news software transliterating something?

No, I looked at the posted raw message text. It's OK:

Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit

So it's probably trn that's the culprit here.


> I have no idea what your source character encoding is that you're
> using, so perhaps this is valid code in it's original form.

Yes, it's UTF-8 encoded source. I think all source code now should be
UTF-8 encoded :)


> IMO, it is better to use \u/\U for unicode characters outside of
> ASCII, particularly when posting across forums.
>
>> switch( ch )
>> {
>> case L'➲':
>> {
>> last_was_continuation_char = true;
>> continue; // Copy this '➲' next time if
>> appropriate.
>> }
>> case L'\n':
>> {
>> if( last_was_continuation_char ) {}
>> else
>> {
>> *p_out++ = L'\n';
>> }
>> break;
>> }
>> default:
>> {
>> *p_out++ = ch;
>> break;
>> }
>> }
>> last_was_continuation_char = false;
>> }
>> *p_out = L'\0';
>> return result;
>> }
>>
>> auto const about_text = unflowed( flowed_about_text );
>
> It seems you either have the state as local inline variables as you've
> done it here, or you encapsulate the state in some kind of class that
> does part of the work. But I don't see how encapsulating it as a class
> makes any real difference because this class is already doing
> low-level character-by-character work and I don't see any way to
> extract out a simpler responsibility.

Hm, well, it feels redundant, awkward, somehow.

Like there is some really simple elegant way to do it that just refuses
to pop up in my brain.


Cheers!,

- Alf

Öö Tiib

Dec 11, 2017, 1:28:42 PM
On Monday, 11 December 2017 18:47:17 UTC+2, Alf P. Steinbach wrote:
> The logic in the code below bothers me, it feels unclean due to the
> double testing of whether a character is newline or not.
>

Ok, I jump to the for loop that, as I understood, bothers you:

> for( const wchar_t ch: flowed_text )
> {
> if( last_was_continuation_char and ch != L'\n' )
> {
> *p_out++ = L'➲';
> }
> switch( ch )
> {
> case L'➲':
> {
> last_was_continuation_char = true;
> continue; // Copy this '➲' next time if
> appropriate.
> }
> case L'\n':
> {
> if( last_was_continuation_char ) {}
> else
> {
> *p_out++ = L'\n';
> }
> break;
> }
> default:
> {
> *p_out++ = ch;
> break;
> }
> }
> last_was_continuation_char = false;
> }

I would get rid of the switch and then write it like:

    for (const wchar_t ch: flowed_text)
    {
        if (last_was_continuation_char and ch != L'\n')
        {
            *p_out++ = L'➲';
        }
        if (ch == L'➲')
        {
            last_was_continuation_char = true;
            continue; // Copy this '➲' next time if appropriate.
        }
        if (!last_was_continuation_char or ch != L'\n')
        {
            *p_out++ = ch;
        }
        last_was_continuation_char = false;
    }

It seems about 1/3 shorter and 1/3 cleaner too ... however, in such
cases I trust unit tests more than my head. :D Maybe I misunderstood
your logic, so I am not 100% sure that it actually does the same thing. ;)


Richard

Dec 11, 2017, 1:46:16 PM
[Please do not mail me a copy of your followup]

"Alf P. Steinbach" <alf.p.stein...@gmail.com> spake the secret code
<p0mfaj$uh1$1...@dont-email.me> thusly:

>Like there is some really simple elegant way to do it that just refuses
>to pop up in my brain.

Some standard algorithm perhaps?

If you're just stripping your internal "continuation" character, then
isn't it just copy_if?

James Kuyper

Dec 11, 2017, 6:27:31 PM
On 12/11/2017 12:10 PM, Richard wrote:
...
> "Alf P. Steinbach" <alf.p.stein...@gmail.com> spake the secret code
> <p0mcpt$a5d$1...@dont-email.me> thusly:
....
>> *p_out++ = L'\xe2\x9e\xb2';
>> }
>
> How can you have a character literal with three bytes in it?

"An ordinary character literal that contains more than one c-char is a
multicharacter literal. A multicharacter literal, or an ordinary
character literal containing a single c-char not representable in the
execution character set, is conditionally-supported, has type int, and
has an implementation-defined value." (2.14.3p1)

> If you
> want a unicode character literal, isn't it more portable to specify
> the character code via \u or \U?

Yes, somewhat. The members of the execution character set are
implementation-defined, and for a given UCN, "if there is no
corresponding member, it is converted to an implementation-defined
member other than the null (wide) character." (2.2p5)

Strictly speaking, it's implementation-defined behavior either way, but
it's completely implementation-defined for multicharacter literals,
whereas UCNs are only implementation-defined if there's no corresponding
member of the execution character set.
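
For example, the arrow in question can be named by its code point instead
of embedding raw UTF-8 bytes in the source (a sketch; U+27B2 is in the BMP,
so it also fits a 16-bit wchar_t):

    // CIRCLED HEAVY WHITE RIGHTWARDS ARROW, spelled as a UCN so that the
    // literal does not depend on the source file's encoding surviving transport.
    constexpr wchar_t continuation_char = L'\u27B2';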

Alf P. Steinbach

Dec 12, 2017, 8:16:38 AM
On 12/11/2017 7:45 PM, Richard wrote:
> [Please do not mail me a copy of your followup]
>
> "Alf P. Steinbach" <alf.p.stein...@gmail.com> spake the secret code
> <p0mfaj$uh1$1...@dont-email.me> thusly:
>
>> Like there is some really simple elegant way to do it that just refuses
>> to pop up in my brain.
>
> Some standard algorithm perhaps?

Uhm, something.

In my experience, when I've felt this way, someone else has always been
able to really simplify things.

Which sort of indicates that I too could do that if I just got that
thinking elevated some levels up from my unconscious mind... But I'm
reduced to asking.


> If you're just stripping your internal "continuation" character, then
> isn't it just copy_if?

With a continuation character at the end of a line, it and the newline
are stripped.

Cheers!,

- Alf

Manfred

Dec 12, 2017, 9:59:01 AM
On 12/11/2017 5:46 PM, Alf P. Steinbach wrote:
> I know of one alternative, to completely dispense with the logic and
> just use Very Long Lines in the string literal. But that also feels wrong.

I may be misinterpreting what you need; anyway, if it is about reflowing
a string literal, here is an alternative, in which the only artifacts are
\n and terminal \ continuations (which are at least standard, unlike
➲, which by the way might be wanted in the text itself):

#include <iostream>

const char flowed_about_text[] =
u8"Let 𝘜 be the encoding last saved as default, or, if that encoding
isn’t Unicode,\
let 𝘜 be UTF-8 with BOM.\n\
\n\
When a buffer is activated and has not already been checked:\n\
\n\
if the document is empty and its encoding isn’t Unicode, then\n\
its encoding is set to 𝘜.\n\
\n\
Ideally the “when a buffer…” should have been “when file a is opened or\
a new document is created”, but\
apparently Notepad++ does not inform a plugin of its creation of new\
documents. Also, ideally the forced encoding should have been the one\
currently selected as default in Notepad++, but apparently Notepad++
does not\
make the dynamic configuration info available to a plugin.\n\
\n\
Author’s mail address: alf.p.ste...@gmail.com";

int main()
{
std::cout << flowed_about_text << std::endl;
}

This works fine with gcc and a UTF-8 compliant terminal; I believe you
know how to make it work on Windows.

Jorgen Grahn

Dec 12, 2017, 10:59:23 AM
On Mon, 2017-12-11, Richard wrote:
> [Please do not mail me a copy of your followup]
>
> I see lots of weird formatting when I view this in my newsreader (trn),
> but it's probably because you're using UTF-8 for characters that are
> perfectly representable in ASCII and my newsreader is old.

Nitpick: if an UTF-8 character is representable in ASCII, its ASCII
representation is also its UTF-8 representation.

You're talking about using the fancy non-ASCII quote characters and so
on.

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .

Alf P. Steinbach

Dec 12, 2017, 11:15:56 AM
On 12/12/2017 3:58 PM, Manfred wrote:
> On 12/11/2017 5:46 PM, Alf P. Steinbach wrote:
>> I know of one alternative, to completely dispense with the logic and
>> just use Very Long Lines in the string literal. But that also feels
>> wrong.
>
> I may be misinterpreting what you need, anyway, if it is about reflowing
> a string literal, here is an alternative, wherein the only artifacts are
> \n and terminal \ continuations (which are at least standard, instead of
> ➲, which by the way might be wanted in the text)

➲ in the text works just fine, and at the end of a line can be
represented by ➲➲ followed by an extra newline, which the simple general
continuation rule then reduces to a single ➲ plus \n.
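
For example (untested, and not self-contained here since it reuses the
unflowed() and the type builders from my first post):

    #include <cassert>
    #include <cwchar>       // std::wcscmp
    // ... plus unflowed() from the first post.

    int main()
    {
        const wchar_t flowed[] = L"\nsee the arrow \u27B2\u27B2\n\nnext line";
        const auto unflowed_text = unflowed( flowed );
        assert( std::wcscmp( unflowed_text.data(),
                             L"see the arrow \u27B2\nnext line" ) == 0 );
    }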


> #include <iostream>
>
> const char flowed_about_text[] =
> u8"Let 𝘜 be the encoding last saved as default, or, if that encoding
> isn’t Unicode,\
>   let 𝘜 be UTF-8 with BOM.\n\
> \n\
> When a buffer is activated and has not already been checked:\n\
> \n\
>        if the document is empty and its encoding isn’t Unicode, then\n\
>        its encoding is set to 𝘜.\n\
> \n\
> Ideally the “when a buffer…” should have been “when file a is opened or\
>   a new document is created”, but\
>   apparently Notepad++ does not inform a plugin of its creation of new\
>   documents. Also, ideally the forced encoding should have been the one\
>   currently selected as default in Notepad++, but apparently Notepad++
> does not\
>   make the dynamic configuration info available to a plugin.\n\
> \n\
> Author’s mail address: alf.p.ste...@gmail.com";
>
> int main()
> {
>   std::cout << flowed_about_text << std::endl;
> }
>
> This works fine with gcc and a utf-8 compliant terminal, I believe you
> know how to make it work on Windows..

I like this. Explicitly representing the newlines instead of explicitly
representing the absence of newlines. I just knew there had to be
something like this, obvious to all but myself. :-)
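
Adjacent string literal concatenation gives much the same effect if one
prefers to avoid the trailing backslashes; an untested fragment, not the
full text:

    const char about_text_fragment[] =
        u8"When a buffer is activated and has not already been checked:\n"
        u8"\n"
        u8"       if the document is empty and its encoding isn’t Unicode, then\n"
        u8"       its encoding is set to 𝘜.\n";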

Cheers!, and thanks,

- Alf

Richard

Dec 12, 2017, 5:15:27 PM
[Please do not mail me a copy of your followup]

Jorgen Grahn <grahn...@snipabacken.se> spake the secret code
<slrnp2vv62.e...@frailea.sa.invalid> thusly:

>On Mon, 2017-12-11, Richard wrote:
>> [Please do not mail me a copy of your followup]
>>
>> I see lots of weird formatting when I view this in my newsreader (trn),
>> but it's probably because you're using UTF-8 for characters that are
>> perfectly representable in ASCII and my newsreader is old.
>
>Nitpick: if an UTF-8 character is representable in ASCII, its ASCII
>representation is also its UTF-8 representation.
>
>You're talking about using the fancy non-ASCII quote characters and so
>on.

Yes, I'm talking about Unicode ' where ASCII ' would have worked just
fine. The same goes for a bunch of other characters. Do we really
need to distinguish between a dash and an emdash on usenet, or email?
No.

Richard

Dec 12, 2017, 5:16:02 PM
[Please do not mail me a copy of your followup]

"Alf P. Steinbach" <alf.p.stein...@gmail.com> spake the secret code
<p0okqv$617$1...@dont-email.me> thusly:

>> If you're just stripping your internal "continuation" character, then
>> isn't it just copy_if?
>
>With a continuation character at the end of a line, it and the newline
>are stripped.

So it's copy_if with a predicate that holds some state about the
previous character.
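
Something along these lines, untested; the state lives outside the lambda,
so it does not matter if the algorithm copies the predicate, and it assumes
the arrow (written here as \u27B2 to be transport-safe) only ever appears
as a line-end continuation marker:

    #include <algorithm>    // std::copy_if
    #include <array>
    #include <cstddef>      // std::size_t
    #include <iterator>     // std::begin, std::end

    template< std::size_t n >
    auto unflowed_via_copy_if( const wchar_t (&flowed_text)[n] )
        -> std::array<wchar_t, n>
    {
        std::array<wchar_t, n> result{};    // Zero-filled, so already terminated.
        wchar_t prev = L'\u27B2';           // Pretend we start right after a
                                            // continuation, so the raw literal's
                                            // leading newline is dropped.
        std::copy_if(
            std::begin( flowed_text ), std::end( flowed_text ), result.begin(),
            [&prev]( const wchar_t ch ) -> bool
            {
                const bool keep =
                    ch != L'\u27B2' and !(ch == L'\n' and prev == L'\u27B2');
                prev = ch;
                return keep;
            } );
        return result;
    }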

Öö Tiib

Dec 12, 2017, 6:54:07 PM
On Wednesday, 13 December 2017 00:16:02 UTC+2, Richard wrote:
> [Please do not mail me a copy of your followup]
>
> "Alf P. Steinbach" <alf.p.stein...@gmail.com> spake the secret code
> <p0okqv$617$1...@dont-email.me> thusly:
>
> >> If you're just stripping your internal "continuation" character, then
> >> isn't it just copy_if?
> >
> >With a continuation character at the end of a line, it and the newline
> >are stripped.
>
> So it's copy_if with a predicate that holds some state about the
> previous character.

Note that there are usually no guarantees about when and how often a
standard library algorithm might internally copy the predicate. So
predicates that are implemented as stateful function objects might
have unexpected results depending on the implementation. That may make
such usage non-portable, or even change with the next version of the
library.
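
One way around that (untested sketch) is to keep the function object at the
call site and hand the algorithm only a std::ref to it, so that any copies
the implementation makes all share the same state:

    #include <algorithm>    // std::copy_if
    #include <cstddef>      // std::size_t
    #include <functional>   // std::ref
    #include <iterator>     // std::begin, std::end

    struct Unflow_filter
    {
        wchar_t prev = L'\u27B2';

        auto operator()( const wchar_t ch )
            -> bool
        {
            const bool keep =
                ch != L'\u27B2' and !(ch == L'\n' and prev == L'\u27B2');
            prev = ch;
            return keep;
        }
    };

    template< std::size_t n >
    void unflow_into( const wchar_t (&flowed_text)[n], wchar_t* p_out )
    {
        Unflow_filter filter;
        std::copy_if( std::begin( flowed_text ), std::end( flowed_text ),
                      p_out, std::ref( filter ) );
    }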

Jorgen Grahn

Dec 14, 2017, 5:06:54 PM
On Tue, 2017-12-12, Richard wrote:
...
> Yes, I'm talking about Unicode ' where ASCII ' would have worked just
> fine. The same goes for a bunch of other characters. Do we really
> need to distinguish between a dash and an emdash on usenet, or email?
> No.

For what it's worth, I agree. I'm all for good typography on the web
and on paper, but here it doesn't add any value.

Alf P. Steinbach

Dec 14, 2017, 10:22:59 PM
On 12/14/2017 11:06 PM, Jorgen Grahn wrote:
> On Tue, 2017-12-12, Richard wrote:
> ...
>> Yes, I'm talking about Unicode ' where ASCII ' would have worked just
>> fine. The same goes for a bunch of other characters. Do we really
>> need to distinguish between a dash and an emdash on usenet, or email?
>> No.
>
> For what it's worth, I agree. I'm all for good typography on the web
> and on paper, but here it doesn't add any value.

If one refrains from posting source code with non-ASCII characters just
to support some very few people's archaic newsreader software, then IMHO
that's an impractical set of priorities.

And given that newsers therefore should use software able to read UTF-8
encoded text, which now is the default for nearly all other text
exchange on the web, there's no problem with em-dash (as opposed to
certain Chinese glyphs that require two character positions, still a
technical problem for contexts where a monospaced font is desirable).

There's no problem with the ordinary ASCII hyphen either, but to the
degree that em-dash is a problem, for some, there is a problem with
posting modern source code and data, for those people.


Cheers!,

- Alf (opinionated, for the occasion)

David Brown

Dec 15, 2017, 2:44:39 AM
On 14/12/17 23:06, Jorgen Grahn wrote:
> On Tue, 2017-12-12, Richard wrote:
> ...
>> Yes, I'm talking about Unicode ' where ASCII ' would have worked just
>> fine. The same goes for a bunch of other characters. Do we really
>> need to distinguish between a dash and an emdash on usenet, or email?
>> No.
>
> For what it's worth, I agree. I'm all for good typography on the web
> and on paper, but here it doesn't add any value.
>

Agreed. When something can be done in simple plain ASCII text, then
that is the format to use. UTF-8 is fine if it adds something useful
(like for Öö's name), or characters that just can't be written sensibly
in ASCII. But "quotation marks", italic _U_, etc., are fine in ASCII.
It makes life easier for people with older software, or if they are
viewing in fonts that don't have such a range of characters. And it is
/much/ easier for most people to type.

I've read "The TeXBook", and like to distinguish between "--" and "---"
in my documentation. But for quick emails or "readme" files, plain text
works best. (I really hate it when people use Word documents for a few
lines of text.)

Richard

Dec 18, 2017, 6:26:56 PM
[Please do not mail me a copy of your followup]

"Alf P. Steinbach" <alf.p.stein...@gmail.com> spake the secret code
<p0vf5t$uki$1...@dont-email.me> thusly:

>On 12/14/2017 11:06 PM, Jorgen Grahn wrote:
>> On Tue, 2017-12-12, Richard wrote:
>> ...
>>> Yes, I'm talking about Unicode ' where ASCII ' would have worked just
>>> fine. The same goes for a bunch of other characters. Do we really
>>> need to distinguish between a dash and an emdash on usenet, or email?
>>> No.
>>
>> For what it's worth, I agree. I'm all for good typography on the web
>> and on paper, but here it doesn't add any value.
>
>If one refrains from posting source code with non-ASCII characters just
>to support some very few people's archaic newsreader software, then IMHO
>that's an impractical set of priorities.

FWIW, I know you like to use UTF-8, which is why I assumed that was
the reason for the weird formatting I saw. I recognize that my
preferred newsreader is ancient and not the best in supporting some
features people like to use. I don't expect anyone to adapt to me,
but I do find some uses of UTF-8 gratuitous with no value added,
simply different.

Just like I find your preference to always use trailing return type :).

Ralf Goertz

Dec 19, 2017, 3:22:19 AM
On Mon, 18 Dec 2017 23:26:48 +0000 (UTC),
legaliz...@mail.xmission.com (Richard) wrote:

> "Alf P. Steinbach" <alf.p.stein...@gmail.com> spake the secret
> code <p0vf5t$uki$1...@dont-email.me> thusly:
>
> FWIW, I know you like to use UTF-8, which is why I assumed that was
> the reason for the weird formatting I saw. I recognize that my
> preferred newsreader is ancient and not the best in supporting some
> features people like to use. I don't expect anyone to adapt to me,
> but I do find some uses of UTF-8 gratuitous with no value added,
> simply different.

You say that probably because for your native language there is no
difference between ASCII and UTF-8. For those of us less fortunate (or,
to put it differently, with a richer character set in their language) it
is a huge relief to have an encoding where you don't have to bother
about code pages and stuff like that. And after getting used to it I
found myself playing around with those other characters like „…“
(ellipsis in german style quotation marks) just because I can and it is
so much nicer. Spoiled by using LaTeX, I guess.

> Just like I find your preference to always use trailing return
> type :).

With that I agree ;-)

David Brown

Dec 19, 2017, 5:10:06 AM
There is no problem in using UTF-8 where it is /useful/. The issue is
with /gratuitous/ use of extra characters that makes it needlessly
difficult for some people. Not everyone is lucky enough to have Linux
with good fonts, and some people have much more limited systems.

The rule I suggest (and this is merely my opinion) is to use ASCII when
it suffices, and use UTF-8 for other cases. So if your surname is
actually spelt Görtz, then by all means write it that way - it is your
name, and should appear the way you want it to. However, using …
instead of ... or « » instead of " " or << >> adds nothing to the
information in a post (unless the thread is about typography and
characters).

It is a different matter for documents where appearance is more
relevant. You don't want to use simple quotation marks in a LaTeX
document - people might think you wrote the document in Word, and that
would be awful! (That's not sarcasm - I really am that snobby about
typography and document appearance. But everything in its place.)

So yes to /useful/ UTF-8, no to gratuitous non-ASCII.

Ralf Goertz

Dec 19, 2017, 9:34:24 AM
On Tue, 19 Dec 2017 11:09:46 +0100,
David Brown <david...@hesbynett.no> wrote:

> On 19/12/17 09:22, Ralf Goertz wrote:
> >
> > You say that probably because for your native language there is no
> > difference between ASCII and UTF-8. For those of us less fortunate
> > (or, to put it differently, with a richer character set in their
> > language) it is a huge relief to have an encoding where you don't
> > have to bother about code pages and stuff like that. And after
> > getting used to it I found myself playing around with those other
> > characters like „…“ (ellipsis in german style quotation marks) just
> > because I can and it is so much nicer. Spoiled by using LaTeX, I
> > guess.
>
> There is no problem in using UTF-8 where it is /useful/. The issue is
> with /gratuitous/ use of extra characters that makes it needlessly
> difficult for some people. Not everyone is lucky enough to have Linux
> with good fonts, and some people have much more limited systems.

Okay, but are you suggesting that those harmless characters are not
available on systems other than Linux? I'm not talking about e.g. Asian
alphabets (which /I/ can't read anyway) but about typographic characters.
Can't we assume that in a computer oriented newsgroup like this people
know how to have UTF-8 displayed correctly after so many years that it
is around?

> The rule I suggest (and this is merely my opinion) is to use ASCII
> when it suffices, and use UTF-8 for other cases. So if your surname
> is actually spelt Görtz, then by all means write it that way - it is
> your name, and should appear the way you want it to.

No, it is actually spelt Goertz which itself is a consequence of not
having around a "UTF-8 capable" typewriter when my father was born in
the 1930s outside of Germany. He regretted that very much all his life
whereas I was happy with it because it is easier when communicating with
non-German people. But the ö is a good example. You need UTF-8 for this
anyway. Okay, there is iso*, but if you use non-ASCII characters anyway
one should use UTF-8. Seeing how those characters get messed up when
they are quoted by people not capable of handling them correctly makes
me sad. How must Öö feel.

> However, using … instead of ... or « » instead of " " or << >> adds
> nothing to the information in a post (unless the thread is about
> typography and characters).

To my mind it is useful to have distinct characters to open and close a
quote. It helps clarity. It's not needed but it helps. Much like your
usage of two spaces after a sentence. This has always looked wrong for
me. Try using \frenchspacing. ;-) By the way using << and >> instead of
the (french?) quotation marks feels even more wrong (is that the right
comparative?). I associate those with much smaller/bigger.

> It is a different matter for documents where appearance is more
> relevant. You don't want to use simple quotation marks in a LaTeX
> document - people might think you wrote the document in Word, and that
> would be awful! (That's not sarcasm - I really am that snobby about
> typography and document appearance. But everything in its place.)

I agree with you. Completely. But I think this /is/ the place.


David Brown

Dec 19, 2017, 10:40:08 AM
On 19/12/17 15:34, Ralf Goertz wrote:
> On Tue, 19 Dec 2017 11:09:46 +0100,
> David Brown <david...@hesbynett.no> wrote:
>
>> On 19/12/17 09:22, Ralf Goertz wrote:
>>>
>>> You say that probably because for your native language there is no
>>> difference between ASCII and UTF-8. For those of us less fortunate
>>> (or, to put it differently, with a richer character set in their
>>> language) it is a huge relief to have an encoding where you don't
>>> have to bother about code pages and stuff like that. And after
>>> getting used to it I found myself playing around with those other
>>> characters like „…“ (ellipsis in german style quotation marks) just
>>> because I can and it is so much nicer. Spoiled by using LaTeX, I
>>> guess.
>>
>> There is no problem in using UTF-8 where it is /useful/. The issue is
>> with /gratuitous/ use of extra characters that makes it needlessly
>> difficult for some people. Not everyone is lucky enough to have Linux
>> with good fonts, and some people have much more limited systems.
>
> Okay, but are you suggesting that those harmless characters are not
> available in other systems than linux? I don't talk about e.g. asian
> alphabets (which /I/ can't read anyway) but typographic characters.
> Can't we assume that in a computer oriented newsgroup like this people
> know how to have UTF-8 displayed correctly after so many years that it
> is around?

I have no idea what sort of proportions of users have trouble displaying
such typographic UTF-8 characters. But I do know that older systems can
have challenges. A lot of fonts have common Western European characters
(like Latin-9 characters), such as accented letters - but relatively few
have extra typographical characters. Modern gui software can do a fair
job of font substitution so that even if you are normally using a font
without these characters, they can show them from other fonts.

However, there are people out there with older systems, or older
newsreaders, or with particular font settings that might cause
limitations. Linux has supported UTF-8 and associated fonts for ages -
in the Windows world, it is more recent (Windows has supported a variant
of UTF-16 for nearly two decades, but good UTF-8 support and good fonts
are not as old. Remember, Windows XP is the third most popular OS in
the world!).

Is it going to bother anyone if you use UTF-8 typographical symbols? I
don't know - probably not, or at least not many people. Maybe I'm just
old-fashioned. Maybe the percentage of people that can't see these
UTF-8 symbols is so small by now that it is irrelevant.

>
>> The rule I suggest (and this is merely my opinion) is to use ASCII
>> when it suffices, and use UTF-8 for other cases. So if your surname
>> is actually spelt Görtz, then by all means write it that way - it is
>> your name, and should appear the way you want it to.
>
> No, it is actually spelt Goertz which itself is a consequence of not
> having around a "UTF-8 capable" typewriter when my father was born in
> the 1930s outside of Germany. He regretted that very much all his life
> whereas I was happy with it because it is easier when communicating with
> non-German people. But the ö is a good example. You need UTF-8 for this
> anyway. Okay, there is iso*, but if you use non-ASCII characters anyway
> one should use UTF-8. Seeing how those characters get messed up when
> they are quoted by people not capable of handling them correctly makes
> me sad. How must Öö feel.

The difference here is that you /don't/ need a large UTF-8 font to show
a ö character. You just need a program that understands UTF-8 encoding,
and you need a font with that has that symbol. And lots of fonts do -
fonts for Western European languages, that have been around since the
days of MS-DOS, have it. So someone who is running XP and likes "MS
sans serif" will have no problem with ö, but will have difficulty with
directional quotation marks.

>
>> However, using … instead of ... or « » instead of " " or << >> adds
>> nothing to the information in a post (unless the thread is about
>> typography and characters).
>
> To my mind it is useful to have distinct characters to open and close a
> quote. It helps clarity. It's not needed but it helps.

If it helps clarity and reading, then fair enough - that is a good
reason to use them. But make sure it really helps. For example,
Stefan's use of » « quotations does not, IMHO, help clarity at all - it
hinders it. Along with his non-standard indentation and quoting, and
stunningly ugly coding style, his ideas of what makes a post clear mean
that I rarely bother reading them.

> Much like your
> usage of two spaces after a sentence. This has always looked wrong for
> me. Try using \frenchspacing. ;-)

I learned to type on a typewriter - a mechanical one! (I am not /that/
old, and had been programming before I decided to teach myself
touch-typing properly.) Double spaces at the end of a sentence look
good to me, even though I know they are a bit old fashioned. (There is
some controversy as to whether "French spacing" refers to single or
double spacing at the end of a sentence. But we know that Knuth is
always right :-) )

> By the way using << and >> instead of
> the (french?) quotation marks feels even more wrong (is that the right
> comparative?). I associate those with much smaller/bigger.

I think you have missed a bit of your sentence here...

>
>> It is a different matter for documents where appearance is more
>> relevant. You don't want to use simple quotation marks in a LaTeX
>> document - people might think you wrote the document in Word, and that
>> would be awful! (That's not sarcasm - I really am that snobby about
>> typography and document appearance. But everything in its place.)
>
> I agree with you. Completely. But I think this /is/ the place.
>

Well, maybe we should end this off-topic sub-thread and start making
more use of UTF-8 characters in C++ posts. If it adds to readability
and people are happy viewing it, then I am quite ready to change my
mind. If people complain about funny symbols or question marks turning
up in the posts, then we will know that the c.l.c++ world is not yet
ready for UTF-8 !

Christian Gollwitzer

Dec 19, 2017, 2:38:26 PM
On 19.12.17 at 16:39, David Brown wrote:
> However, there are people out there with older systems, or older
> newsreaders, or with particular font settings that might cause
> limitations. Linux has supported UTF-8 and associated fonts for ages -
> in the Windows world, it is more recent (Windows has supported a variant
> of UTF-16 for nearly two decades, but good UTF-8 support and good fonts
> are not as old.

This opinion seems very biased. As you say, Windows uses UTF-16
(formerly UCS-2) *internally* for the API to submit a Unicode string to
the display. That doesn't mean that UTF-8 doesn't work there; UTF-8 is
just a transport encoding for Unicode code points, and recent ("10 years
old") software can recode it into UTF-16 at any time to call the Windows
API. There is no problem running Internet Explorer on XP and displaying
an HTML page encoded in UTF-8.

> Remember, Windows XP is the third most popular OS in
> the world!).

Very sad, since it is not supported any longer.


> The difference here is that you /don't/ need a large UTF-8 font to show
> a ö character. You just need a program that understands UTF-8 encoding,
> and you need a font with that has that symbol. And lots of fonts do -
> fonts for Western European languages, that have been around since the
> days of MS-DOS, have it. So someone who is running XP and likes "MS
> sans serif" will have no problem with ö, but will have difficulty with
> directional quotation marks.

Quote from Wikipedia: https://en.wikipedia.org/wiki/Microsoft_Sans_Serif
"Version 1.41 (supplied with Windows XP SP2) includes 2257 glyphs (2301
characters, 28 blocks), which extended Unicode ranges to include
Combining Diacritical Marks, Currency Symbols, Cyrillic Supplement,
Geometric Shapes, Greek Extended, IPA Extensions, Number Forms, Spacing
Modifier Letters. New OpenType scripts include Arabic MAR script.
Additional OpenType features includes rlig for Arabic scripts."

So, MS Sans Serif actually seems to be a decent Unicode font. Not to
mention "Arial Unicode" which shipped in 2000 with almost everything in
the BMP. I'm inclined to say that the Unicode support used to be much
better in Windows actually than in Linux. I remember a shell tool
"fetchmsttfonts" which downloaded the fonts from Microsoft's servers to
improve the Unicode coverage in Linux.

Christian

Richard

Dec 19, 2017, 3:02:28 PM
[Please do not mail me a copy of your followup]

Ralf Goertz <m...@myprovider.invalid> spake the secret code
<20171219092...@delli.fritz.box> thusly:

>On Mon, 18 Dec 2017 23:26:48 +0000 (UTC),
>legaliz...@mail.xmission.com (Richard) wrote:
>
>> "Alf P. Steinbach" <alf.p.stein...@gmail.com> spake the secret
>> code <p0vf5t$uki$1...@dont-email.me> thusly:
>>
>> FWIW, I know you like to use UTF-8, which is why I assumed that was
>> the reason for the weird formatting I saw. I recognize that my
>> preferred newsreader is ancient and not the best in supporting some
>> features people like to use. I don't expect anyone to adapt to me,
>> but I do find some uses of UTF-8 gratuitous with no value added,
>> simply different.
>
>You say that probably because for your native language there is no
>difference between ASCII and UTF-8.

Nope. I say it because most of the time when I encounter UTF-8 laden
email the uses are gratuitous and often the author isn't even aware
that such gratuitous uses of UTF-8 were done "on their behalf." (Yes,
Microsoft, I'm glaring at you.)

I am conversationally fluent in French and took several years of
immersion Chinese (mainland simplified, not traditional; I don't consider
myself fluent by any means, but with a dictionary and enough time I
can translate written text reasonably well). I can certainly appreciate
UTF-8 in contexts where it provides value and I recognize that those
contexts exist aplenty.

I certainly would not advocate going back to national character set
extensions of 8-bit ASCII!

>For those of us less fortunate (or,
>to put it differently, with a richer character set in their language) it
>is a huge relief to have an encoding where you don't have to bother
>about code pages and stuff like that.

Agreed. One of my first jobs out of college was working at a company
outside Paris and it was an interesting experience reading source code
where all the comments and variable names were in French :).

>And after getting used to it I
>found myself playing around with those other characters like „…“
>(ellipsis in german style quotation marks) just because I can and it is
>so much nicer. Spoiled by using LaTeX, I guess.

Yeah, well to me it looks like a bunch of hex binary characters, so it
just comes out like useless gibberish. And seriously, do we really
need to use a fancy Unicode character when we could just write ... and
be universally clear? I know this is not going to sway you, I'm
simply making the point that while there are Unicode characters for
which there is no adequate representation in ASCII, there are also a
bunch of Unicode characters for which the ASCII representation is just
fine. We're not trying to typeset books on usenet or email, we're
trying to communicate effectively. IMO, most of these kinds of uses
of Unicode are impeding communication rather than enhancing it.

But hey, we're all luddites here in the sense that we are using usenet
and not the "forum" on cplusplus.com.

>> Just like I find your preference to always use trailing return
>> type :).
>
>With that I agree ;-)

My statement was awkwardly worded, but you know I was saying your use
of trailing return type was gratuitous, right? :)

Richard

Dec 19, 2017, 3:06:28 PM
[Please do not mail me a copy of your followup]

Ralf Goertz <m...@myprovider.invalid> spake the secret code
<20171219153...@delli.fritz.box> thusly:

>Can't we assume that in a computer oriented newsgroup like this people
>know how to have UTF-8 displayed correctly after so many years that it
>is around?

You can assume it, but it isn't universally true.

Just to be clear, I'm not asking for any accommodation of my old tools
:), but UTF-8 still isn't universal. I think this is mostly because
some tools that people like (e.g. me) aren't being actively
maintained.

I've investigated the source code for my favorite news reader to see
about updating it, but unfortunately the thing was already a big mess
when the last maintainer stopped updating it themselves. A perfect
example of how open source software gets abandoned because people
allowed the messes to accumulate over time.

Kind of like when you watch an episode of "hoarders" and those people
wallowing in all that mess insist they "need" everything and how they
know where everything is. Of course, to any casual outside observer
the thing looks like the contents of a trash dumpster.

David Brown

Dec 19, 2017, 3:55:06 PM
On 19/12/17 20:38, Christian Gollwitzer wrote:
> On 19.12.17 at 16:39, David Brown wrote:
>> However, there are people out there with older systems, or older
>> newsreaders, or with particular font settings that might cause
>> limitations.  Linux has supported UTF-8 and associated fonts for ages -
>> in the Windows world, it is more recent (Windows has supported a variant
>> of UTF-16 for nearly two decades, but good UTF-8 support and good fonts
>> are not as old.
>
> This opinion seems very biased. As you say, Windows uses UTF-16
> (formerly UCS-2) *internally* for the API to submit a Unicode string to
> the display. That doesn't mean that UTF-8 doesn't work there, UTF-8 is
> just a transport encoding for Unicode codepoints, and recent ("10 years
> old") software can anytime recode it into UTF-16 to call the Windows
> API. There is no problem to run Internet Explorer on XP and display a
> HTML page encoded in UTF-8.
>

Biased, perhaps, but based on experience. Even though NT had UCS-2
(mostly like UTF-16, but without the support for multi-code encodings)
from the beginning, that was in the kernel and the NTFS filesystem.
Lots of software did not support Unicode of any sort, and the few fonts
with Unicode encoding were very sparse. It got better, of course, with
far better fonts and support by the time of XP, and certainly with Win7
(I skipped Vista entirely) there is no problem with UTF-8 in most
applications. Blame it on slow or lazy app developers, rather than
Windows, if you prefer.

>> Remember, Windows XP is the third most popular OS in
>> the world!).
>
> Very sad, since it is not supported any longer.

I have never found support or lack thereof for Windows to be of any
significant benefit. But if you mean that the latest versions of some
applications no longer run on XP, then I agree. Generally, however, I
find I can run most programs fine in XP. (I use Win7 for my Windows
desktop, but XP or even W2K when I need a Windows virtual machine. Such
machines are for running particular programs - not for using the latest
and greatest web browsers.)

>
>
>> The difference here is that you /don't/ need a large UTF-8 font to show
>> a ö character.  You just need a program that understands UTF-8 encoding,
>> and you need a font with that has that symbol.  And lots of fonts do -
>> fonts for Western European languages, that have been around since the
>> days of MS-DOS, have it.  So someone who is running XP and likes "MS
>> sans serif" will have no problem with ö, but will have difficulty with
>> directional quotation marks.
>
> Quote from Wikipedia: https://en.wikipedia.org/wiki/Microsoft_Sans_Serif
> "Version 1.41 (supplied with Windows XP SP2) includes 2257 glyphs (2301
> characters, 28 blocks), which extended Unicode ranges to include
> Combining Diacritical Marks, Currency Symbols, Cyrillic Supplement,
> Geometric Shapes, Greek Extended, IPA Extensions, Number Forms, Spacing
> Modifier Letters. New OpenType scripts include Arabic MAR script.
> Additional OpenType features includes rlig for Arabic scripts."
>
> So, MS Sans Serif actually seems to be a decent Unicode font. Not to
> mention "Arial Unicode" which shipped in 2000 with almost everything in
> the BMP. I'm inclined to say that the Unicode support used to be much
> better in Windows actually than in Linux. I remember a shell tool
> "fetchmsttfonts" which downloaded the fonts from Microsoft's servers to
> improve the Unicode coverage in Linux.
>

Well, experiences vary, I suppose. Perhaps I was unlucky in the fonts I
used or the software I used.

"fetchmsttfonts" is (was) mainly to get fonts that are compatible with
common ones from the Windows world - so that your OpenOffice/LibreOffice
documents match up in font metrics, and so that your Windows programs
under Wine work.

There was a time when Unicode in Linux was bad too, and fonts were
limited. UTF-8 always worked underneath, because most of Linux and the
filesystems simply ignore the details - UTF-8 strings work the same way
as ASCII strings.


Christian Gollwitzer

Dec 19, 2017, 4:04:48 PM
On 19.12.17 at 21:54, David Brown wrote:
> On 19/12/17 20:38, Christian Gollwitzer wrote:
>> On 19.12.17 at 16:39, David Brown wrote:
>>> Remember, Windows XP is the third most popular OS in
>>> the world!).
>>
>> Very sad, since it is not supported any longer.
>
> I have never found support or lack thereof for Windows to be of any
> significant benefit.  But if you mean that the latest versions of some
> applications no longer run on XP, then I agree.

I meant that Microsoft stopped supporting XP in 2014, i.e. there will be
known unfixable security bugs in XP. For that reason Windows developers
also stop testing their software on XP. There's anyway no good reason
to stay with XP - unless you have some specialized hardware like, e.g. a
CT scanner or similar stuff.

Christian

Scott Lurndal

Dec 19, 2017, 4:25:30 PM
legaliz...@mail.xmission.com (Richard) writes:
>[Please do not mail me a copy of your followup]
>
>Ralf Goertz <m...@myprovider.invalid> spake the secret code
><20171219153...@delli.fritz.box> thusly:
>
>>Can't we assume that in a computer oriented newsgroup like this people
>>know how to have UTF-8 displayed correctly after so many years that it
>>is around?
>
>You can assume it, but it isn't universally true.
>
>Just to be clear, I'm not asking for any accommodation of my old tools

I am :-). I've no desire to try to use UTF-8 with the athena widgets
that drive my fast, efficient and lightweight newsreader (xrn),
certainly not to display smart quotation marks.

>:), but UTF-8 still isn't universal. I think this is mostly because
>some tools that people like (e.g. me) aren't being actively
>maintained.

Hear, Hear!

David Brown

Dec 19, 2017, 4:29:39 PM
When you use Windows, you know it has security bugs. (The same applies
to all systems, of course - but Windows' reputation is, let us say, not
stellar.) When a new version of Windows comes out, there is a flurry of
fixes of the big stuff - gaping holes, or issues that are likely to be
met by many people in practice. But by the time you get past a service
pack or two, these are all done - it is rare that there are security
threats that have a significant risk to normal users who take reasonable
precautions. When you look at the list of security fixes after this
initial period, they are all about local privilege escalations, or holes
in obscure features and services that few people use, or things that are
easily blocked by having a real firewall between your PC and the
internet. Your significant security risks are then from applications,
perhaps especially MS's browsers and other MS software (at least from
the XP era), from user errors (like clicking on random links, opening "I
Love U" emails, and downloading random software), and from broken
"security" software.

I simply do not find XP - at least after SP2 when it stabilized - to be
a significant security risk compared to Win7 or later Windows.

The reason Windows developers stop supporting XP and testing their code
for XP is that they can use newer features on newer OS's, and testing
takes time and costs money - people with XP are unlikely to pay for that
cost. No Windows application developer thinks "I /could/ support XP and
get a wider customer base. But I want to encourage people to have
better security, so I will disable XP support in my code to force people
to update their OS for their own good".

There are lots of reasons to keep running XP. This is perhaps best seen
by looking at the numbers from usage trackers - there are /lots/ of XP
machines in use. And that does not count them all, since many (such as
the CT scanners) are not used for internet browsing. Of course in many
cases, there may not be a good reason for sticking to XP - but a huge
proportion of users really do not see the point in paying lots of money
to "fix" something that is doing a perfectly good job for them.




Jorgen Grahn

Dec 20, 2017, 1:19:21 AM
On Tue, 2017-12-19, Ralf Goertz wrote:
> On Mon, 18 Dec 2017 23:26:48 +0000 (UTC),
> legaliz...@mail.xmission.com (Richard) wrote:
>
>> "Alf P. Steinbach" <alf.p.stein...@gmail.com> spake the secret
>> code <p0vf5t$uki$1...@dont-email.me> thusly:
>>
>> FWIW, I know you like to use UTF-8, which is why I assumed that was
>> the reason for the weird formatting I saw. I recognize that my
>> preferred newsreader is ancient and not the best in supporting some
>> features people like to use. I don't expect anyone to adapt to me,
>> but I do find some uses of UTF-8 gratuitous with no value added,
>> simply different.
>
> You say that probably because for your native language there is no
> difference between ASCII and UTF-8. For those of us less fortunate (or,
> to put it differently, with a richer character set in their language)

For the record, this wasn't the reason I complained upthread[0].

It's more what David Brown described. When you have (typically)
monospaced fonts and multiple authors, it's troublesome to see a lot
of different ways to write the same thing (like quoting a piece of
text) and most of them will look almost -- but not quite -- identical
on the screen.

> it is a huge relief to have an encoding where you don't have to
> bother about code pages and stuff like that.

I think we compared ASCII and UTF-8 above. As I understand it,
iso8859-1 and friends aren't supposed to be used on USENET these days.

(The swedish groups used iso8859-1 as a convention, but they are
largely dead by now.)

> And after getting used to it I found myself playing around with
> those other characters like „…“ (ellipsis in german style quotation
> marks) just because I can and it is so much nicer. Spoiled by using
> LaTeX, I guess.

/Jorgen

[0] My own name needs non-ASCII, and the only reason I don't spell it
that way on USENET is that there would be a single character in a
typical posting of mine which would force it to UTF-8.

Reinhardt Behm

Dec 20, 2017, 2:58:13 AM
Is iso8859-1 *that* unhealthy?

>
>> And after getting used to it I found myself playing around with
>> those other characters like „…“ (ellipsis in german style quotation
>> marks) just because I can and it is so much nicer. Spoiled by using
>> LaTeX, I guess.
>
> /Jorgen
>
> [0] My own name needs non-ASCII, and the only reason I don't spell it
> that way on USENET is that there would be a single character in a
> typical posting of mine which would force it to UTF-9.
>
--
Reinhardt

Ralf Goertz

Dec 20, 2017, 3:40:55 AM
On Tue, 19 Dec 2017 20:02:17 +0000 (UTC),
legaliz...@mail.xmission.com (Richard) wrote:

> [Please do not mail me a copy of your followup]
>
> Ralf Goertz <m...@myprovider.invalid> spake the secret code
> <20171219092...@delli.fritz.box> thusly:
>
> > On Mon, 18 Dec 2017 23:26:48 +0000 (UTC),
> > legaliz...@mail.xmission.com (Richard) wrote:

> >And after getting used to it I found myself playing around with those
> >other characters like „…“ (ellipsis in german style quotation marks)
> >just because I can and it is so much nicer. Spoiled by using LaTeX, I
> >guess.
>
> Yeah, well to me it looks like a bunch of hex binary characters, so it
> just comes out like useless gibberish. And seriously, do we really
> need to use a fancy Unicode character when we could just write ... and
> be universally clear? I know this is not going to sway you, I'm
> simply making the point that while there are Unicode characters for
> which there is no adequate representation in ASCII, there are also a
> bunch of Unicode characters for which the ASCII representation is just
> fine.

Interestingly enough, the quotation marks and the ellipsis survived your
quoting, probably because trn doesn't declare an encoding, in which case
I assume it's UTF-8. Might I ask why you don't use another news reader?
trn is a text application, right? What about slrn or tin? I don't know
whether they support UTF-8, but they have relatively recent versions out,
so I guess they do.

> We're not trying to typeset books on usenet or email, we're trying to
> communicate effectively. IMO, most of these kinds of uses of Unicode
> are impeding communication rather than enhancing it.

No, we are not typesetting a book. And I agree that the content of the
communication is more important than the form. But having UTF-8 around
is certainly a great advantage, especially in math news groups. Of
course I can write \emptyset but ∅ is so much nicer and easier to grasp.
It therefore helps communication.

> But hey, we're all luddites here in the sense that we are using usenet
> and not the "forum" on cplusplus.com.
>
> >> Just like I find your preference to always use trailing return
> >> type :).
> >
> >With that I agree ;-)
>
> My statement was awkwardly worded, but you know I was saying your use
> of trailing return type was gratuitous, right? :)

You know that I am not Alf but Ralf, right? ;-)

Richard

Dec 20, 2017, 1:29:05 PM
[Please do not mail me a copy of your followup]

Ralf Goertz <m...@myprovider.invalid> spake the secret code
<20171220094...@delli.fritz.box> thusly:

>> My statement was awkwardly worded, but you know I was saying your use
>> of trailing return type was gratuitous, right? :)
>
>You know that I am not Alf but Ralf, right? ;-)

It's just that when I reread that as quoted, I realized that Alf might
think that I liked his gratuitous trailing return style :)

And then I had a brain fart and conflated Alf and Ralf, sorry. :)

Jorgen Grahn

Jan 1, 2018, 4:25:27 PM
On Tue, 2017-12-19, Stefan Ram wrote:
> Ralf Goertz <m...@myprovider.invalid> writes:
>>found myself playing around with those other characters like „…“
>
> In the TeXbook (chapter 12) Donald E. Knuth writes:
>
> |[I]f you try to specify `...' by typing three periods in a
> |row, you get `...' - the dots are too close together. One way
> |to handle this is to go into mathematics mode, using the
> |\ldots control sequence defined in plain TeX format.
>
> So he recommends a special code because a sequence of three
> normal dots is "too close together".
>
> Ironically, usually, in »…«, the dots are too close together
> (here), while when one just types three dots »...« (in the
> monospaced font often used in Newsreaders) they are apart
> just fine.

I think there's a lot to be said for monospaced fonts. Proportional
fonts need serious typography and the attention of professionals,
while monospaced fonts follow the fruitful tradition of the
typewriter.