Unicode character for an underscore with dot above?

Janis Papanagnou

Oct 20, 2020, 6:44:35 AM

Likely off-topic (I haven't found a Unicode newsgroup, so please bear with me).
(Any pointers to appropriate newsgroups or other sources are appreciated.)

Decades ago we used a character that looks like an "underscore with dot above"
in (non-computer-based) communication, but I was unable to find an appropriate
character in a Google search or in the Unicode tables (besides the [undesired]
option to "compose" it from two glyph primitives). Has anyone else used that
character, or does anyone know its formal name or whether a representation
for it exists in Unicode?

Janis

David W. Hodgins

Oct 20, 2020, 7:55:15 AM

On Tue, 20 Oct 2020 06:44:30 -0400, Janis Papanagnou <janis_pa...@hotmail.com> wrote:
> Decades ago we used a character that looks like an "underscore with dot above"
> in (non-computer based) communication, but I was unable to find an appropriate
> character in a google search or in the Unicode tables (besides the [undesired]
> option to "compose" it by two glyph primitives). Has anyone else used that
> character, knows its formal name, or knows whether there's a representation
> for it existing in Unicode?

∸

Found the above using kcharselect, under Mathematical Symbols / Mathematical Operators.

The info displayed for the character ...
Character: ∸ U+2238
Name: DOT MINUS
Annotations and Cross References
Alias names:
saturating subtraction
Notes:
sometimes claimed as notation for symmetric set difference, but ∆ U+2206 INCREMENT is preferred
General Character Properties
Block: Mathematical Operators
Unicode category: Symbol, Maths
Various Useful Representations
UTF-8: 0xE2 0x88 0xB8
UTF-16: 0x2238
C octal escaped UTF-8: \342\210\270
XML decimal entity: &#8760;
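
For anyone who wants to double-check these representations without kcharselect, here is a minimal Python sketch. The helper name `representations` is made up for illustration; it relies only on the built-in `ord` and `str.encode`, and the labels simply mirror the ones listed above.

```python
def representations(ch: str) -> dict:
    """Return a few common representations of a single character."""
    utf8 = ch.encode("utf-8")
    utf16 = ch.encode("utf-16-be")  # big-endian, no byte order mark
    # Split the UTF-16 bytes into 16-bit code units.
    units = [int.from_bytes(utf16[i:i + 2], "big") for i in range(0, len(utf16), 2)]
    return {
        "code point": f"U+{ord(ch):04X}",
        "UTF-8": " ".join(f"0x{b:02X}" for b in utf8),
        "UTF-16": " ".join(f"0x{u:04X}" for u in units),
        "C octal escaped UTF-8": "".join(f"\\{b:03o}" for b in utf8),
        "XML decimal entity": f"&#{ord(ch)};",
    }

# U+2238 DOT MINUS, the character discussed above.
for label, value in representations("\u2238").items():
    print(f"{label}: {value}")
```

The same helper can be pointed at any other single character mentioned in this thread.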

Regards, Dave Hodgins

--
Change dwho...@nomail.afraid.org to davidw...@teksavvy.com for
email replies.

Janis Papanagnou

Oct 20, 2020, 8:59:27 AM

On 20.10.2020 13:25, David W. Hodgins wrote:
> On Tue, 20 Oct 2020 06:44:30 -0400, Janis Papanagnou
> <janis_pa...@hotmail.com> wrote:
>> Decades ago we used a character that looks like an "underscore with dot above"
>> in (non-computer based) communication, but I was unable to find an appropriate
>> character in a google search or in the Unicode tables (besides the [undesired]
>> option to "compose" it by two glyph primitives). Has anyone else used that
>> character, knows its formal name, or knows whether there's a representation
>> for it existing in Unicode?
>
> ∸
>
> Found the above using kcharselect in the Mathematical Symbols, Mathematical
> Operators.
>
> The info displayed for the character ...
> Character: ∸ U+2238
> Name: DOT MINUS

> [...]

Thank you for digging that up!

It's quite close but not exactly what I had been looking for, though. From
the glyph composition it fits exactly, only the minus dash is centered mid
character here while I was looking for a graphical placement where the
underscore is, at the bottom of the line.

(Thanks also for the kcharselect pointer.)

Janis

David W. Hodgins

Oct 20, 2020, 10:34:58 AM

On Tue, 20 Oct 2020 08:59:21 -0400, Janis Papanagnou <janis_pa...@hotmail.com> wrote:
> It's quite close but not exactly what I had been looking for, though. From
> the glyph composition it fits exactly, only the minus dash is centered mid
> character here while I was looking for a graphical placement where the
> underscore is, at the bottom of the line.
> (Thanks also for the kcharselect pointer.)

I'm curious what it will be used for.

Janis Papanagnou

Oct 20, 2020, 11:17:08 AM

On 20.10.2020 16:34, David W. Hodgins wrote:
> On Tue, 20 Oct 2020 08:59:21 -0400, Janis Papanagnou
> <janis_pa...@hotmail.com> wrote:

>>>> Decades ago we used a character that looks like an "underscore with
>>>> dot above" in (non-computer based) communication

>> It's quite close but not exactly what I had been looking for, though. From
>> the glyph composition it fits exactly, only the minus dash is centered mid
>> character here while I was looking for a graphical placement where the
>> underscore is, at the bottom of the line.
>> (Thanks also for the kcharselect pointer.)
>
> I'm curious what it will be used for.

In the late 1970s we used it as a visible representation for a blank.
(And I still use it in handwritten, paper-based communication.)

An alternative character for that use (but too visually heavy for my taste) is
U+2423: ␣ , or the lighter U+23B5: ⎵ (but sequences of the latter are not
separated visually; they stick together). Other suggestions like the MS
Word dot-in-the-middle have other issues (for example, in some languages
it carries meaning as a punctuation character, and it is also
too similar to the common punctuation character 'full stop').

I consider the version we used in the past to have the best properties.

Since it seems not to have found its way into the Unicode standard, it's
probably time to change, though. Of the typical suggestions I'd like
U+23B5: ⎵ best, if only it had some visible spacing from
the adjacent characters.

(But maybe my desired version is hidden somewhere in the Unicode data.
It's hard to believe that a formerly standard representation got forgotten.)

Janis

Lew Pitcher

Oct 20, 2020, 11:34:42 AM

On Tue, 20 Oct 2020 17:17:02 +0200, Janis Papanagnou wrote:

> On 20.10.2020 16:34, David W. Hodgins wrote:
>> On Tue, 20 Oct 2020 08:59:21 -0400, Janis Papanagnou
>> <janis_pa...@hotmail.com> wrote:
>>>>> Decades ago we used a character that looks like an "underscore with
>>>>> dot above" in (non-computer based) communication
>>> It's quite close but not exactly what I had been looking for, though.
>>> From the glyph composition it fits exactly, only the minus dash is
>>> centered mid character here while I was looking for a graphical
>>> placement where the underscore is, at the bottom of the line.
>>> (Thanks also for the kcharselect pointer.)
>>
>> I'm curious what it will be used for.
>
> In the late 1970's we used it as a visible representation for a blank.
> (And I am still using it in hand-written paper based communication.)

When I learned programming, in the late '70s, we were taught to use a
"lowercase b with diagonal" to represent a blank. I'm happy to say that
/this/ character has made it into Unicode, as U+2422.

[snip]

> (But maybe my desired version is hidden somewhere in the Unicode data.
> Hard to believe that a formerly standard representation got forgotten.)
>
> Janis

--
Lew Pitcher
"In Skills, We Trust"

David W. Hodgins

Oct 20, 2020, 12:08:49 PM

On Tue, 20 Oct 2020 11:34:35 -0400, Lew Pitcher <lew.p...@digitalfreehold.ca> wrote:
> When I learned programming, in the late '70s, we were taught to use a
> "lowercase b with diagonal" to represent a blank. I'm happy to say that
> /this/ character has made it into unicode, as U+2422

␢ from kcharselect, Symbols / Control Pictures is described as
Character: ␢ U+2422
Name: BLANK SYMBOL
Annotations and Cross References
Notes:
graphic for space
See also:
ƀ U+0180 LATIN SMALL LETTER B WITH STROKE
General Character Properties
Block: Control Pictures
Unicode category: Symbol, Other
Various Useful Representations
UTF-8: 0xE2 0x90 0xA2
UTF-16: 0x2422
C octal escaped UTF-8: \342\220\242
XML decimal entity: &#9250;

though I remember it as being a full slash over the b instead of a stroke.

Janis Papanagnou

Oct 20, 2020, 12:54:09 PM

On 20.10.2020 17:34, Lew Pitcher wrote:
> On Tue, 20 Oct 2020 17:17:02 +0200, Janis Papanagnou wrote:
>> On 20.10.2020 16:34, David W. Hodgins wrote:

>>> On Tue, 20 Oct 2020 08:59:21 -0400, Janis Papanagnou wrote:
>>>> [ underscore with dot above ]

>>> I'm curious what it will be used for.
>>
>> In the late 1970's we used it as a visible representation for a blank.
>> (And I am still using it in hand-written paper based communication.)
>
> When I learned programming, in the late '70s, we were taught to use a
> "lowercase b with diagonal" to represent a blank. I'm happy to say that
> /this/ character has made it into unicode, as U+2422

I think it's fine for specification purposes. For maintaining readability it's
far too heavy, in my opinion.

>> (But maybe my desired version is hidden somewhere in the Unicode data.
>> Hard to believe that a formerly standard representation got forgotten.)

At least I can *compose* the desired character in Unicode...

$ printf "Abc\u0332.xyz.\n"
Abc̲.xyz.

Janis

Janis Papanagnou

Oct 20, 2020, 1:00:49 PM

Now this is strange! In my previous post it is not the dot that is underlined;
I see the 'c' underlined in my copy/pasted printf output. And in this reply
I see the correct character underlined in the quoted text. - Is that a
display or character-composition issue of the newsreader? - But let's see how
that character looks after sending the post...

Janis

Michael Bäuerle

Oct 20, 2020, 1:07:08 PM

Janis Papanagnou wrote:
>
> [ underscore with dot above ]

> At least I can *compose* the desired character in Unicode...
>
> $ printf "Abc\u0332.xyz.\n"

This looks wrong: In Unicode a combining character combines with the
previous one.

> Abc̲.xyz.

The underscore is placed below the 'c' character.
To get it below the '.' character, the order should be reversed:

$ printf "Abc.\u0332xyz.\n"
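
The ordering rule can also be checked without relying on any renderer. The following is a rough Python sketch: the helper `clusters` is hypothetical and deliberately simplified (it only attaches characters with a non-zero canonical combining class to the preceding character, which is not the full grapheme cluster algorithm), but it is enough to show which base character U+0332 COMBINING LOW LINE belongs to in each of the two strings.

```python
import unicodedata

def clusters(text: str) -> list:
    """Group each base character with the combining marks that follow it."""
    groups = []
    for ch in text:
        # unicodedata.combining() returns 0 for non-combining characters.
        if unicodedata.combining(ch) and groups:
            groups[-1] += ch
        else:
            groups.append(ch)
    return groups

wrong = "Abc\u0332.xyz."   # mark follows 'c'
right = "Abc.\u0332xyz."   # mark follows '.'

# ascii() keeps the output readable even on terminals that render marks badly.
print(ascii(clusters(wrong)))
print(ascii(clusters(right)))
```

In the first string the mark ends up in the same group as 'c'; in the second it is grouped with the '.', which is the intended result.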

Janis Papanagnou

Oct 20, 2020, 1:12:48 PM

On 20.10.2020 19:06, Michael Bäuerle wrote:
> Janis Papanagnou wrote:
>>
>> [ underscore with dot above ]
>> At least I can *compose* the desired character in Unicode...
>>
>> $ printf "Abc\u0332.xyz.\n"
>
> This looks wrong: In Unicode a combining character combines with the
> previous one.
>
>> Abc̲.xyz.
>
> The underscore is placed below the 'c' character.

Yes, as obviously seen, after posting. The printf showed it as desired,
though, and I just copy/pasted that text output into the post.

> To get it below the '.' character, the order should be reversed:
>
> $ printf "Abc.\u0332xyz.\n"

The problem is that on my system printf underlines the 'x' here.

Janis

Janis Papanagnou

Oct 20, 2020, 1:37:00 PM

On 20.10.2020 19:12, Janis Papanagnou wrote:
> On 20.10.2020 19:06, Michael Bäuerle wrote:
>> This looks wrong: In Unicode a combining character combines with the
>> previous one.

Yes, and I read that even sequences of combining characters can be
provided after the main character.

> The problem is that on my system printf underlines the 'x' here.

Okay, it's a terminal issue. The various console windows on my Linux
system behave differently, and the standard one I use is buggy.
The Thunderbird newsreader that I use also behaves inconsistently;
the viewer shows it correctly while the composer shows it wrongly.
Doh!

So use it in the correct form as Michael showed, printf ".\u0332",
but don't expect it to be displayed correctly in every application.
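
Since the rendering cannot be trusted, one way to check what a string really contains is to dump its code points instead of looking at it. A small Python sketch follows; the helper name `dump` is only for illustration.

```python
import unicodedata

def dump(text: str) -> list:
    """List each code point with its Unicode name, independent of rendering."""
    return [f"U+{ord(c):04X} {unicodedata.name(c, '<no name>')}" for c in text]

# The sequence recommended above: full stop followed by the combining mark.
for line in dump(".\u0332"):
    print(line)
```

If the combining mark is listed directly after the full stop, the data is in the right order and any wrong underline is a display problem.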

Janis

Chris Elvidge

Oct 20, 2020, 1:44:22 PM

Would this character be any better? ␣
printf '\xe2\x90\xa3\n' or printf '\u2423\n'
␣

Unicode character  Oct     Dec   Hex     HTML
␣ open box         022043  9251  0x2423  &#9251;

--

Chris Elvidge, England

Chris Elvidge

Oct 20, 2020, 1:56:23 PM

On 20/10/2020 04:17 pm, Janis Papanagnou wrote:

I also found this

printf '\xdf\xb8\n' = ߸ = Nko comma

(according to https://shapecatcher.com)
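
To see how the two bytes map to a code point, here is a hedged Python sketch of the two-byte UTF-8 layout (110xxxxx 10yyyyyy). The function `decode_two_byte` is a made-up helper and skips the overlong-encoding checks a real decoder performs; Python's own decoder is used as a cross-check.

```python
import unicodedata

def decode_two_byte(b0: int, b1: int) -> int:
    """Decode a two-byte UTF-8 sequence (110xxxxx 10yyyyyy) to a code point."""
    if b0 & 0xE0 != 0xC0 or b1 & 0xC0 != 0x80:
        raise ValueError("not a two-byte UTF-8 sequence")
    # Five payload bits from the lead byte, six from the continuation byte.
    return ((b0 & 0x1F) << 6) | (b1 & 0x3F)

cp = decode_two_byte(0xDF, 0xB8)
print(f"U+{cp:04X} {unicodedata.name(chr(cp))}")

# Cross-check against the built-in decoder.
print(ord(b"\xdf\xb8".decode("utf-8")) == cp)
```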

--

Chris Elvidge, England

Chris Elvidge

Oct 20, 2020, 1:57:57 PM

On 20/10/2020 04:17 pm, Janis Papanagnou wrote:

I also found this - "Nko comma" according to https://shapecatcher.com/
printf '\xdf\xb8' = ߸

--

Chris Elvidge, England

Janis Papanagnou

Oct 20, 2020, 1:58:09 PM

On 20.10.2020 19:44, Chris Elvidge wrote:
> On 20/10/2020 04:17 pm, Janis Papanagnou wrote:
>> On 20.10.2020 16:34, David W. Hodgins wrote:
>>>
>>> I'm curious what it will be used for.
>>
>> In the late 1970's we used it as a visible representation for a blank.
>> (And I am still using it in hand-written paper based communication.)
>>
>> An alternative character for that use (but too visually heavy for my taste) is
>> U+2423: ␣ , or the lighter U+23B5: ⎵ (but sequences of the latter are not
>> separated visually; they stick together). Other suggestions like the MS
>> Word dot-in-the-middle have other issues (for example, in some languages
>> it carries meaning as a punctuation character, and it is also
>> too similar to the common punctuation character 'full stop').
>>
>> I consider the version we used in the past to have the best properties.
>>
>> Since it seems not to have found its way into the Unicode standard, it's
>> probably time to change, though. Of the typical suggestions I'd like
>> U+23B5: ⎵ best, if only it had some visible spacing from
>> the adjacent characters.
>>
>

> Would this character be any better? ␣
> printf '\xe2\x90\xa3\n' or printf '\u2423\n'
> ␣
>
> Unicode character  Oct     Dec   Hex     HTML
> ␣ open box         022043  9251  0x2423  &#9251;

It's already on my list of possible alternatives (see quotes above).

It has a better spacing property with respect to adjacent characters compared
to U+23B5: ⎵ but for my taste it is still a bit heavy and disturbs
the surrounding text, i.e. its readability - much less, of course, than Lew's
"b with stroke" - which even looks like a letter (and maybe it is one
in some [Eastern European?] country) - but still.

Given the recently observed application-dependent display issues with
composed Unicode characters, I suppose I'll stay with the less
conspicuous U+23B5: ⎵ for now.

Janis

Kaz Kylheku

Oct 20, 2020, 2:19:29 PM

On 2020-10-20, Janis Papanagnou <janis_pa...@hotmail.com> wrote:
> On 20.10.2020 16:34, David W. Hodgins wrote:
>> On Tue, 20 Oct 2020 08:59:21 -0400, Janis Papanagnou
>> <janis_pa...@hotmail.com> wrote:
>>>>> Decades ago we used a character that looks like an "underscore with
>>>>> dot above" in (non-computer based) communication
>>> It's quite close but not exactly what I had been looking for, though. From
>>> the glyph composition it fits exactly, only the minus dash is centered mid
>>> character here while I was looking for a graphical placement where the
>>> underscore is, at the bottom of the line.
>>> (Thanks also for the kcharselect pointer.)
>>
>> I'm curious what it will be used for.
>
> In the late 1970's we used it as a visible representation for a blank.
> (And I am still using it in hand-written paper based communication.)

From the Korean Hangul block:

U+BBC0: 므

U+C73C: 으

:)

Kaz Kylheku

Oct 20, 2020, 3:10:32 PM

On 2020-10-20, Kaz Kylheku <793-84...@kylheku.com> wrote:
> From the Korean Hangul block:
>
> U+BBC0: 므
>
> U+C73C: 으
>
>:)

U+2358: ⍘ (APL FUNCTIONAL SYMBOL QUOTE UNDERBAR)
U+235B: ⍛ (APL FUNCTIONAL SYMBOL JOT UNDERBAR)

Janis Papanagnou

Oct 20, 2020, 5:21:33 PM

On 20.10.2020 19:57, Chris Elvidge wrote:
>
> I also found this - "Nko comma" according to https://shapecatcher.com/
> printf '\xdf\xb8' = ߸

Thanks!

Janis

Janis Papanagnou

Oct 20, 2020, 5:22:58 PM

On 20.10.2020 21:10, Kaz Kylheku wrote:
>
> U+2358: ⍘ (APL FUNCTIONAL SYMBOL QUOTE UNDERBAR)

I like this too.

> U+235B: ⍛ (APL FUNCTIONAL SYMBOL JOT UNDERBAR)

Not bad as well.

Janis

Christian Weisgerber

Oct 20, 2020, 5:30:09 PM

On 2020-10-20, Kaz Kylheku <793-84...@kylheku.com> wrote:

>>>>>> Decades ago we used a character that looks like an "underscore with
>>>>>> dot above" in (non-computer based) communication
>

> From the Korean Hangul block:
>
> U+BBC0: 므
>
> U+C73C: 으

I immediately thought of the Mayan numeral six:
U+1D2E6 𝋦

https://en.wikipedia.org/wiki/Maya_numerals

... which seems to be sadly absent from the Google Noto fonts I have
installed to cover non-Latin/Greek/Cyrillic scripts.

--
Christian "naddy" Weisgerber na...@mips.inka.de

Janis Papanagnou

Oct 20, 2020, 5:36:06 PM

On 20.10.2020 22:37, Christian Weisgerber wrote:
>
> I immediately thought of the Mayan numeral six:
> U+1D2E6 𝋦
>
> https://en.wikipedia.org/wiki/Maya_numerals
>
> ... which seems to be sadly absent from the Google Noto fonts I have
> installed to cover non-Latin/Greek/Cyrillic scripts.

My GUI also does not show it, but I see it on the linked web-page.
It's amazing where candidates with similar shapes can be found.

Janis

Michael Bäuerle

Oct 21, 2020, 5:46:31 AM

Janis Papanagnou wrote:
> On 20.10.2020 19:12, Janis Papanagnou wrote:
> > On 20.10.2020 19:06, Michael Bäuerle wrote:
> > >
> > > This looks wrong: In Unicode a combining character combines with the
> > > previous one.
>
> Yes, and I read that even sequences of combining characters can be
> provided after the main character.

Yes. But Unicode defines precomposed forms for many characters too.
Example:

ế U+1EBF LATIN SMALL LETTER E WITH CIRCUMFLEX AND ACUTE

The canonical decomposition is defined as the codepoint sequence:

U+00EA LATIN SMALL LETTER E WITH CIRCUMFLEX,
U+0301 COMBINING ACUTE ACCENT

There is a canonical decomposition defined for U+00EA by Unicode too.
Fully decomposed the codepoint sequence should be:

U+0065 LATIN SMALL LETTER E,
U+0302 COMBINING CIRCUMFLEX ACCENT,
U+0301 COMBINING ACUTE ACCENT

All three variants are canonically equivalent in the sense of Unicode:
|
| $ printf "\u1ebf\n"
| $ printf "\u00ea\u0301\n"
| $ printf "e\u0302\u0301\n"

This ambiguous encoding system is the reason why Unicode strings must be
normalized before they can be compared (e.g. to search for a word in an
article).
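
As a concrete illustration, here is a minimal Python sketch using the standard `unicodedata.normalize` function. The helper name `same_text` is an assumption made for this example; NFC is used as the default form, but any single form applied to both sides would do for a comparison.

```python
import unicodedata

precomposed = "\u1ebf"         # one code point
partial = "\u00ea\u0301"       # e with circumflex + combining acute
decomposed = "e\u0302\u0301"   # e + combining circumflex + combining acute
variants = [precomposed, partial, decomposed]

def same_text(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two strings after normalizing both to the same form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

# A raw comparison sees three different code point sequences ...
print(precomposed == decomposed)
# ... while a normalized comparison treats them as the same text.
print(same_text(precomposed, decomposed))
# Number of code points in each variant after full decomposition (NFD).
print([len(unicodedata.normalize("NFD", v)) for v in variants])
```

The raw comparison should report the strings as different, while the normalized one should report them as equal.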

Wayne

Oct 23, 2020, 1:48:11 PM

On 10/20/2020 11:17 AM, Janis Papanagnou wrote:
> In the late 1970's we used it as a visible representation for a blank.
> (And I am still using it in hand-written paper based communication.)
>
> An alternative character for that use (but too visually heavy for my taste) is
> U+2423: ␣ , or the lighter U+23B5: ⎵ (but sequences of the latter are not
> separated visually; they stick together). Other suggestions like the MS
> Word dot-in-the-middle have other issues (for example, in some languages
> it carries meaning as a punctuation character, and it is also
> too similar to the common punctuation character 'full stop').
>
> I consider the version we used in the past to have the best properties.
>
> Since it seems not to have found its way into the Unicode standard, it's
> probably time to change, though. Of the typical suggestions I'd like
> U+23B5: ⎵ best, if only it had some visible spacing from
> the adjacent characters.

You can follow U+23B5 with a thin space char to separate visually, without
switching to a different font. Use U+2009 (THIN SPACE) or U+202F
(NARROW NO-BREAK SPACE).
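
A rough Python sketch of that idea, in case it helps: the function `show_blanks` and its parameter names are hypothetical, and it only handles the plain ASCII space (U+0020), not tabs or other whitespace.

```python
VISIBLE_BLANK = "\u23b5"   # BOTTOM SQUARE BRACKET
THIN_SPACE = "\u2009"      # THIN SPACE
NARROW_NBSP = "\u202f"     # NARROW NO-BREAK SPACE

def show_blanks(text: str, marker: str = VISIBLE_BLANK, pad: str = THIN_SPACE) -> str:
    """Replace every ASCII space with a visible marker followed by a thin pad."""
    return text.replace(" ", marker + pad)

# ascii() shows the inserted code points without depending on the font.
print(ascii(show_blanks("a  b")))
print(ascii(show_blanks("a b", pad=NARROW_NBSP)))
```

Using U+202F as the pad instead should keep a line from breaking between the marker and the following character.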

--
Wayne