
Why No Supplemental Characters In Character Literals?


Lawrence D'Oliveiro

unread,
Feb 4, 2011, 12:59:30 AM2/4/11
to
Why was it decreed in the language spec that characters beyond U+FFFF are
not allowed in character literals, when they are allowed everywhere else (in
string literals, in the program text, in character and string values etc)?

Lew

unread,
Feb 4, 2011, 1:34:05 AM2/4/11
to

Because a 'char' type holds only 16 bits.

--
Lew
Ceci n'est pas une fenêtre.
.___________.
|###] | [###|
|##/ | *\##|
|#/ * | \#|
|#----|----#|
|| | * ||
|o * | o|
|_____|_____|
|===========|

Lawrence D'Oliveiro

unread,
Feb 4, 2011, 1:59:26 AM2/4/11
to
In message <iig6j2$dul$2...@news.albasani.net>, Lew wrote:

> On 02/04/2011 12:59 AM, Lawrence D'Oliveiro wrote:
>
>> Why was it decreed in the language spec that characters beyond U+FFFF are
>> not allowed in character literals, when they are allowed everywhere else
>> (in string literals, in the program text, in character and string values
>> etc)?
>
> Because a 'char' type holds only 16 bits.

No it doesn’t. Otherwise you wouldn’t be allowed supplementary characters in
character and string values. Which you are.

Mike Schilling

unread,
Feb 4, 2011, 3:22:03 AM2/4/11
to

"Lawrence D'Oliveiro" <l...@geek-central.gen.new_zealand> wrote in message
news:iig84e$uqu$1...@lust.ihug.co.nz...

Yes, it does (contain 16 bits.) It was defined to do so before there were
supplemental characters, and there was no way to extend it without breaking
compatibility with some older programs.

You can't put a supplementary character in a char. You can put them in
strings, but only encoded as UTF-16, i.e. into two 16-bit chars.
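
To make that concrete, here is a minimal sketch in Java; U+1D504 is just an
arbitrary supplementary code point chosen for the example:

    public class SupplementaryDemo
    {
        public static void main( String [] args )
        {
            // char c = '\uD835\uDD04';  // would not compile: a char literal is one 16-bit code unit

            // In a String the same code point is stored as a UTF-16 surrogate pair.
            String s = "\uD835\uDD04";        // U+1D504
            System.out.println( s.length() ); // 2 -- two char code units for one character

            char high = s.charAt( 0 );        // 0xD835, from the high-surrogate range
            char low  = s.charAt( 1 );        // 0xDD04, from the low-surrogate range
            System.out.println( Character.toCodePoint( high, low ) == 0x1D504 );  // true
        }
    }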

Lew

unread,
Feb 4, 2011, 7:49:57 AM2/4/11
to
Lawrence D'Oliveiro wrote:
>>>> Why was it decreed in the language spec that characters beyond U+FFFF are
>>>> not allowed in character literals, when they are allowed everywhere else
>>>> (in string literals, in the program text, in character and string values
>>>> etc)?

It takes TWO 'char' values to represent a supplemental character. 'char' !=
"character".

READ the documentation.

Lew wrote:
>>> Because a 'char' type holds only 16 bits.

Lawrence D'Oliveiro wrote:
>> No it doesn’t. Otherwise you wouldn’t be allowed supplementary characters in
>> character and string values. Which you are.

I have an idea for you to try - check the documentation.
<http://java.sun.com/docs/books/jls/third_edition/html/typesValues.html#4.2.1>

and you see in §4.2: "... char, whose values are 16-bit unsigned integers ..."

Mike Schilling wrote:
> Yes, it does (contain 16 bits.) It was defined to do so before there were
> supplemental characters, and there was no way to extend it without breaking
> compatibility with some older programs.
>
> You can't put a supplementary character in a char. You can put them in
> strings, but only encoded as UTF-16, i.e. into two 16-bit chars.

As the tutorials and JLS tell you, should you deign to read the documentation.
(It's not a bad idea to do so.)

Joshua Cranmer

unread,
Feb 4, 2011, 8:04:23 AM2/4/11
to

The JLS clearly states that a char is an unsigned 16-bit value. Non-BMP
Unicode characters cannot fit in a single unsigned 16-bit value. Where
other literals compile down, you can use these non-BMP characters
because, e.g., Strings are not individual 16-bit values but an array of
them, and can thus safely hold a pair of them.

--
Beware of bugs in the above code; I have only proved it correct, not
tried it. -- Donald E. Knuth

Arne Vajhøj

unread,
Feb 4, 2011, 10:49:47 AM2/4/11
to

It is very clearly specified that a Java char is 16 bit.

You can't have the codepoints above U+FFFF in a char.

You can have them in a string but then they actually takes
two chars in that string.

It is rather messy.

If you look at the Java docs for String class you will see:

charAt & codePointAt
length & codePointCount

which is not a nice API.

But since codepoints above U+FFFF was added after the String
class was defined, then the options on how to handle it were
pretty limited.
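
A small hedged example of that asymmetry, using "A" followed by the arbitrary
supplementary code point U+1D504:

    String s = "A\uD835\uDD04";                                // "A" + U+1D504

    System.out.println( s.length() );                          // 3 char units
    System.out.println( s.codePointCount( 0, s.length() ) );   // 2 code points

    System.out.println( (int) s.charAt( 1 ) );                 // 55349 (0xD835) -- half a character
    System.out.println( s.codePointAt( 1 ) );                  // 120068 (0x1D504) -- the whole character

    // Walking the string one code point at a time:
    for ( int i = 0; i < s.length(); i += Character.charCount( s.codePointAt( i ) ) )
    {
        System.out.println( "U+" + Integer.toHexString( s.codePointAt( i ) ).toUpperCase() );
    }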

Arne

Mike Schilling

unread,
Feb 4, 2011, 12:10:47 PM2/4/11
to

"Arne Vajhøj" <ar...@vajhoej.dk> wrote in message
news:4d4c2019$0$23753$1472...@news.sunsite.dk...

The sticky issue is, I think, that chars were defined as 16-bit. If that
had been left undefined, they could have been extended to 24 bits, which
would make things nice and regular again.

Arne Vajhøj

unread,
Feb 4, 2011, 12:33:00 PM2/4/11
to

Yes.

But having specific bit lengths for all types was a huge jump
forward compared to C89 regarding predictability of what code
would do.

Arne

Daniele Futtorovic

unread,
Feb 4, 2011, 12:37:15 PM2/4/11
to
On 04/02/2011 16:49, Arne Vajhøj allegedly wrote:
> It is very clearly specified that a Java char is 16 bit.
>
> You can't have the codepoints above U+FFFF in a char.
>
> You can have them in a string but then they actually takes
> two chars in that string.
>
> It is rather messy.
>
> If you look at the Java docs for String class you will see:
>
> charAt & codePointAt
> length & codePointCount
>
> which is not a nice API.
>
> But since codepoints above U+FFFF was added after the String
> class was defined, then the options on how to handle it were
> pretty limited.

They've added supplementary character support to String, StringBuilder,
StringBuffer.

Pity they haven't touched upon java.lang.CharSequence. Probably out of
concerns about compatibility.

Anyone got an idea how supplementary character support could be
integrated with CharSequence, or more generally, with an interface
describing a sequence of code points? Creating a sub-interface, e.g.
UnicodeSequence with int codePointAt(int), etc. doesn't seem like it'd
do the trick, since a UnicodeSequence /is-not/ a CharSequence (char
charAt(int) doesn't make sense for a UnicodeSequence). Adding a new
interface would mean you don't get the interoperability with all the
parts of the API that uses CharSequences... The only option would seem
to refactor CharSequence and all the classes that use or implement it.
Which means no backwards-compatibility.

Bloody mess this is.
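
For the sake of argument, one hypothetical shape such an interface could take;
the name CodePointSequence and its methods are invented here and are not part
of the JDK:

    // Hypothetical, for discussion only -- not an existing JDK type.
    // A read-only view of text as Unicode code points rather than chars.
    public interface CodePointSequence
    {
        /** Number of code points, analogous to CharSequence.length(). */
        int codePointCount();

        /** The code point at the given code-point index (0-based). */
        int codePointAt( int index );

        /** A sub-sequence by code-point indices. */
        CodePointSequence subSequence( int start, int end );

        /** Bridge back to the char-based world, for APIs that want a CharSequence. */
        CharSequence toCharSequence();
    }

The bridge method is the compromise: it keeps interoperability with
CharSequence-taking APIs without pretending that a code-point index is a char
index.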

--
DF.

Roedy Green

unread,
Feb 4, 2011, 1:26:51 PM2/4/11
to
On Fri, 04 Feb 2011 18:59:30 +1300, Lawrence D'Oliveiro
<l...@geek-central.gen.new_zealand> wrote, quoted or indirectly quoted
someone who said :

>Why was it decreed in the language spec that characters beyond U+FFFF are
>not allowed in character literals, when they are allowed everywhere else (in
>string literals, in the program text, in character and string values etc)?

Because they did not exist at the time Java was invented. Extended
literals were tacked on to the 16-bit internal scheme in a somewhat
half-hearted way. To go to full 32-bit internally would gobble RAM
hugely.

Java does not have 32-bit String literals, like C-style code points,
e.g. \U0001d504 (note the capital U vs the usual \ud504). I wrote the
SurrogatePair applet (see http://mindprod.com/applet/surrogatepair.html)
to convert C-style code points to the arcane surrogate pairs and let you
use 32-bit Unicode glyphs in your programs.


Personally, I don't see the point of any great rush to support 32-bit
Unicode. The new symbols will be rarely used. Consider what's there.
The only ones I would conceivably use are musical symbols and
Mathematical Alphanumeric symbols (especially the German black letters
so favoured in real analysis). The rest I can't imagine ever using
unless I took up a career in anthropology, i.e. linear B syllabary (I
have not a clue what it is), linear B ideograms (looks like symbols
for categorising cave petroglyphs), Aegean Numbers (counting with
stones and sticks), Old Italic (looks like Phoenician), Gothic
(medieval script), Ugaritic (cuneiform), Deseret (Mormon), Shavian
(George Bernard Shaw's phonetic script), Osmanya (Somalian), Cypriot
syllabary, Byzantine music symbols (looks like Arabic), Musical
Symbols, Tai Xuan Jing Symbols (truncated I-Ching), CJK extensions
(Chinese Japanese Korean) and tags (letters with blank "price tags").


--
Roedy Green Canadian Mind Products
http://mindprod.com
To err is human, but to really foul things up requires a computer.
~ Farmer's Almanac
It is breathtaking how a misplaced comma in a computer program can
shred megabytes of data in seconds.

Roedy Green

unread,
Feb 4, 2011, 1:36:19 PM2/4/11
to
On Fri, 04 Feb 2011 08:04:23 -0500, Joshua Cranmer
<Pidg...@verizon.invalid> wrote, quoted or indirectly quoted someone
who said :

>The JLS clearly states that a char is an unsigned 16-bit value.

Perhaps char will be redefined as 32 bits, or a new unsigned 32-bit
echar type will be invented.

It is an intractable problem. Consider the logic that uses indexOf and
substring with character index arithmetic. Most of it would go insane
if you threw a few 32-bit chars in there. You need something that
simulates an array of 32-bit chars to the programmer.

Joshua Cranmer

unread,
Feb 4, 2011, 1:44:08 PM2/4/11
to
On 02/04/2011 12:10 PM, Mike Schilling wrote:
> "Arne Vajhøj" <ar...@vajhoej.dk> wrote in message
>> But since codepoints above U+FFFF was added after the String
>> class was defined, then the options on how to handle it were
>> pretty limited.
>
> The sticky issue is, I think, that chars were defined as 16-bit. If that
> had been left undefined, they could have been extended to 24 bits, which
> would make things nice and regular again.

Well, the real problem is that Unicode swore that 16 bits were enough
for everybody, so people opted for the UTF-16 encoding in Unicode-aware
platforms (e.g., Windows uses 16-bit char values for wchar_t). When they
backtracked and increased the count to 20 bits, every system that did
UTF-16 was now screwed, because UTF-16 "kind of" becomes a
variable-width format like UTF-8... but not really. Instead you get a
mess with surrogate characters, this distinction between UTF-16 and
UCS-2, and, in short, anything not in the Basic Multilingual Plane is a
recipe for disaster.

Extending to 24 bits is problematic because 24 bits opens you up to
unaligned memory access on most, if not all, platforms, so you'd have to
go fully up to 32 bits (this is what the codePoint methods in String et
al. do). But considering the sheer amount of Strings in memory, going to
32-bit memory storage for Strings now doubles the size of that data...
and can increase memory consumption in some cases by 30-40%.

To make a long story short: Unicode made a very, very big mistake, and
everyone who designed their systems to be particularly i18n-aware before
that is now really smarting as a result.

It actually is possible to change the internal storage of String to a
UTF-8 representation (while keeping UTF-16/UTF-32 API access) and still
get good performance--people largely use direct indexes into strings in
largely consistent access patterns (e.g., str.substring(str.indexOf(":")
+ 1) ), so you can cache index lookup tables for a few values. It's ugly
as hell to code properly, taking into account proper multithreading,
etc., but it is not impossible.
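
A very rough sketch of that caching idea, assuming a single forward-scanning
cache of the last lookup; the class Utf8Text is invented for illustration, is
not thread-safe, and is nothing like a drop-in String replacement:

    import java.nio.charset.Charset;

    // Sketch of the idea above: UTF-8 storage plus a cached
    // code-point-index -> byte-offset mapping, exploiting the common
    // "walk forward from near the last lookup" access pattern.
    final class Utf8Text
    {
        private final byte[] utf8;
        private int cachedCpIndex = 0;     // last code-point index looked up
        private int cachedByteOffset = 0;  // byte offset of that code point

        Utf8Text( String s ) { this.utf8 = s.getBytes( Charset.forName( "UTF-8" ) ); }

        int codePointAt( int cpIndex )
        {
            int i = cachedCpIndex, off = cachedByteOffset;
            if ( cpIndex < i ) { i = 0; off = 0; }   // this sketch only scans forward
            while ( i < cpIndex ) { off += seqLen( utf8[ off ] ); i++; }
            cachedCpIndex = cpIndex;
            cachedByteOffset = off;
            return decodeAt( off );
        }

        private static int seqLen( byte lead )       // bytes in the UTF-8 sequence at 'lead'
        {
            int b = lead & 0xFF;
            if ( b < 0x80 ) return 1;
            if ( b < 0xE0 ) return 2;
            if ( b < 0xF0 ) return 3;
            return 4;
        }

        private int decodeAt( int off )              // decode one code point at byte offset 'off'
        {
            int len = seqLen( utf8[ off ] );
            int b = utf8[ off ] & 0xFF;
            int cp = ( len == 1 ) ? b : ( b & ( 0x7F >> len ) );
            for ( int k = 1; k < len; k++ )
                cp = ( cp << 6 ) | ( utf8[ off + k ] & 0x3F );
            return cp;
        }
    }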

markspace

unread,
Feb 4, 2011, 2:04:43 PM2/4/11
to
On 2/4/2011 10:36 AM, Roedy Green wrote:
> On Fri, 04 Feb 2011 08:04:23 -0500, Joshua Cranmer
> <Pidg...@verizon.invalid> wrote, quoted or indirectly quoted someone
> who said :
>
>> The JLS clearly states that a char is an unsigned 16-bit value.
>
> Perhaps char will be redefined as 32 bits, or a new unsigned 32-bit
> echar type will be invented.


An int is currently used for this purpose. For example,
Character.codePointAt(CharSequence,int) returns an int.


<http://download.oracle.com/javase/6/docs/api/java/lang/Character.html>


Also, from that same page, this explains the whole story in one go:


"Unicode Character Representations

"The char data type (and therefore the value that a Character object
encapsulates) are based on the original Unicode specification, which
defined characters as fixed-width 16-bit entities. The Unicode standard
has since been changed to allow for characters whose representation
requires more than 16 bits. The range of legal code points is now U+0000
to U+10FFFF, known as Unicode scalar value. (Refer to the definition of
the U+n notation in the Unicode standard.)

"The set of characters from U+0000 to U+FFFF is sometimes referred to as
the Basic Multilingual Plane (BMP). Characters whose code points are
greater than U+FFFF are called supplementary characters. The Java 2
platform uses the UTF-16 representation in char arrays and in the String
and StringBuffer classes. In this representation, supplementary
characters are represented as a pair of char values, the first from the
high-surrogates range, (\uD800-\uDBFF), the second from the
low-surrogates range (\uDC00-\uDFFF).

"A char value, therefore, represents Basic Multilingual Plane (BMP) code
points, including the surrogate code points, or code units of the UTF-16
encoding. An int value represents all Unicode code points, including
supplementary code points. The lower (least significant) 21 bits of int
are used to represent Unicode code points and the upper (most
significant) 11 bits must be zero.


...etc....
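
The surrogate arithmetic described there is exposed directly as static helpers
on java.lang.Character; a small example, again with the arbitrary supplementary
code point U+1D504:

    int cp = 0x1D504;                                                  // arbitrary supplementary code point

    System.out.println( Character.isSupplementaryCodePoint( cp ) );    // true
    char[] units = Character.toChars( cp );                            // encode as UTF-16
    System.out.println( units.length );                                // 2
    System.out.println( Character.isHighSurrogate( units[ 0 ] ) );     // true: in U+D800..U+DBFF
    System.out.println( Character.isLowSurrogate( units[ 1 ] ) );      // true: in U+DC00..U+DFFF
    System.out.println( Character.toCodePoint( units[ 0 ], units[ 1 ] ) == cp );  // true: round-trips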

markspace

unread,
Feb 4, 2011, 2:27:41 PM2/4/11
to
On 2/4/2011 9:37 AM, Daniele Futtorovic wrote:

> Pity they haven't touched upon java.lang.CharSequence. Probably out of
> concerns about compatibility.


You know that Character has static methods for pulling code points out
of a CharSequence, right?
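
For reference, a tiny example of those static helpers; they accept any
CharSequence, here a StringBuilder holding "A" + U+1D504 + "B":

    CharSequence seq = new StringBuilder( "A\uD835\uDD04B" );               // 4 chars, 3 code points

    System.out.println( Character.codePointAt( seq, 1 ) );                  // 120068 (0x1D504)
    System.out.println( Character.codePointCount( seq, 0, seq.length() ) ); // 3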

Lew

unread,
Feb 4, 2011, 3:43:17 PM2/4/11
to
Lawrence D'Oliveiro wrote:
>>> Why was it decreed in the language spec that characters beyond U+FFFF are
>>> not allowed in character literals, when they are allowed everywhere else
>>> (in string literals, in the program text, in character and string values
>>> etc)?
>

Lew wrote:
>> Because a 'char' type holds only 16 bits.
>

Lawrence D'Oliveiro wrote:
> No it doesn’t. Otherwise you wouldn’t be allowed supplementary characters in
> character and string values. Which you are.
>

/* DemoChar -- shows that a char is just a 16-bit unsigned number */
package eg;

public class DemoChar
{
    public static void main( String [] args )
    {
        // char arithmetic wraps within 16 bits; MAX_VALUE is 0xFFFF
        System.out.println( "Character.MAX_VALUE + 1 = "
                + (Character.MAX_VALUE + 1) );

        char foo1, foo2;
        foo1 = (char) (Character.MAX_VALUE - 1);
        foo2 = (char) (foo1 / 2);
        System.out.println( "foo1 = " + (int) foo1 + ", foo2 = " + (int) foo2 );

        // chars add like numbers, not like characters
        foo1 = '§';
        foo2 = '@';
        char sum = (char) (foo1 + foo2);
        System.out.println( "foo1 + foo2 = " + sum );
    }
}

--
Lew

Daniele Futtorovic

unread,
Feb 4, 2011, 4:25:10 PM2/4/11
to

Yeah. But that's not quite the same thing, is it? What with OOP and all.

Daniele Futtorovic

unread,
Feb 4, 2011, 4:28:31 PM2/4/11
to

And Klingon!

--
DF.

Tom Anderson

unread,
Feb 4, 2011, 4:30:57 PM2/4/11
to
On Fri, 4 Feb 2011, Joshua Cranmer wrote:

>> "Arne Vajhᅵj" <ar...@vajhoej.dk> wrote in message


>>
>>> But since codepoints above U+FFFF was added after the String class was
>>> defined, then the options on how to handle it were pretty limited.
>

> Extending to 24 bits is problematic because 24 bits opens you up to
> unaligned memory access on most, if not all, platforms, so you'd have to
> go fully up to 32 bits (this is what the codePoint methods in String et
> al. do). But considering the sheer amount of Strings in memory, going to
> 32-bit memory storage for Strings now doubles the size of that data...
> and can increase memory consumption in some cases by 30-40%.

This is something i ponder quite a lot.

It's essential that computers be able to represent characters from any
living human script. The astral planes include some such characters,
notably in the CJK extensions, without which it is impossible to write
some people's names correctly. The necessity of supporting more than 2**16
codepoints is simply beyond question.

The problem is how to do it efficiently.

Going to strings of 24- or 32-bit characters would indeed be prohibitive
in its effect in memory. But isn't 16-bit already an eye-watering waste?
Most characters currently sitting in RAM around the world are, i would
wager, in the ASCII range: the great majority of characters in almost any
text in a latin script will be ASCII, in that they won't have diacritics
[1] (and most text is still in latin script), and almost all characters in
non-natural-language text (HTML and XML markup, configuration files,
filesystem paths) will be ASCII. A sizeable fraction of non-latin text is
still encodable in one byte per character, using a national character set.
Forcing all users of programs written in Java (or any other platform which
uses UCS-2 encoding) to spend two bytes on each of those characters to
ease the lives of the minority of users who store a lot of CJK text seems
wildly regressive.

I am, however, at a loss to suggest a practical alternative!

A question to the house, then: has anyone ever invented a data structure
for strings which allows space-efficient storage for strings in different
scripts, but also allows time-efficient implementation of the common
string operations?

Upthread, Joshua mentions the idea of using UTF-8 strings, and cacheing
codepoint-to-bytepoint mappings. That's certainly an approach that would
work, although i worry about the performance effect of generating so many
writes, the difficulty of making it correct in multithreaded systems, and
the dependency on a good cache hit rate to make it pay off.

Anyone else?

For extra credit, give a representation which also makes it simple and
efficient to do normalisation, reversal, and "find the first occurrence of
this character, ignoring diacritics".

tom

[1] I would be interested to hear of a language (more properly, an
orthography) using latin script in which a majority of characters, or even
an unusually large fraction, do have diacritics. The pinyin romanisation
of Mandarin uses a lot of accents. Hawaiian uses quite a lot. Some ways of
writing ancient Greek use a lot of diacritics, for breathings and accents
and in verse, for long and short syllables.

--
Understand the world we're living in

rossum

unread,
Feb 4, 2011, 4:44:24 PM2/4/11
to
On Fri, 04 Feb 2011 10:26:51 -0800, Roedy Green
<see_w...@mindprod.com.invalid> wrote:

>i.e. linear B syllabary (I have not a clue what it is),
>linear B ideograms (Looks like symbols for categorising cave
>petroglyphs)

Used in Minoan Crete. Google "Michael Ventris".

rossum

Lawrence D'Oliveiro

unread,
Feb 4, 2011, 5:26:53 PM2/4/11
to
In message <iigcva$90q$1...@news.eternal-september.org>, Mike Schilling wrote:

> Yes, it does (contain 16 bits.)

Yeah, I didn’t realize it was spelled out that way in the original language
spec. What a short-sighted decision.

> It was defined to do so before there were supplemental characters ...

Why was there a need to define the size of a character at all? Even in the
early days of the unification of Unicode and ISO-10646, there was already
provision for UCS-4. Did they really think that could safely be ignored?

Joshua Cranmer

unread,
Feb 4, 2011, 5:28:35 PM2/4/11
to
On 02/04/2011 04:30 PM, Tom Anderson wrote:
> A question to the house, then: has anyone ever invented a data structure
> for strings which allows space-efficient storage for strings in
> different scripts, but also allows time-efficient implementation of the
> common string operations?

I think the real answer is that maybe we need to rethink traditional
string APIs. In particular, we have the issue of diacritics, since "A
[combining diacritic `]" is basically 1 character stored in 3, 4, or 8
bytes, depending on storage format.

I would be surprised if there weren't already some studies on the impact
of using UTF-8 based strings in UTF-16/-32-ish contexts.
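
A hedged illustration of the diacritic point using classes that already ship in
the JDK (java.text.Normalizer and BreakIterator; the character BreakIterator
only approximates user-perceived characters):

    import java.text.BreakIterator;
    import java.text.Normalizer;

    public class GraphemeDemo
    {
        public static void main( String [] args )
        {
            String decomposed = "A\u0300";   // LATIN CAPITAL LETTER A + COMBINING GRAVE ACCENT

            System.out.println( decomposed.length() );                                  // 2 chars
            System.out.println( decomposed.codePointCount( 0, decomposed.length() ) );  // 2 code points

            // Roughly one user-perceived character, as the character BreakIterator sees it:
            BreakIterator it = BreakIterator.getCharacterInstance();
            it.setText( decomposed );
            int graphemes = 0;
            for ( int end = it.next(); end != BreakIterator.DONE; end = it.next() )
            {
                graphemes++;
            }
            System.out.println( graphemes );                                             // 1

            // NFC normalisation folds the pair into the single precomposed U+00C0:
            System.out.println( Normalizer.normalize( decomposed, Normalizer.Form.NFC ).length() );  // 1
        }
    }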

Lawrence D'Oliveiro

unread,
Feb 4, 2011, 5:43:18 PM2/4/11
to
In message <iihikc$dpj$1...@news.eternal-september.org>, markspace wrote:

> <http://download.oracle.com/javase/6/docs/api/java/lang/Character.html>


>
> "The char data type (and therefore the value that a Character object
> encapsulates) are based on the original Unicode specification, which
> defined characters as fixed-width 16-bit entities.

When did the unification with ISO-10646 happen? That was already talking
about 32-bit characters.

> "A char value, therefore, represents Basic Multilingual Plane (BMP) code
> points, including the surrogate code points, or code units of the UTF-16
> encoding. An int value represents all Unicode code points, including
> supplementary code points.

Why was there even a need to spell out the size of a char? If you wanted
types with explicit sizes, there was already byte, short, int and long.

Lawrence D'Oliveiro

unread,
Feb 4, 2011, 5:45:18 PM2/4/11
to
In message <ejeok6d6v98ju1tpq...@4ax.com>, Roedy Green wrote:

> Personally, I don’t see the point of any great rush to support 32-bit

> Unicode. ... The rest I can’t imagine ever using unless I took up a career
> in anthropology ...

But you, or another programmer, might work for an anthropologist. The
computer is a universal machine, after all. If a programming language can’t
support that universality, what good is it?

Arne Vajhøj

unread,
Feb 4, 2011, 5:54:19 PM2/4/11
to

It provides well defined semantics.

Nobody wanted to repeat C89 undefined/implementation specific
behavior.

Arne

Roedy Green

unread,
Feb 4, 2011, 6:02:37 PM2/4/11
to
On Sat, 05 Feb 2011 11:43:18 +1300, Lawrence D'Oliveiro
<l...@geek-central.gen.new_zealand> wrote, quoted or indirectly quoted
someone who said :

>


>Why was there even a need to spell out the size of a char? If you wanted
>types with explicit sizes, there was already byte, short, int and long.

I think because Java's designers thought on the byte code level.
There, chars are unsigned 16-bit. That they are used for chars was not
really of interest to them. Much of Java is just a thin wrapper
around byte code. It has no high level features of its own.

Roedy Green

unread,
Feb 4, 2011, 6:03:29 PM2/4/11
to
On Sat, 05 Feb 2011 11:26:53 +1300, Lawrence D'Oliveiro

<l...@geek-central.gen.new_zealand> wrote, quoted or indirectly quoted
someone who said :

>Why was there a need to define the size of a character at all?

Because C worked that way, and it led to non-WORA code.

Arne Vajhøj

unread,
Feb 4, 2011, 6:04:00 PM2/4/11
to
On 04-02-2011 17:26, Lawrence D'Oliveiro wrote:
> In message<iigcva$90q$1...@news.eternal-september.org>, Mike Schilling wrote:
>> Yes, it does (contain 16 bits.)
>
> Yeah, I didn’t realize it was spelled out that way in the original language
> spec.

It is. And given that you talk in another thread about problems in the
JLS, I think you should have read it.

It should also be in most Java beginners books.

It is also in the Java tutorial:

http://download.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html

> What a short-sighted decision.

Back then Unicode was 16 bit.

The increase in bits was made in 1996, after the release of Java 1.0.

>> It was defined to do so before there were supplemental characters ...
>
> Why was there a need to define the size of a character at all?

Well-defined data types are a very good thing.

> Even in the
> early days of the unification of Unicode and ISO-10646, there was already
> provision for UCS-4.

Java decided to do Unicode. And at that time 16 bit was sufficient
for that.

> Did they really think that could safely be ignored?

Apparently yes.

Given that 16 bits had just replaced 8 bits, I think it
is understandable.

Arne

Roedy Green

unread,
Feb 4, 2011, 6:08:33 PM2/4/11
to
On Fri, 04 Feb 2011 13:44:08 -0500, Joshua Cranmer
<Pidg...@verizon.invalid> wrote, quoted or indirectly quoted someone
who said :

>Well, the real problem is that Unicode swore that 16 bits were enough
>for everybody,

be fair. I bought a book showing thousands of Unicode glyphs including
pages and pages of HAN ideographs. There were plenty of holes for
future growth. At the time I thought it was overkill. When I started
my career, character sets had 64 glyphs, including control chars.
Later it was considered "extravagant" to use lower case since it took
so much longer to print. In the very early days, each installation
designed its own local character set. I recall sitting in one such
meeting, and Vern Detwiler (later of MacDonald Detwiler) explaining
the virtues of new code called ASCII.

Arne Vajhøj

unread,
Feb 4, 2011, 6:10:27 PM2/4/11
to

Most Western people never use them.

But that does not mean much as we got our stuff in the low codepoints.

The relevant question is whether Chinese/Japanese/Korean use the
>=64K code points.

Arne

Arne Vajhøj

unread,
Feb 4, 2011, 6:12:38 PM2/4/11
to
On 04-02-2011 18:08, Roedy Green wrote:
> On Fri, 04 Feb 2011 13:44:08 -0500, Joshua Cranmer
> <Pidg...@verizon.invalid> wrote, quoted or indirectly quoted someone
> who said :
>> Well, the real problem is that Unicode swore that 16 bits were enough
>> for everybody,
>
> be fair. I bought a book showing thousands of Unicode glyphs including
> pages and pages of HAN ideographs. There were plenty of holes for
> future growth. At the time I thought it was overkill. When I started
> my career, character sets had 64 glyphs, including control chars.
> Later it was considered "extravagant" to use lower case since it took
> so much longer to print. In the very early days, each installation
> designed its own local character set. I recall sitting in one such
> meeting, and Vern Detwiler (later of MacDonald Detwiler) explaining
> the virtues of new code called ASCII.

Impressive that he wanted to discuss that with a 12 year old.

Arne

Arne Vajhøj

unread,
Feb 4, 2011, 6:17:11 PM2/4/11
to

The idea that a single programming language needs to support
everything is not a good one.

Maybe Java is just not the right language for anthropology.

If they had known that Unicode would go beyond 64K, then they
probably would have come up with a different solution.

But they did not.

And we live with it.

Arne

Arne Vajhøj

unread,
Feb 4, 2011, 6:19:27 PM2/4/11
to
On 04-02-2011 13:36, Roedy Green wrote:
> On Fri, 04 Feb 2011 08:04:23 -0500, Joshua Cranmer
> <Pidg...@verizon.invalid> wrote, quoted or indirectly quoted someone
> who said :
>> The JLS clearly states that a char is an unsigned 16-bit value.
>
> Perhaps char will be redefined as 32 bits, or a new unsigned 32-bit
> echar type will be invented.
>
> It is an intractable problem. Consider the logic that uses indexOf and
> substring with character index arithmetic. Most of it would go insane
> if you threw a few 32-bit chars in there. You need something that
> simulates an array of 32-bit chars to the programmer.

I don't think they can come up with a solution that both provides
good support for the high code points and keeps old code
running unchanged.

echar and EString would keep old stuff running, but would
completely blow up the entire API.

Arne

Roedy Green

unread,
Feb 4, 2011, 6:22:06 PM2/4/11
to
On Fri, 4 Feb 2011 21:30:57 +0000, Tom Anderson <tw...@urchin.earth.li>

wrote, quoted or indirectly quoted someone who said :

>I am, however, at a loss to suggest a practical alternative!

What might happen is strings are nominally 32-bit.

You could probably come up with a very rapid compression scheme,
similar to UTF-8 but with a bit more compression, that could be
applied to strings at garbage collection time if they have not been
referenced since the last GC sweep.

String are immutable. This admits some other flavours of
"compression".

If the high three bytes of the string are 0, store the string
UNCOMPRESSED, as a string of bytes. All the indexOf indexing
arithmetic works identically. This behaviour is hidden inside the
JVM. The String class knows nothing about it. It is an implementation
detail of 32-bit strings.

If the high two bytes of the string are 0, store the string
uncompressed as a string of unsigned shorts.

if there are any one bits in the high 2 byte, store as a string of
unsigned ints.

Strings are what you gobble up your RAM with. If we start supporting
32 bit chars, we have to do something to compensate for the doubling
of RAM use.

Short lived strings would still be 32-bit. They would only be
converted to the other forms if they have been sitting around for a
while. Interned strings would be immediately converted to canonical
form.
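
A back-of-the-envelope sketch of the selection step, assuming the hidden
representation simply picks the narrowest array that fits every code point; the
class and method names are invented for illustration:

    final class PackedText
    {
        // Sketch only: in the scheme above this choice would live inside the JVM,
        // invisible to the String class itself.
        static Object pack( int[] codePoints )
        {
            int max = 0;
            for ( int cp : codePoints ) max = Math.max( max, cp );

            if ( max <= 0xFF )          // high three bytes of every character are 0
            {
                byte[] out = new byte[ codePoints.length ];
                for ( int i = 0; i < out.length; i++ ) out[ i ] = (byte) codePoints[ i ];
                return out;
            }
            if ( max <= 0xFFFF )        // high two bytes of every character are 0
            {
                char[] out = new char[ codePoints.length ];
                for ( int i = 0; i < out.length; i++ ) out[ i ] = (char) codePoints[ i ];
                return out;
            }
            return codePoints.clone();  // some high bits set: keep unsigned ints
        }
    }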

Roedy Green

unread,
Feb 4, 2011, 6:29:49 PM2/4/11
to
On Sat, 05 Feb 2011 11:45:18 +1300, Lawrence D'Oliveiro

<l...@geek-central.gen.new_zealand> wrote, quoted or indirectly quoted
someone who said :

>But you, or another programmer, might work for an anthropologist. The

>computer is a universal machine, after all. If a programming language can’t
>support that universality, what good is it?

Of course, It is just PERSONALLY I am not likely to use many of these
character sets. They are not important for business, so I doubt Oracle
puts support for them a high priority.

I get a strange pleasure out of poking around the odd corners of the
Unicode glyphs, just admiring the art of various cultures in designing
their alphabets, often baffled which they would design so many letters
almost identical. I'd love to have an excuse to paint with these
glyphs. The glyphs that fascinate me most are Arabic which look to
have rules of typography that boggle the western mind.

Arne Vajhøj

unread,
Feb 4, 2011, 6:40:23 PM2/4/11
to
On 04-02-2011 18:29, Roedy Green wrote:
> On Sat, 05 Feb 2011 11:45:18 +1300, Lawrence D'Oliveiro
> <l...@geek-central.gen.new_zealand> wrote, quoted or indirectly quoted
> someone who said :
>> But you, or another programmer, might work for an anthropologist. The
>> computer is a universal machine, after all. If a programming language can’t

>> support that universality, what good is it?
>
> Of course, It is just PERSONALLY I am not likely to use many of these
> character sets. They are not important for business, so I doubt Oracle
> puts support for them a high priority.

Sure about that?

Some claim that the economies of China, Japan and South Korea
are pretty important for business.

The question is whether the high code points are important
for those. I don't know.

Arne

Arne Vajhøj

unread,
Feb 4, 2011, 6:41:30 PM2/4/11
to

indexOf works fine with compression, but substring and charAt becomes
rather expensive.

Arne

Lawrence D'Oliveiro

unread,
Feb 4, 2011, 6:54:06 PM2/4/11
to
In message <tk2pk6d1m27fop2cr...@4ax.com>, Roedy Green wrote:

> I get a strange pleasure out of poking around the odd corners of the

> Unicode glyphs ...

You’re not the only one. :)

> ... just admiring the art of various cultures in designing their


> alphabets, often baffled why they would design so many letters
> almost identical.

There seem to be an awful lot of cases of adapting a letter from one
alphabet for a completely different purpose in another. Look at the
correspondences between Cyrillic and Roman, just for example: V → B, S → C,
that kind of thing.

> The glyphs that fascinate me most are Arabic which look to have rules of
> typography that boggle the western mind.

And Arabic script was adopted by a whole lot of different languages which
had sounds that Arabic did not. So they had to make up their own letters,
most commonly by adding different numbers of dots to the existing shapes.

Joshua Cranmer

unread,
Feb 4, 2011, 7:05:54 PM2/4/11
to
On 02/04/2011 05:26 PM, Lawrence D'Oliveiro wrote:
> In message<iigcva$90q$1...@news.eternal-september.org>, Mike Schilling wrote:
>
>> Yes, it does (contain 16 bits.)
>
> Yeah, I didn’t realize it was spelled out that way in the original language
> spec. What a short-sighted decision.

It would have been stupider to have not specified a guaranteed size for
char. Take C (+ POSIX), where the definitions of sizes are very loosely
defined, and you very quickly get non-portable code. Yes, you can in
theory change the size of, say, time_t independently of other types, but
it doesn't do you much good if half the C code assumes sizeof(time_t) ==
sizeof(int). Pinning down the sizes of the types was a _very good_ move
on Java's part.

> Why was there a need to define the size of a character at all? Even in the
> early days of the unification of Unicode and ISO-10646, there was already
> provision for UCS-4. Did they really think that could safely be ignored?

Knowing the results of other properly Unicode-aware code in the first
days of Unicode, I believe that Unicode quite heavily gave an impression
of "Unicode == 16 bit". Java is not the only major platform to be bitten
by now-Unicode-is-32-bits... the Windows platform has 16-bit characters
embedded into it.

Joshua Cranmer

unread,
Feb 4, 2011, 7:13:26 PM2/4/11
to
On 02/04/2011 06:41 PM, Arne Vajhøj wrote:
> indexOf works fine with compression, but substring and charAt becomes
> rather expensive.

I have seen it argued that random-access-ish stuff like substring and
charAt aren't really all that random access, in that they tend to be
"small" constants away from the beginning, end, or last indexOf computation.

See
<http://weblogs.mozillazine.org/roc/archives/2008/01/string_theory.html>.

markspace

unread,
Feb 4, 2011, 7:37:21 PM2/4/11
to
On 2/4/2011 1:25 PM, Daniele Futtorovic wrote:

>
> Yeah. But that's not quite the same thing, is it? What with OOP and all.
>


Fair enough.

Since it's not possible to add new methods to an interface without
breaking all existing subclasses, I have to assume that is why
CharSequence was never modified.

The Lambda project for Java has been working on closures. They've also
proposed extension methods/defender methods to allow Java interfaces to
be modified. I think the best chance of getting CharSequence modified
would be through that mechanism when it becomes available.

I'm not sure off hand who is working on the extension methods. It might
be a good idea to contact them about getting CharSequence modified along
with whatever else they'll be doing.

Mike Schilling

unread,
Feb 4, 2011, 7:37:53 PM2/4/11
to

"Joshua Cranmer" <Pidg...@verizon.invalid> wrote in message
news:iii493$nn8$1...@news.eternal-september.org...


.NET, which in several cases took advantage of following Java to correct
some of its mistakes (e.g. signed bytes), didn’t fix this one.

Arne Vajhøj

unread,
Feb 4, 2011, 7:56:57 PM2/4/11
to

Which is a bit surprising since high code points were introduced
when .NET came around.

But they probably had a compatibility issue with p/Invoke and
Win32 API, COM interop, C++ mixed mode etc. that all had
to work with existing Win32 model of 16 bit wchars.

Arne


Mike Schilling

unread,
Feb 4, 2011, 8:02:37 PM2/4/11
to

"Arne Vajhøj" <ar...@vajhoej.dk> wrote in message
news:4d4ca055$0$23765$1472...@news.sunsite.dk...

Or, relentless micro-optimizers that they are, Microsoft wasn't willing to
bite off the size/performance issues.

Arne Vajhøj

unread,
Feb 4, 2011, 8:06:37 PM2/4/11
to
On 04-02-2011 19:37, markspace wrote:
> On 2/4/2011 1:25 PM, Daniele Futtorovic wrote:
>> Yeah. But that's not quite the same thing, is it? What with OOP and all.
>
> Fair enough.
>
> Since it's not possible to add new methods to an interface without
> breaking all existing subclasses, I have to assume that is why
> CharSequence was never modified.

Do you think Lew will make a little note about certain JDBC
interfaces?

:-)

Arne

Arne Vajhøj

unread,
Feb 4, 2011, 8:08:48 PM2/4/11
to
On 04-02-2011 19:13, Joshua Cranmer wrote:

> On 02/04/2011 06:41 PM, Arne Vajhøj wrote:
>> indexOf works fine with compression, but substring and charAt becomes
>> rather expensive.
>
> I have seen it argued that random-access-ish stuff like substring and
> charAt aren't really all that random access, in that they tend to be
> "small" constants away from the beginning, end, or last indexOf
> computation.

Could be.

But would it be practical to use it?

Arne

Ken Wesson

unread,
Feb 4, 2011, 10:21:55 PM2/4/11
to

Relentless micro-optimizers of what, their cashflow?

Ken Wesson

unread,
Feb 4, 2011, 10:25:47 PM2/4/11
to
On Fri, 04 Feb 2011 21:30:57 +0000, Tom Anderson wrote:

> On Fri, 4 Feb 2011, Joshua Cranmer wrote:
>
>>> "Arne Vajhøj" <ar...@vajhoej.dk> wrote in message
>>>

>>>> But since codepoints above U+FFFF was added after the String class
>>>> was defined, then the options on how to handle it were pretty
>>>> limited.
>>
>> Extending to 24 bits is problematic because 24 bits opens you up to
>> unaligned memory access on most, if not all, platforms, so you'd have
>> to go fully up to 32 bits (this is what the codePoint methods in String
>> et al. do). But considering the sheer amount of Strings in memory,
>> going to 32-bit memory storage for Strings now doubles the size of that
>> data... and can increase memory consumption in some cases by 30-40%.
>
> This is something i ponder quite a lot.
>
> It's essential that computers be able to represent characters from any
> living human script. The astral planes include some such characters,
> notably in the CJK extensions, without which it is impossible to write
> some people's names correctly. The necessity of supporting more than
> 2**16 codepoints is simply beyond question.
>
> The problem is how to do it efficiently.
>
> Going to strings of 24- or 32-bit characters would indeed be prohibitive
> in its effect in memory. But isn't 16-bit already an eye-watering waste?
> Most characters currently sitting in RAM around the world are, i would
> wager, in the ASCII range: the great majority of characters in almost
> any text in a latin script will be ASCII, in that they won't have
> diacritics [1] (and most text is still in latin script), and almost all
> characters in non-natural-language text (HTML and XML markup,
> configuration files, filesystem paths) will be ASCII. A sizeable
> fraction of non-latin text is still encodable in one byte per character,
> using a national character set. Forcing all users of programs written in
> Java (or any other platform which uses UCS-2 encoding) to spend two
> bytes on each of those characters to ease the lives of the minority of
> users who store a lot of CJK text seems wildly regressive.


>
> I am, however, at a loss to suggest a practical alternative!
>

> A question to the house, then: has anyone ever invented a data structure
> for strings which allows space-efficient storage for strings in
> different scripts, but also allows time-efficient implementation of the
> common string operations?
>
> Upthread, Joshua mentions the idea of using UTF-8 strings, and cacheing
> codepoint-to-bytepoint mappings. That's certainly an approach that would
> work, although i worry about the performance effect of generating so
> many writes, the difficulty of making it correct in multithreaded
> systems, and the dependency on a good cache hit rate to make it pay off.
>
> Anyone else?

I vote a hybrid-RLE approach: runs with the same high three bytes have a
length, the high three bytes, and then all the low bytes of the run. For
plain ASCII text that will mean <length of string> 0 0 0 <ASCII string>.
A lot of other language texts will have long runs with a fixed pattern of
high bytes, or long runs of 0 0 0 and the odd accented character. Limit
run length to 255 so the length is always one byte. So every run gets
four bytes added, instead of every *character* getting three.
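
A literal-minded sketch of that layout, assuming each run is stored as one
length byte, the three shared high bytes, and then the low byte of each code
point; the class name is invented and there is no decoder or error handling:

    import java.io.ByteArrayOutputStream;

    final class HighByteRle
    {
        static byte[] encode( int[] codePoints )
        {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            int i = 0;
            while ( i < codePoints.length )
            {
                int high = codePoints[ i ] >>> 8;            // the shared high three bytes
                int j = i;
                while ( j < codePoints.length
                        && ( codePoints[ j ] >>> 8 ) == high
                        && j - i < 255 )                     // cap run length at 255
                    j++;

                out.write( j - i );                          // run length, one byte
                out.write( ( high >>> 16 ) & 0xFF );         // high bytes, big-endian
                out.write( ( high >>> 8 ) & 0xFF );
                out.write( high & 0xFF );
                for ( int k = i; k < j; k++ )
                    out.write( codePoints[ k ] & 0xFF );     // low byte of each code point
                i = j;
            }
            return out.toByteArray();
        }
    }

For plain ASCII of up to 255 characters this degenerates to
<length> 0 0 0 <ASCII bytes>, as described above.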

Ken Wesson

unread,
Feb 4, 2011, 10:26:49 PM2/4/11
to
On Fri, 04 Feb 2011 16:37:21 -0800, markspace wrote:

> On 2/4/2011 1:25 PM, Daniele Futtorovic wrote:
>
>> Yeah. But that's not quite the same thing, is it? What with OOP and
>> all.
>
> Fair enough.
>
> Since it's not possible to add new methods to an interface without
> breaking all existing subclasses, I have to assume that is why
> CharSequence was never modified.
>
> The Lambda project for Java has been working on closures. They've also
> proposed extension methods/defender methods to allow Java interfaces to
> be modified. I think the best chance of getting CharSequence modified
> would be through that mechanism when it becomes available.

So, sometime around the establishment of our first Mars colony. Gotcha. :)

Arne Vajhøj

unread,
Feb 4, 2011, 10:30:09 PM2/4/11
to

Micro optimization in the meaning it has among programmers.

Arne

Arne Vajhøj

unread,
Feb 4, 2011, 10:32:56 PM2/4/11
to

Java 8 with lambda are planned for 2012.

I believe current plan for first manned mission to Mars
by NASA is mid-2030s.

Arne

Ken Wesson

unread,
Feb 4, 2011, 10:47:29 PM2/4/11
to

Teehee. It's already 2011 and Java 7 still hasn't been released. The time
since Java 6 is getting up to several years already. It's more likely
we'll have half a dozen new Star Wars movies by the end of 2012 than Java
8.

Ken Wesson

unread,
Feb 4, 2011, 10:48:24 PM2/4/11
to
On Fri, 04 Feb 2011 22:30:09 -0500, Arne Vajhøj wrote:

> On 04-02-2011 22:21, Ken Wesson wrote:
>> On Fri, 04 Feb 2011 17:02:37 -0800, Mike Schilling wrote:
>>
>>> "Arne Vajhøj"<ar...@vajhoej.dk> wrote in message
>>> news:4d4ca055$0$23765$1472...@news.sunsite.dk...

>>>> Which is a bit surprising since high code points were introduced when
>>>> .NET came around.
>>>>
>>>> But they probably had a compatibility issue with p/Invoke and Win32
>>>> API, COM interop, C++ mixed mode etc. that all had to work with
>>>> existing Win32 model of 16 bit wchars.
>>>
>>> Or, relentless micro-optimizers that they are, Microsoft wasn't
>>> willing to bite off the size/performance issues.
>>
>> Relentless micro-optimizers of what, their cashflow?
>
> Micro optimization in the meaning it has among programmers.

Microsoft doesn't know beans about optimizing program code, as anyone
who's waited five minutes for Windows to start up and become fully
responsive can attest. :)

Joshua Cranmer

unread,
Feb 4, 2011, 11:04:13 PM2/4/11
to

I don't know--there's got to be someone who's computed dynamic usage
patterns, though. If I weren't so swamped right now, I'd hop on the
online databases and go looking for a paper on this.

Mike Schilling

unread,
Feb 5, 2011, 12:46:56 AM2/5/11
to

"Ken Wesson" <kwe...@gmail.com> wrote in message
news:4d4cc888$1...@news.x-privat.org...

Optimization and micro-optimization are not synonyms.
>

Ken Wesson

unread,
Feb 5, 2011, 1:15:15 AM2/5/11
to

No; the latter is a special case of the former.

So?

Mike Schilling

unread,
Feb 5, 2011, 1:28:57 AM2/5/11
to

"Ken Wesson" <kwe...@gmail.com> wrote in message

news:4d4ceaf3$1...@news.x-privat.org...

The latter is quite often the direct opposite of the former.

Ken Wesson

unread,
Feb 5, 2011, 2:15:31 AM2/5/11
to

If so, then it has been mis-named and you should clarify your original
meaning.

Martin Gregorie

unread,
Feb 5, 2011, 8:09:59 AM2/5/11
to
On Sat, 05 Feb 2011 12:54:06 +1300, Lawrence D'Oliveiro wrote:

> And Arabic script was adopted by a whole lot of different languages
> which had sounds that Arabic did not. So they had to make up their own
> letters, most commonly by adding different numbers of dots to the
> existing shapes.
>

Arabic Letters also have different glyphs depending on whether they are
at the start, middle or end of a word or an isolated letter, though six
letters only have isolated and end-of-word representations. Unicode
supports this with a code point for each representation of each letter.


--
martin@ | Martin Gregorie
gregorie. | Essex, UK
org |

Lew

unread,
Feb 5, 2011, 8:58:50 AM2/5/11
to
Mike Schilling wrote:
>>> Optimization and micro-optimization are not synonyms.

"Ken Wesson" wrote:
>> No; the latter is a special case of the former.

Mike Schilling wrote:
> The latter is quite often the direct opposite of the former.

All one needs to do is consider the definitions of the terms, at least if one
is speaking to one who is reasonable. "Optimization" is performance
improvement of a program or system. "Micro-optimization" is the attempt to
optimize tiny portions of the program or system without regard for the overall
effect.

As for the one whom you quoted, I can't see their post since I plonked them
long since, but I shall give them the benefit of the doubt and assume they
provided logic and reason to support their claim and didn't just make the one
bald and incorrect statement.

Hopefully by now they realize that just because the word "optimization" is in
both terms that that doesn't imply any degree of synonymity. The word
"micro-optimization" was coined specifically to contrast it with actual
optimization and was never anything other than a condemnatory term.

--
Lew
Ceci n'est pas une fenêtre.
.___________.
|###] | [###|
|##/ | *\##|
|#/ * | \#|
|#----|----#|
|| | * ||
|o * | o|
|_____|_____|
|===========|

javax.swing.JSnarker

unread,
Feb 5, 2011, 12:59:21 PM2/5/11
to

The Arabic I've seen has always looked like it's in something of a
cursive style, so this may be because each letter may have a connection
to the previous, the next, both, or neither. The four variants probably
look similar to one another except for these connections, then.

The six letters with only two representations are interesting in that
light. Are they not valid letters in any other position in a word than
as last character then? Are they tags that modify a word, say, to give
it a gender or make it plural? Or something else?

--
In <iijn58$ccs$1...@news.albasani.net>, Lew admitted:
> The JLS is obfuscatory in parts

Arne Vajhøj

unread,
Feb 5, 2011, 1:32:44 PM2/5/11
to

Maybe they micro optimized too much and optimized too little.

Arne

Arne Vajhøj

unread,
Feb 5, 2011, 1:33:32 PM2/5/11
to

Not really.

Arne


Arne Vajhøj

unread,
Feb 5, 2011, 1:34:26 PM2/5/11
to

Many things have names that do not really make much sense.

But the names stick anyway.

Arne

Martin Gregorie

unread,
Feb 5, 2011, 7:33:33 PM2/5/11
to
On Sat, 05 Feb 2011 12:59:21 -0500, javax.swing.JSnarker wrote:

> The Arabic I've seen has always looked like it's in something of a
> cursive style, so this may be because each letter may have a connection
> to the previous, the next, both, or neither.
>

You're right - Arabic is cursive in both hand-written and typeset forms.
Where appropriate, glyphs have connectors that match the next letter, so
the isolated form has no connectors, the beginning letter style only has
a following connector, the end style only has a leading connector and the
middle style has both.

> The four variants probably
> look similar to one another except for these connections, then.
>

Not necessarily so, but see below for more information about that.

> The six letters with only two representations are interesting in that
> light. Are they not valid letters in any other position in a word than
> as last character then?
>

Correct, because they always force the next letter into isolated style.
But remember that the end letter in a word is on the left end because
Arabic script is written right-to-left except for the numbers, which are
written the same as us, with the most significant digit on the left. When
I've watched Arabic writers at work they write right to left as you'd
expect until they come to a number, which they write left to right before
continuing with the rest of the sentence. It must take a lot of practise,
because the distance they move left before starting to write the number
always seems to be spot on.

> Are they tags that modify a word, say, to give it a gender or make
> it plural? Or something else?
>

Pass - I don't speak or read Arabic apart from numbers though I've
travelled and worked in places where Arabic scripts are the norm. Arabic
has the reputation of being one of the hardest languages to learn,
because each word has many shades of meaning and the context defines
exactly what a word means.

I've known for a long time that Arabic letters had different glyphs
depending on the position of the letter in a word. I checked my memory
against this page http://en.wikipedia.org/wiki/Arabic_alphabet before
making my initial post in this thread, which is where I found out about
the six anomalous letters.

Here's the deal for numerals:
http://en.wikipedia.org/wiki/Hindu-Arabic_numeral_system
The ordering of digits was originally defined by the Indians and
adopted unchanged by the Arabs, who in turn passed it on to Europe. As
you may have guessed, Hindi scripts are written left to right.

Lawrence D'Oliveiro

unread,
Feb 5, 2011, 9:27:27 PM2/5/11
to
In message <iiji77$4fe$1...@localhost.localdomain>, Martin Gregorie wrote:

> Arabic Letters also have different glyphs depending on whether they are
> at the start, middle or end of a word or an isolated letter, though six
> letters only have isolated and end-of-word representations. Unicode
> supports this with a code point for each representation of each letter.

But they are not different characters, they should not have different code
points.

Assigning different code points greatly complicates basic text-processing
tasks like editing and searching.

Mike Schilling

unread,
Feb 5, 2011, 9:46:37 PM2/5/11
to

"Lawrence D'Oliveiro" <l...@geek-central.gen.new_zealand> wrote in message
news:iil0uf$niu$4...@lust.ihug.co.nz...

Different code point for capitals and lower-case letters is equally silly.

Ken Wesson

unread,
Feb 6, 2011, 12:34:24 AM2/6/11
to
On Sat, 05 Feb 2011 08:58:50 -0500, Lew wrote:

> Mike Schilling wrote:
>>>> Optimization and micro-optimization are not synonyms.
>
> "Ken Wesson" wrote:
>>> No; the latter is a special case of the former.
>
> Mike Schilling wrote:
>> The latter is quite often the direct opposite of the former.
>
> All one needs to do is consider the definitions of the terms, at least
> if one is speaking to one who is reasonable.

What's that supposed to mean?

> "Optimization" is performance improvement of a program or system.
> "Micro-optimization" is the attempt to optimize tiny portions of the
> program or system

So far, so good.

> without regard for the overall effect.

Where did you get that from? It doesn't seem inherent in the term "micro-
optimization". I can see that referring to optimizing the few
instructions in the innermost loop of some arithmetic procedure, such as
a matrix multiply, specifically because that's a case where it will
probably optimize the program as a whole, as well as to optimize stuff
needlessly (but mainly harmlessly) such as using a loop that counts down
from 10 instead of up from 0 when printing ten messages (testing against
zero is often faster than testing against a nonzero value, making the
loop slightly quicker; but the entire task is I/O bound) and to optimize
stuff heedlessly, where it might slow other things down (say, making an
algorithm blazing fast for small to moderate N in a memory-consuming way,
and then the whole thing bogs down with disk paging for real-world
problem sizes).

> As for the one whom you quoted, I can't see their post since I plonked
> them long since

Well, you claimed to do so, but after you did you penned at least one
more direct followup to one of my posts.

> but I shall give them the benefit of the doubt and
> assume they provided logic and reason to support their claim and didn't
> just make the one bald and incorrect statement.

I didn't make *any* bald and incorrect statement. I may, however, be
using a simple, declaratively transparent meaning for "micro-optimize"
where someone else intended an additional attached connotation that was
not made clear. That is not the same thing.

> Hopefully by now they realize that just because the word "optimization"
> is in both terms that that doesn't imply any degree of synonymity. The
> word "micro-optimization" was coined specifically to contrast it with
> actual optimization and was never anything other than a condemnatory
> term.

Have you any evidence for this remarkable claim? I've just produced a
perfectly reasonable definition for "micro-optimization" that's value-
neutral. It stands to reason that this, or something close, would be
"the" definition and whether one intended to refer merely to optimizing
in the small (a single loop, a single data structure) or specifically to
contrast it (optimizing in a narrow-sighted, missing-the-forest-for-the-
trees way) would have to be decided by context.

Of course, one could also argue that finding the word "Microsoft" in the
context suffices to disambiguate. :)

Oh, and what's with using "they" as, presumably, a gender neutral
singular to refer to me? My name is clearly male, is it not?

Ken Wesson

unread,
Feb 6, 2011, 12:36:54 AM2/6/11
to

Optimization: making something better.

suggests

Micro-optimization: making some tiny bit of something (a single loop, a
single algorithm or data structure) better.

No doubt the latter can carry connotations, including negative
(optimizing a tiny bit of something without considering the forest, only
the trees), in certain contexts, but the post that first used the term in
this thread did not IMO clearly convey any such subtext.

Arne Vajhøj

unread,
Feb 6, 2011, 9:25:06 AM2/6/11
to

That is what the words mean.

But among programmers it has gotten a slightly different meaning.

> No doubt the latter can carry connotations, including negative
> (optimizing a tiny bit of something without considering the forest, only
> the trees), in certain contexts, but the post that first used the term in
> this thread did not IMO clearly convey any such subtext.

It seems rather clear to me that Mike's usage had the programmer
meaning not the English meaning in mind.

Arne

Tom Anderson

unread,
Feb 6, 2011, 10:42:55 AM2/6/11
to

Agreed. Uppercase should be a combiner, like an accent. There could be
composed forms of uppercase letters, but there should be a modifier too,
so that when normalised in the right direction, searching and sorting are
simplified.

tom

--
Demolish serious culture!

Arne Vajhøj

unread,
Feb 6, 2011, 11:02:58 AM2/6/11
to

It would make some things a lot easier. But I guess the idea
is 40-50 years late.

Arne

Lew

unread,
Feb 6, 2011, 11:15:05 AM2/6/11
to
Ken Wesson wrote:
>> Micro-optimization: making some tiny bit of something (a single loop, a
>> single algorithm or data structure) better.

Which is opposite to optimization in many cases. "Better" in
micro-optimization is frequently worse for the system.

. . .

Ken Wesson wrote:
>> No doubt the latter can carry connotations, including negative
>> (optimizing a tiny bit of something without considering the forest, only
>> the trees), in certain contexts, but the post that first used the term in
>> this thread did not IMO clearly convey any such subtext.

Arne Vajhøj wrote:
> It seems rather clear to me that Mike's usage had the programmer
> meaning not the English meaning in mind.

It's obvious to anyone who isn't trolling.

Tom Anderson

unread,
Feb 6, 2011, 1:34:58 PM2/6/11
to

For interchange purposes, yes. But i don't see why you couldn't write a
string implementation that stored characters this way internally.

tom

--
peason pla me his oasis rekords it rappidly become klere that like
history oasis started badley and got steaddily WORSE -- Nigel Molesworth

Arne Vajhøj

unread,
Feb 6, 2011, 3:38:20 PM2/6/11
to

24 bit for values and 8 bit for modifiers.

I guess it could be done.

Arne

Ken Wesson

unread,
Feb 6, 2011, 10:15:50 PM2/6/11
to
On Sun, 06 Feb 2011 11:15:05 -0500, Lew wrote:

> Ken Wesson wrote:
>>> Micro-optimization: making some tiny bit of something (a single loop,
>>> a single algorithm or data structure) better.
>
> Which is opposite to optimization in many cases. "Better" in
> micro-optimization is frequently worse for the system.

That depends, doesn't it? Frequently != always.

> Ken Wesson wrote:
>>> No doubt the latter can carry connotations, including negative
>>> (optimizing a tiny bit of something without considering the forest,
>>> only the trees), in certain contexts, but the post that first used the
>>> term in this thread did not IMO clearly convey any such subtext.
>
> Arne Vajhøj wrote:
>> It seems rather clear to me that Mike's usage had the programmer
>> meaning not the English meaning in mind.
>
> It's obvious to anyone who isn't trolling.

Nobody here is trolling, and it doesn't accomplish anything useful to
start throwing the accusation around. It just encourages some of the
people arguing to stop being reasonable if they expect they won't get a
fair hearing or be listened to anyway.

Actually I'm not even sure why there's an argument here. I don't think we
disagree on the denotative content of "micro-optimization" and I don't
think we disagree that it can be at the expense of optimization in the
large, or that in some contexts it can be intended to imply such a case.

The only real bone of contention seems to be "how clear was it that Mike
meant it that way in <iii7jc$ssv$1...@news.eternal-september.org>?" And I
don't think that's very important anymore. Certainly not important enough
to be making posts implying that various people are trolls, or not
"really" programmers, or similarly over it.

Mike Schilling

unread,
Feb 6, 2011, 11:53:29 PM2/6/11
to
"Ken Wesson" <kwe...@gmail.com> wrote in message
news:4d4f63e6$1...@news.x-privat.org...

> The only real bone of contention seems to be "how clear was it that Mike
> meant it that way in <iii7jc$ssv$1...@news.eternal-september.org>?" And I
> don't think that's very important anymore. Certainly not important enough
> to be making posts implying that various people are trolls, or not
> "really" programmers, or similarly over it.

I meant it disparagingly, and that was clear, it seems, to people who know
the way I tend to express myself. I see that it wasn't clear to you at
first, but I trust it is now.

Arne Vajhøj

unread,
Feb 7, 2011, 6:26:21 PM2/7/11
to

I find it difficult to imagine "micro optimization" even being
used in a forum like this as something positive.

Arne

Lew

unread,
Feb 7, 2011, 7:16:35 PM2/7/11
to

The whole, entire, universal point of the term "micro-optimization" is to
disparage foolish attempts to optimize that have the opposite effect. I am
simply astounded that "Ken Wesson" would pick a fight over something so basic
and universally well understood in the industry. Especially after several
people have set him straight! What reason in the world can someone have to
argue with the answers universally provided by knowledgeable professionals in
response to his question?

(Hint: It rhymes with the second syllable of "control".)

Ken Wesson

unread,
Feb 8, 2011, 5:10:50 AM2/8/11
to
On Mon, 07 Feb 2011 19:16:35 -0500, Lew wrote:

> On 02/07/2011 06:26 PM, Arne Vajhøj wrote:
>> On 06-02-2011 23:53, Mike Schilling wrote:
>>> "Ken Wesson" <kwe...@gmail.com> wrote in message
>>> news:4d4f63e6$1...@news.x-privat.org...
>>>> The only real bone of contention seems to be "how clear was it that
>>>> Mike meant it that way in <iii7jc$ssv$1...@news.eternal-september.org>?"
>>>> And I don't think that's very important anymore. Certainly not
>>>> important enough to be making posts implying that various people are
>>>> trolls, or not "really" programmers, or similarly over it.
>>>
>>> I meant it disparagingly, and that was clear, it seems, to people who
>>> know the way I tend to express myself. I see that it wasn't clear to
>>> you at first, but I trust it is now.
>>
>> I find it difficult to imagine "micro optimization" even being used in
>> a forum like this as something positive.

I don't. As I said, it's easy to imagine it being applied in a neutral
way to an optimization of a small, specific bit of code, such as a
particular inner loop.
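
For instance, hoisting loop-invariant work out of an inner loop is the
textbook case of that purely descriptive sense (whether it actually helps
under a modern JIT is a separate question). A toy sketch:

public class LoopHoisting {
    // Before: recomputes an invariant value on every iteration.
    static double sumBefore(double[][] m, double scale) {
        double sum = 0;
        for (int i = 0; i < m.length; i++) {
            for (int j = 0; j < m[i].length; j++) {
                sum += m[i][j] * Math.sqrt(scale); // loop-invariant work
            }
        }
        return sum;
    }

    // After: the classic inner-loop micro-optimization -- hoist the invariant.
    static double sumAfter(double[][] m, double scale) {
        double factor = Math.sqrt(scale);
        double sum = 0;
        for (double[] row : m) {
            for (double v : row) {
                sum += v * factor;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        double[][] m = { { 1, 2 }, { 3, 4 } };
        System.out.println(sumBefore(m, 4.0) + " == " + sumAfter(m, 4.0)); // 20.0 == 20.0
    }
}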

> The whole, entire, universal point of the term "micro-optimization" is
> to disparage foolish attempts to optimize that have the opposite effect.

So you say.

> I am simply astounded that "Ken Wesson" would pick a fight over
> something so basic and universally well understood in the industry.

Obviously, your experience with "the industry" differs from mine.

Furthermore, I did not pick a fight. I observed that the only thing
Microsoft is known for optimizing is the size of its cash flow
(<4d4cc253$1...@news.x-privat.org>). The only party that observation might
be picking a fight with is Microsoft. Yet Arne responded in a manner
seemingly intended to disparage me.

Unless Arne takes it personally when people disparage Microsoft, it looks
an awful lot like Arne was the one picking fights here.



> Especially after several people have set him straight!

Nobody has "set me straight". One of you has clarified that the original
use of "micro-optimization" was intended to disparage rather than to be
neutral. There is a difference between "clarified" and "set someone
straight". The difference is the same as someone having allowed for
several possibilities, one of which turned out to be the case, and
someone having only allowed for one possibility and having been wrong.
You seem to think the latter occurred but it was the former, as I have
said several times now.

> What reason in the world can someone have to argue with the answers
> universally provided by knowledgeable professionals in response to his
> question?

This is not even meaningful in this context, because I hadn't asked a
question.

> (Hint: It rhymes with the second syllable of "control".)

And that is simply not constructive. Looks like Arne isn't the only one
spoiling for a fight.

Too bad for both of you, I'm not interested. I refuse to stoop to your
level, which is barely above explicit name-calling.

We obviously have had different patterns of experience in our encounters
with the term under discussion. That is not a rational basis for either
side disparaging the other. Let's drop this noisome subject and move on.

Arne Vajhøj

unread,
Feb 8, 2011, 10:51:23 PM2/8/11
to
On 08-02-2011 05:10, Ken Wesson wrote:
> On Mon, 07 Feb 2011 19:16:35 -0500, Lew wrote:
>> Especially after several people have set him straight!
>
> Nobody has "set me straight". One of you has clarified that the original
> use of "micro-optimization" was intended to disparage rather than to be
> neutral. There is a difference between "clarified" and "set someone
> straight". The difference is the same as someone having allowed for
> several possibilities, one of which turned out to be the case, and
> someone having only allowed for one possibility and having been wrong.
> You seem to think the latter occurred but it was the former, as I have
> said several times now.

That all depends.

If you are a programmer, then it was "set me straight".

If you are an ordinary word processor and spreadsheet user, then
it is "clarified".

Given that this is cljp, the first was assumed.

>> What reason in the world can someone have to argue with the answers
>> universally provided by knowledgeable professionals in response to his
>> question?
>
> This is not even meaningful in this context, because I hadn't asked a
> question.

You did not post anything intended as a question.

But posting misunderstandings about programming terminology
in a programming group tends to be considered an implicit question
to be answered.

Arne

Lawrence D'Oliveiro

unread,
Feb 9, 2011, 8:00:05 PM2/9/11
to
In message <4d4c88f3$0$23753$1472...@news.sunsite.dk>, Arne Vajhøj wrote:

> On 04-02-2011 17:45, Lawrence D'Oliveiro wrote:
>
>> In message<ejeok6d6v98ju1tpq...@4ax.com>, Roedy Green
>> wrote:
>>
>>> Personally, I don’t see the point of any great rush to support 32-bit
>>> Unicode. ... The rest I can’t imagine ever using unless I took up a
>>> career in anthropology ...
>>
>> But you, or another programmer, might work for an anthropologist. The
>> computer is a universal machine, after all. If a programming language
>> can’t support that universality, what good is it?
>
> The idea that a single programming language needs to support
> everything is not a good one.

But if it’s going to support text processing, then it should support what is
commonly accepted as the minimum requirements for that area. The incremental
cost of adding lesser-used language scripts on top of the more common ones
is so low, it seems entirely reasonable to insist that you should at least
have provision for dealing with them all.

> If they had known that Unicode would go beyond 64K, then they
> probably would have come up with a different solution.

The problem, I think, is that they embraced Unicode before it had properly
stabilized.
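
For what it's worth, the code-point API that did ship lets Strings carry the
full range even though a single char cannot; a short demonstration, using
U+1D11E (MUSICAL SYMBOL G CLEF) purely as a convenient supplementary example:

public class SupplementaryDemo {
    public static void main(String[] args) {
        int gClef = 0x1D11E;   // MUSICAL SYMBOL G CLEF, beyond U+FFFF

        // There is no single-'char' literal for this character, but a
        // String holds it as a surrogate pair of two chars.
        String s = new String(Character.toChars(gClef));

        System.out.println(s.length());                       // 2 code units
        System.out.println(s.codePointCount(0, s.length()));  // 1 character

        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            System.out.printf("U+%04X supplementary=%b%n",
                              cp, Character.isSupplementaryCodePoint(cp));
            i += Character.charCount(cp);
        }
    }
}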

Arne Vajhøj

unread,
Feb 9, 2011, 8:12:40 PM2/9/11
to
On 09-02-2011 20:00, Lawrence D'Oliveiro wrote:

> In message<4d4c88f3$0$23753$1472...@news.sunsite.dk>, Arne Vajhøj wrote:
>> On 04-02-2011 17:45, Lawrence D'Oliveiro wrote:
>>> In message<ejeok6d6v98ju1tpq...@4ax.com>, Roedy Green
>>> wrote:
>>>> Personally, I don’t see the point of any great rush to support 32-bit
>>>> Unicode. ... The rest I can’t imagine ever using unless I took up a
>>>> career in anthropology ...
>>>
>>> But you, or another programmer, might work for an anthropologist. The
>>> computer is a universal machine, after all. If a programming language
>>> can’t support that universality, what good is it?
>>
>> The idea that a single programming language needs to support
>> everything is not a good one.
>
> But if it’s going to support text processing, then it should support what is
> commonly accepted as the minimum requirements for that area. The incremental
> cost of adding lesser-used language scripts on top of the more common ones
> is so low, it seems entirely reasonable to insist that you should at least
> have provision for dealing with them all.

That seems like an obviously good idea.

But none of the common languages for text processing apps
provides it, so ...

And it is not clear that it is a big problem in practice.

Arne

Ken Wesson

unread,
Feb 18, 2011, 2:24:02 AM2/18/11
to
On Tue, 08 Feb 2011 22:51:23 -0500, Arne Vajhøj wrote:

> On 08-02-2011 05:10, Ken Wesson wrote:
>> On Mon, 07 Feb 2011 19:16:35 -0500, Lew wrote:
>>> Especially after several people have set him straight!
>>
>> Nobody has "set me straight". One of you has clarified that the
>> original use of "micro-optimization" was intended to disparage rather
>> than to be neutral. There is a difference between "clarified" and "set
>> someone straight". The difference is the same as someone having allowed
>> for several possibilities, one of which turned out to be the case, and
>> someone having only allowed for one possibility and having been wrong.
>> You seem to think the latter occurred but it was the former, as I have
>> said several times now.
>
> That all depends.
>
> If you are a programmer, then it was "set me straight".

No. You once again seem to presume that everyone that is a programmer
never uses the term in any other sense than the negative. That may be
true of the programmers *you know* but it is demonstrably *not* true of
programmers in general -- for one, I do not invariably use it in that
sense and I am a programmer.

>>> What reason in the world can someone have to argue with the answers
>>> universally provided by knowledgeable professionals in response to his
>>> question?
>>
>> This is not even meaningful in this context, because I hadn't asked a
>> question.
>
> You did not post anything intended as a question.

I'm glad you realize that. I was getting worried there for a minute.

> But posting misunderstandings about programming terminology in a
> programming group tends to be considered an implicit question to be
> answered.

First of all, nobody posted "misunderstandings about programming
terminology". Someone may have posted "an unwelcome reminder that Arne's
personal circle of acquaintances and colleagues is not the entirety of
the profession", but that is by no means even close to the same thing.

Second, even posting the former cannot be considered an "implicit
question". The hypothetical poster had no question in mind when he posted
it, so cannot have implied anything of the sort. You might have inferred
it, but that is not the same thing.

And third, with reference to your original statement, it said something
about answers "universally provided by knowledgeable professionals". Yet
the "answers" were provided by a smattering of usenet users using no
particular form of authentication, possibly posting under assumed names;
even if they were all "knowledgeable professionals" it would be passing
arrogant for three or four such individuals to claim any particular
opinion of theirs was universally held by the whole profession, which no
doubt has hundreds of thousands of practitioners if not millions. Even if
they were well-respected and famous experts in the profession posting
verifiably under their real names, rather than relatively unknown members
of it posting unverifiably, it would be questionable for them to claim to
speak on behalf of the whole profession.

Further to that, we know for a fact that at least one of them was clearly
*not* knowledgeable about at least one thing: the existence of a
programmer who did *not* in fact uniformly use "micro-optimization" in
the specifically pejorative sense that has been discussed here.

And that thing seems to be the very thing that is most relevant.

Arne Vajhøj

unread,
Feb 18, 2011, 3:22:39 PM2/18/11
to
On 18-02-2011 02:24, Ken Wesson wrote:
> On Tue, 08 Feb 2011 22:51:23 -0500, Arne Vajhøj wrote:
>> On 08-02-2011 05:10, Ken Wesson wrote:
>>> On Mon, 07 Feb 2011 19:16:35 -0500, Lew wrote:
>>>> Especially after several people have set him straight!
>>>
>>> Nobody has "set me straight". One of you has clarified that the
>>> original use of "micro-optimization" was intended to disparage rather
>>> than to be neutral. There is a difference between "clarified" and "set
>>> someone straight". The difference is the same as someone having allowed
>>> for several possibilities, one of which turned out to be the case, and
>>> someone having only allowed for one possibility and having been wrong.
>>> You seem to think the latter occurred but it was the former, as I have
>>> said several times now.
>>
>> That all depends.
>>
>> If you are a programmer, then it was "set me straight".
>
> No. You once again seem to presume that everyone that is a programmer
> never uses the term in any other sense than the negative. That may be
> true of the programmers *you know* but it is demonstrably *not* true of
> programmers in general -- for one, I do not invariably use it in that
> sense and I am a programmer.

We have not seen any evidence of that so far.

>>>> What reason in the world can someone have to argue with the answers
>>>> universally provided by knowledgeable professionals in response to his
>>>> question?
>>>
>>> This is not even meaningful in this context, because I hadn't asked a
>>> question.
>>
>> You did not post anything intended as a question.
>
> I'm glad you realize that. I was getting worried there for a minute.
>
>> But posting misunderstandings about programming terminology in a
>> programming group tends to be considered an implicit question to be
>> answered.
>
> First of all, nobody posted "misunderstandings about programming
> terminology". Someone may have posted "an unwelcome reminder that Arne's
> personal circle of acquaintances and colleagues is not the entirety of
> the profession", but that is by no means even close to the same thing.

Hint: try and read the other comments or try google. That would
confirm that it is not my personal opinion, but a general
use of the term.

Yeah - I know that you don't want to google. But seeking
information is the only way to learn anything.

> Second, even posting the former cannot be considered an "implicit
> question". The hypothetical poster had no question in mind when he posted
> it, so cannot have implied anything of the sort. You might have inferred
> it, but that is not the same thing.

The fact that you did not intend to imply a question does not change
that it is being considered such.

> And third, with reference to your original statement, it said something
> about answers "universally provided by knowledgeable professionals". Yet
> the "answers" were provided by a smattering of usenet users using no
> particular form of authentication, possibly posting under assumed names;
> even if they were all "knowledgeable professionals" it would be passing
> arrogant for three or four such individuals to claim any particular
> opinion of theirs was universally held by the whole profession, which no
> doubt has hundreds of thousands of practitioners if not millions. Even if
> they were well-respected and famous experts in the profession posting
> verifiably under their real names, rather than relatively unknown members
> of it posting unverifiably, it would be questionable for them to claim to
> speak on behalf of the whole profession.

Given that it is easy to verify by anyone capable of using google,
then ....

> Further to that, we know for a fact that at least one of them was clearly
> *not* knowledgeable about at least one thing: the existence of a
> programmer who did *not* in fact uniformly use "micro-optimization" in
> the specifically pejorative sense that has been discussed here.

I did not see any such.

Arne

Ken Wesson

unread,
Feb 23, 2011, 3:52:01 PM2/23/11
to
On Fri, 18 Feb 2011 15:22:39 -0500, Arne Vajhøj wrote:

> On 18-02-2011 02:24, Ken Wesson wrote:
>> On Tue, 08 Feb 2011 22:51:23 -0500, Arne Vajhøj wrote:
>>> On 08-02-2011 05:10, Ken Wesson wrote:
>>>> On Mon, 07 Feb 2011 19:16:35 -0500, Lew wrote:
>>>>> Especially after several people have set him straight!
>>>>
>>>> Nobody has "set me straight". One of you has clarified that the
>>>> original use of "micro-optimization" was intended to disparage rather
>>>> than to be neutral. There is a difference between "clarified" and
>>>> "set someone straight". The difference is the same as someone having
>>>> allowed for several possibilities, one of which turned out to be the
>>>> case, and someone having only allowed for one possibility and having
>>>> been wrong. You seem to think the latter occurred but it was the
>>>> former, as I have said several times now.
>>>
>>> That all depends.
>>>
>>> If you are a programmer, then it was "set me straight".
>>
>> No. You once again seem to presume that everyone that is a programmer
>> never uses the term in any other sense than the negative. That may be
>> true of the programmers *you know* but it is demonstrably *not* true of
>> programmers in general -- for one, I do not invariably use it in that
>> sense and I am a programmer.
>
> We have not seen any evidence of that so far.

Then you haven't been looking. Try doing a google groups search on my
name sometime.

>> First of all, nobody posted "misunderstandings about programming
>> terminology". Someone may have posted "an unwelcome reminder that
>> Arne's personal circle of acquaintances and colleagues is not the
>> entirety of the profession", but that is by no means even close to the
>> same thing.
>
> Hint: try and read the other comments or try google.

Hint: try and be polite to people if you want them to listen to you.

Hint 2: what you just wrote, wasn't polite.

>> Second, even posting the former cannot be considered an "implicit
>> question". The hypothetical poster had no question in mind when he
>> posted it, so cannot have implied anything of the sort. You might have
>> inferred it, but that is not the same thing.
>
> The fact that you did not intend to imply a question does not change
> that it is being considered such.

Then it is being considered such erroneously.

>> And third, with reference to your original statement, it said something
>> about answers "universally provided by knowledgeable professionals".
>> Yet the "answers" were provided by a smattering of usenet users using
>> no particular form of authentication, possibly posting under assumed
>> names; even if they were all "knowledgeable professionals" it would be
>> passing arrogant for three or four such individuals to claim any
>> particular opinion of theirs was universally held by the whole
>> profession, which no doubt has hundreds of thousands of practitioners
>> if not millions. Even if they were well-respected and famous experts in
>> the profession posting verifiably under their real names, rather than
>> relatively unknown members of it posting unverifiably, it would be
>> questionable for them to claim to speak on behalf of the whole
>> profession.
>
> Given that it is easy to verify by anyone capable of using google, then
> ....

Google is a funny creature. It often favors nonstandard and quirky
meanings above the standard one. For example the build tool "ant" ranks
higher than the insect for the query "ant".

So you cannot use google rankings and results as an ironclad proof of
which usage is actually majority.

>> Further to that, we know for a fact that at least one of them was
>> clearly *not* knowledgeable about at least one thing: the existence of
>> a programmer who did *not* in fact uniformly use "micro-optimization"
>> in the specifically pejorative sense that has been discussed here.
>
> I did not see any such.

Then check your glasses!

Arne Vajhøj

unread,
Feb 24, 2011, 9:28:20 PM2/24/11
to

Why - I have seen lots of posts proving the opposite, so there would
be no point.

>>> First of all, nobody posted "misunderstandings about programming
>>> terminology". Someone may have posted "an unwelcome reminder that
>>> Arne's personal circle of acquaintances and colleagues is not the
>>> entirety of the profession", but that is by no means even close to the
>>> same thing.
>>
>> Hint: try and read the other comments or try google.
>
> Hint: try and be polite to people if you want them to listen to you.

It is really not my problem if you prefer to stay ignorant.

>>> Second, even posting the former cannot be considered an "implicit
>>> question". The hypothetical poster had no question in mind when he
>>> posted it, so cannot have implied anything of the sort. You might have
>>> inferred it, but that is not the same thing.
>>
>> The fact that you did not intend to imply a question does not change
>> that it is being considered such.
>
> Then it is being considered such erroneously.

You are free to claim that gravity is an error as well.

That does not change the facts.

>>> And third, with reference to your original statement, it said something
>>> about answers "universally provided by knowledgeable professionals".
>>> Yet the "answers" were provided by a smattering of usenet users using
>>> no particular form of authentication, possibly posting under assumed
>>> names; even if they were all "knowledgeable professionals" it would be
>>> passing arrogant for three or four such individuals to claim any
>>> particular opinion of theirs was universally held by the whole
>>> profession, which no doubt has hundreds of thousands of practitioners
>>> if not millions. Even if they were well-respected and famous experts in
>>> the profession posting verifiably under their real names, rather than
>>> relatively unknown members of it posting unverifiably, it would be
>>> questionable for them to claim to speak on behalf of the whole
>>> profession.
>>
>> Given that it is easy to verify by anyone capable of using google, then
>> ....
>
> Google is a funny creature. It often favors nonstandard and quirky
> meanings above the standard one. For example the build tool "ant" ranks
> higher than the insect for the query "ant".
>
> So you cannot use google rankings and results as an ironclad proof of
> which usage is actually majority.

Real IT people quickly learn how to sort through Google results.

>>> Further to that, we know for a fact that at least one of them was
>>> clearly *not* knowledgeable about at least one thing: the existence of
>>> a programmer who did *not* in fact uniformly use "micro-optimization"
>>> in the specifically pejorative sense that has been discussed here.
>>
>> I did not see any such.
>
> Then check your glasses!

That will not create any such.

Arne

Ken Wesson

unread,
Feb 24, 2011, 11:33:43 PM2/24/11
to

A lie.

>> Hint: try and be polite to people if you want them to listen to you.
>

> ignorant

Your personal opinions of others are not the topic of this newsgroup. Do
you have anything Java-related to say?

>>>> Second, even posting the former cannot be considered an "implicit
>>>> question". The hypothetical poster had no question in mind when he
>>>> posted it, so cannot have implied anything of the sort. You might
>>>> have inferred it, but that is not the same thing.
>>>
>>> The fact that you did not intend to imply a question does not change
>>> that it is being considered such.
>>
>> Then it is being considered such erroneously.
>
> You are free to claim that gravity is an error as well.

But I do not do so. I only claim that something is an error if it
actually is an error.

The fact of the matter is, I neither stated a question nor had one in
mind when I wrote the paragraph that you are erroneously claiming stated
or implied a question.

You may have *inferred* a question. But none was *implied*. Get the
difference?

> That does not change the facts.

Neither does your repeating yourself in 18 mostly-off-topic posts. You
can scream about how horrible, evil, and unprogrammer-like a person Ken
Wesson is and the universe will refuse to oblige; Ken Wesson will remain
a nice, generally calm, and reasonable programmer no matter how much you
scream and yell and kick your feet.

Maybe you should try holding your breath until you turn blue? That's
usually the next step when faced with something you won't or can't
accept, isn't it? :)

>>> Given that it is easy to verify by anyone capable of using google,
>>> then ....
>>
>> Google is a funny creature. It often favors nonstandard and quirky
>> meanings above the standard one. For example the build tool "ant" ranks
>> higher than the insect for the query "ant".
>>
>> So you cannot use google rankings and results as an ironclad proof of
>> which usage is actually majority.
>
> Real IT people

Your personal opinions of others are not the topic of this newsgroup. Do
you have anything Java-related to say?

You are aware, by the way, that any "refutation" of a technical argument
that begins with "you ..." or "real IT people ..." or "real
programmers ..." is inherently flawed, right? It's called argumentum ad
hominem, Arne.

>>>> Further to that, we know for a fact that at least one of them was
>>>> clearly *not* knowledgeable about at least one thing: the existence
>>>> of a programmer who did *not* in fact uniformly use
>>>> "micro-optimization" in the specifically pejorative sense that has
>>>> been discussed here.
>>>
>>> I did not see any such.
>>
>> Then check your glasses!
>
> That will not create any such.

They will maybe let you see what's right there in front of your nose,
though.

Arne Vajhøj

unread,
Feb 25, 2011, 1:01:18 PM2/25/11
to

True.

>>>>> Second, even posting the former cannot be considered an "implicit
>>>>> question". The hypothetical poster had no question in mind when he
>>>>> posted it, so cannot have implied anything of the sort. You might
>>>>> have inferred it, but that is not the same thing.
>>>>
>>>> The fact that you did not intend to imply a question does not change
>>>> that it is being considered such.
>>>
>>> Then it is being considered such erroneously.
>>
>> You are free to claim that gravity is an error as well.
>
> But I do not do so. I only claim that something is an error if it
> actually is an error.
>
> The fact of the matter is, I neither stated a question nor had one in
> mind when I wrote the paragraph that you are erroneously claiming stated
> or implied a question.
>
> You may have *inferred* a question. But none was *implied*. Get the
> difference?

Learn to read and understand English.

"it is being considered" is not impacted by your intentions.

>>>>> Further to that, we know for a fact that at least one of them was
>>>>> clearly *not* knowledgeable about at least one thing: the existence
>>>>> of a programmer who did *not* in fact uniformly use
>>>>> "micro-optimization" in the specifically pejorative sense that has
>>>>> been discussed here.
>>>>
>>>> I did not see any such.
>>>
>>> Then check your glasses!
>>
>> That will not create any such.
>
> They will maybe let you see what's right there in front of your nose,
> though.

There are dozens of posts showing that:
- you don't know Java
- you don't know char sets
- you don't know OS'es
- you don't know software engineering practices

Arne

Ken Wesson

unread,
Feb 26, 2011, 4:01:54 AM2/26/11
to
On Fri, 25 Feb 2011 13:01:18 -0500, Arne Vajhøj spammed:

> On 24-02-2011 23:33, Ken Wesson wrote:
>> On Thu, 24 Feb 2011 21:28:20 -0500, Arne Vajhøj wrote:
>>> On 23-02-2011 15:52, Ken Wesson wrote:
>>>> On Fri, 18 Feb 2011 15:22:39 -0500, Arne Vajhøj wrote:
>>>>> On 18-02-2011 02:24, Ken Wesson wrote:
>>>>>> No. You once again seem to presume that everyone that is a
>>>>>> programmer never uses the term in any other sense than the
>>>>>> negative. That may be true of the programmers *you know* but it is
>>>>>> demonstrably *not* true of programmers in general -- for one, I do
>>>>>> not invariably use it in that sense and I am a programmer.
>>>>>
>>>>> We have not seen any evidence of that so far.
>>>>
>>>> Then you haven't been looking. Try doing a google groups search on my
>>>> name sometime.
>>>
>>> Why - I have seen lots of posts proving the opposite
>>
>> A lie.
>
> True.

So, you admit your lie.

Well, they do say that admitting it is the first step.

Meanwhile, though, you do definitely have a problem. 21 compulsive flames
today. That's three more than last time. Anyone who was tossing down 21
bottles of Scotch a day would be in serious need of rehab, if not medical
care. You really should at least cut down, especially when the 21 posts
all say essentially the same thing.

Indeed, if it gets much higher you'll be in violation of most
newsservers' terms of service because your Breidbart Index will pop 25 --
post 26 posts in one day whose sole purpose is to repeat the same litany
of irrational anti-Wesson beliefs and you will risk losing your account.
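
For reference, the Breidbart Index is conventionally computed as the sum,
over a set of substantively identical articles, of the square roots of the
number of newsgroups each copy was posted to; the cutoff any given server
enforces is policy, not part of the formula. A quick sketch:

import java.util.Arrays;

public class BreidbartIndex {

    // BI = sum of sqrt(number of groups each identical copy was posted to).
    static double breidbartIndex(int[] groupsPerCopy) {
        double bi = 0.0;
        for (int groups : groupsPerCopy) {
            bi += Math.sqrt(groups);
        }
        return bi;
    }

    public static void main(String[] args) {
        int[] singleGroupPosts = new int[26];      // 26 copies, one group each
        Arrays.fill(singleGroupPosts, 1);
        System.out.println(breidbartIndex(singleGroupPosts));   // 26.0
        System.out.println(breidbartIndex(new int[] { 9, 4 })); // 5.0
    }
}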

>>>>>> Second, even posting the former cannot be considered an "implicit
>>>>>> question". The hypothetical poster had no question in mind when he
>>>>>> posted it, so cannot have implied anything of the sort. You might
>>>>>> have inferred it, but that is not the same thing.
>>>>>
>>>>> The fact that you did not intend to imply a question does not change
>>>>> that it is being considered such.
>>>>
>>>> Then it is being considered such erroneously.
>>>
>>> You are free to claim that gravity is an error as well.
>>
>> But I do not do so. I only claim that something is an error if it
>> actually is an error.
>>
>> The fact of the matter is, I neither stated a question nor had one in
>> mind when I wrote the paragraph that you are erroneously claiming
>> stated or implied a question.
>>
>> You may have *inferred* a question. But none was *implied*. Get the
>> difference?
>
> Learn to read and understand English.

Thank you, I already have, and consequently I, unlike you, understand the
difference between implied and inferred. *Implied* means the writer
intended a certain meaning that they did not state outright. *Inferred*
means the reader interpreted a certain meaning, whether or not that
meaning was intended by the writer.

http://www.thefreedictionary.com/infer

Usage Note: Infer is sometimes confused with imply, but the
distinction is a useful one. When we say that a speaker or sentence
implies something, we mean that it is conveyed or suggested without
being stated outright: When the mayor said that she would not rule
out a business tax increase, she implied (not inferred) that some
taxes might be raised. Inference, on the other hand, is the activity
performed by a reader or interpreter in drawing conclusions that are
not explicit in what is said: When the mayor said that she would not
rule out a tax increase, we inferred that she had been consulting
with some new financial advisers, since her old advisers were in
favor of tax reductions.

> "it is being considered" is not impacted by your intentions.

Perhaps not, but "it is implied" is. You inferred it incorrectly and now
you are desperate to justify your position after I have stated that you
did so.

But the fact is, I am *inherently* the sole arbiter of what I did and did
not intend to convey, and I intended no question. Your claim otherwise is
simply false, and consequently any "considering" or "inferring" a
question on my part is simply incorrect -- seeing something that wasn't
there. Claiming I "implied" it goes even further to false attribution,
compounding your error. And of course continuing to claim I implied it
after I've explicitly stated that I did not goes further still, to out-
and-out lying. At that point you're guilty not merely of factual but also
moral error.

>> They will maybe let you see what's right there in front of your nose,
>> though.
>

> you don't know
> you don't know
> you don't know
> you don't know

Arne Vajhøj

unread,
Feb 26, 2011, 4:23:04 PM2/26/11
to
On 26-02-2011 04:01, Ken Wesson wrote:
> On Fri, 25 Feb 2011 13:01:18 -0500, Arne Vajhøj spammed:
>> On 24-02-2011 23:33, Ken Wesson wrote:
>>> On Thu, 24 Feb 2011 21:28:20 -0500, Arne Vajhøj wrote:
>>>> On 23-02-2011 15:52, Ken Wesson wrote:
>>>>> On Fri, 18 Feb 2011 15:22:39 -0500, Arne Vajhøj wrote:
>>>>>> On 18-02-2011 02:24, Ken Wesson wrote:
>>>>>>> No. You once again seem to presume that everyone that is a
>>>>>>> programmer never uses the term in any other sense than the
>>>>>>> negative. That may be true of the programmers *you know* but it is
>>>>>>> demonstrably *not* true of programmers in general -- for one, I do
>>>>>>> not invariably use it in that sense and I am a programmer.
>>>>>>
>>>>>> We have not seen any evidence of that so far.
>>>>>
>>>>> Then you haven't been looking. Try doing a google groups search on my
>>>>> name sometime.
>>>>
>>>> Why - I have seen lots of posts proving the opposite
>>>
>>> A lie.
>>
>> True.
>
> So, you admit your lie.

No.

It was about my claim not about your claim.

> Indeed, if it gets much higher you'll be in violation of most
> newsservers' terms of service because your Breidbart Index will pop 25 --
> post 26 posts in one day whose sole purpose is to repeat the same litany
> of irrational anti-Wesson beliefs and you will risk losing your account.

If you bothered to read and understand the term you use, then
you would know that the BI of almost all my posts was 1.

But then reading and understanding have never been your strong interest.

> Perhaps not, but "it is implied" is. You inferred it incorrectly and now
> you are desperate to justify your position after I have stated that you
> did so.
>
> But the fact is, I am *inherently* the sole arbiter of what I did and did
> not intend to convey, and I intended no question.

You mean that you still did not understand:

#"it is being considered" is not impacted by your intentions.

Arne

Ken Wesson

unread,
Feb 27, 2011, 8:01:37 AM2/27/11
to
On Sat, 26 Feb 2011 16:23:04 -0500, Arne Vajhøj spammed:

> On 26-02-2011 04:01, Ken Wesson wrote:
>> On Fri, 25 Feb 2011 13:01:18 -0500, Arne Vajhøj spammed:
>>> On 24-02-2011 23:33, Ken Wesson wrote:

<snip>

Boy, you sure are a glutton for punishment, Arne. But I see you are at
least a *little* bit concerned about your BI. Only 18 posts that boil
down to "Ken is an evil horrible person" this time instead of almost 25.

Still, that's 18 more than you *should* have posted.

>>>>> Why - I have seen lots of posts proving the opposite
>>>>
>>>> A lie.
>>>
>>> True.
>>
>> So, you admit your lie.
>
> No.

But ... but ... but you just did! You didn't even trim your admission
from the quoted text -- not that it would have done you much good, since
I'm personally archiving this entire thread and no doubt Google and other
organizations also are.

>> Indeed, if it gets much higher you'll be in violation of most
>> newsservers' terms of service because your Breidbart Index will pop 25
>> -- post 26 posts in one day whose sole purpose is to repeat the same
>> litany of irrational anti-Wesson beliefs and you will risk losing your
>> account.
>

> If you bothered to bark bark bark bark bark bark bark
> bark bark bark bark bark bark bark bark bark bark!
>
> But then bark bark bark bark bark bark bark bark bark bark bark!

Do let me know when you've gotten all of that out of your system and you
have something Java-related, or at least on *some* comp-sci/technical
topic, to say.

>>> "it is being considered" is not impacted by your intentions.
>>
>> Perhaps not, but "it is implied" is. You inferred it incorrectly and now
>> you are desperate to justify your position after I have stated that you
>> did so.
>>
>> But the fact is, I am *inherently* the sole arbiter of what I did and
>> did not intend to convey, and I intended no question.
>

> You bark bark bark bark bark bark bark bark bark bark!

What is it, Arne? Bird perched on the window sill? Neighbor brazenly
walking his dog on *your* length of sidewalk despite the clearly marked
fire hydrants at both ends? Do enlighten us what sets off these fits of
yours.

> #"it is being considered" is not impacted by your intentions.

No, but its *correctness* is.

Arne Vajhøj

unread,
Feb 27, 2011, 9:06:06 AM2/27/11
to
On 27-02-2011 08:01, Ken Wesson wrote:
> On Sat, 26 Feb 2011 16:23:04 -0500, Arne Vajhøj spammed:
>> On 26-02-2011 04:01, Ken Wesson wrote:
>>> On Fri, 25 Feb 2011 13:01:18 -0500, Arne Vajhøj spammed:
>>>> On 24-02-2011 23:33, Ken Wesson wrote:
>>>>>> Why - I have seen lots of posts proving the opposite
>>>>>
>>>>> A lie.
>>>>
>>>> True.
>>>
>>> So, you admit your lie.
>>
>> No.
>>
>> It was about my claim not about your claim.
>
> But ... but ... but you just did!

Try reading it again.

> You didn't even trim your admission

No - I leave those childish trimmings to you.

>> #"it is being considered" is not impacted by your intentions.
>
> No, but its *correctness* is.

No.

If people do consider such (and it should have been obvious by now
that they do), then it is correct no matter what your intentions
were.

Arne

Ken Wesson

unread,
Feb 27, 2011, 10:05:09 AM2/27/11
to
On Sun, 27 Feb 2011 09:06:06 -0500, Arne Vajhøj wrote:

> On 27-02-2011 08:01, Ken Wesson wrote:
>> On Sat, 26 Feb 2011 16:23:04 -0500, Arne Vajhøj spammed:
>>> On 26-02-2011 04:01, Ken Wesson wrote:
>>>> On Fri, 25 Feb 2011 13:01:18 -0500, Arne Vajhøj spammed:
>>>>> On 24-02-2011 23:33, Ken Wesson wrote:
>>>>>>> Why - I have seen lots of posts proving the opposite
>>>>>>
>>>>>> A lie.
>>>>>
>>>>> True.
>>>>
>>>> So, you admit your lie.
>>>
>>> No.
>>

>> But ... but ... but you just did!
>

> Bark bark bark!

Please shut up.

>> You didn't even trim your admission
>

> Bark bark bark bark bark bark bark bark bark!

Please shut up.

>>> #"it is being considered" is not impacted by your intentions.
>>
>> No, but its *correctness* is.
>
> No.

If someone considers something I write to be a question, when that was
not my intention, then they are considering it incorrectly. If I say "up"
and someone misreads it as meaning "down", then surely you agree that
*that* is incorrect? Same principle applies.

> If people do consider such (and it should have been obvious by now that
> they do), then it is correct

No, it isn't. By your definition, misunderstandings are literally
impossible, because however anyone interprets something is automatically
correct just because they interpreted it that way! That way lies madness,
Arne! What is the *matter* with you???

Arne Vajhøj

unread,
Feb 27, 2011, 4:19:50 PM2/27/11
to
On 27-02-2011 10:05, Ken Wesson wrote:
> On Sun, 27 Feb 2011 09:06:06 -0500, Arne Vajhøj wrote:
>> On 27-02-2011 08:01, Ken Wesson wrote:
>>> On Sat, 26 Feb 2011 16:23:04 -0500, Arne Vajhøj spammed:
>>>> #"it is being considered" is not impacted by your intentions.
>>>
>>> No, but its *correctness* is.
>>
>> No.
>
> If someone considers something I write to be a question, when that was
> not my intention, then they are considering it incorrectly. If I say "up"
> and someone misreads it as meaning "down", then surely you agree that
> *that* is incorrect? Same principle applies.
>
>> If people do consider such (and it should have been obvious by now that
>> they do), then it is correct
>
> No, it isn't. By your definition, misunderstandings are literally
> impossible, because however anyone interprets something is automatically
> correct just because they interpreted it that way! That way lies madness,

We are not misunderstanding you. We know that it was not your
intention, but we still consider it to be a question to
be answered.

Arne

Ken Wesson

unread,
Mar 11, 2011, 9:39:06 PM3/11/11
to
On Sun, 27 Feb 2011 16:19:50 -0500, Arne Vajhøj wrote:

> We are not misunderstanding you.

Really? Then why are you misattributing questions to me that I did not
ask? Moreover, why are you treating me in a manner that is inappropriate
for who I actually am? I consider being treated with undeserved hostility
to be prima facie evidence of having been misunderstood.
