Certainly the distinction between A and D registers is a
non-orthogonality. But it is just /one/ case, and it really doesn't
matter much in practice, since you have many identical registers in each
class. It's akin to the difference between GP and FP registers that you
mention below.
(I am not disagreeing with the remark that the 68000 is not entirely
orthogonal - I am disagreeing with the claim that it is at a similar
level to the 8086. And I am jogging happy memories of old processor
architectures!)
>
> Current machines already have GP and float registers to make things more
> difficult, but here there are separate registers for integers - and
> integers that might be used as memory addresses.
Note that there are very good reasons, in terms of hardware
implementation, for separating integer and FP registers. It might be
nice to have them merged from the programmer's viewpoint, but it is not
worth the hardware cost. (Similar logic lies behind the separate A and
D registers in the m68k architecture.)
>
> So you would have instructions that operated on one set but not the
> other. You'd need to decide whether functions returned values in D0 or A0.
>
> Glancing at the instruction set now, you have ADD which adds to
> everything except A regs; ADDA which /only/ adds to AREGS.
>
> ADDI which adds immed values to everything except AREGS, and ADDQ which
> adds small values (1..8) to everything /including/ AREGS.
>
> Similarly with ANDI, which works for every dest except AREGS, but there
> is no version for AREGS (so if you were playing with tagged pointers and
> needed to clear the bottom bits then use them for an address, it gets
> awkward).
>
> With a compiler, you have to decide whether it's best to start
> evaluating in DREGS or AREGS and then move across, if the expression
> involves mixed operations that are only available for one set.
>
Yes, there is no doubt that it is a non-orthogonality. But it is a
minor matter in practice. A simple compiler can decide "pointers go in
A registers, everything else goes in D registers". That's it - done.
(To get maximum efficiency, you'll need more complex register
allocation.)
In comparison to the 8086, it is /nothing/.
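As an aside, the tagged-pointer case you mention is awkward only at the
instruction level - the compiler does the masking in a D register and
moves the result across to an A register. At the source level it is a
single mask operation. A sketch in C, assuming a hypothetical 3-bit tag
in the low bits:

    #include <stdint.h>

    #define TAG_MASK ((uintptr_t) 0x7)    /* hypothetical 3-bit tag */

    /* Clear the tag bits so the result can be used as an address. */
    static inline void *untag(void *p) {
        return (void *) ((uintptr_t) p & ~TAG_MASK);
    }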
> Note that the 80386 processor, which apparently first appeared in 1985,
> removed many of the restrictions of the 8086, also widening the
> registers but not adding any more. Further, these 32-bit additions and
> new address modes were available while running in 16-bit mode within a
> 16-bit application.
>
Yes, the 80386 helped and removed some of the specialisations of the
8086. There were still plenty left, and still plenty of cases where
using particular registers was more efficient than using others. The
x86 world improved gradually in this way, so that the current x86-64 ISA
is vastly better than the 8086.
>
>> You can also understand it by looking at the processor market. Real
>> CISC with dedicated and specialised registers is dead. In the
>> progress of x86 through 32-bit and then 64-bit, the architecture
>> became more and more orthogonal - the old registers A, B, C, D, SI,
>> DI, etc., are now no more than legacy alternative names for r0, r1,
>> etc., general purpose registers.
>
> What became completely unorthogonal on x86 is the register naming. It's
> a zoo of mismatched names of mixed lengths. The mapping is also bizarre,
> with the stack pointer somewhere below the middle.
Yes.
>
> (However that is easy to fix as I can use my own register names and
> ordering as well as the official names. My 64-bit registers are called
> D0 to D15, with D15 (aka Dstack) being the stack pointer.)
>
I think it is not uncommon to refer to the registers in x86-64 as r0 to
r15 - that is, the A, B, C, D, DI, SI, SP, and BP registers are renamed,
while the extra eight registers of x86-64 never had any other name.
>>> i8 i16 i32 i64 i128
>>> u8 u16 u32 u64 u128
>>> f32 f64 f128
>>
>> Or they can be expressed in a form that everyone understands, like
>> "char", "int", etc., that are defined in the ABI, and that everybody
>> and every language /does/ use when integrating between different
>> languages.
>
> Sorry, but C typenames using C syntax are NOT good enough, not for
> cross-language use. You don't really want to see 'int long unsigned
> long'; you want 'uint64' or 'u64'.
Sorry, but they /are/ good enough for everyone else. The world can't be
expected to change to suit /you/ - it is you who must adapt. (But you
don't have to like it!)
>
> Even C decided that `int` and `char` were not good enough by adding types
> like `int32_t` and ... sorry I can't even tell you what `char`
> corresponds to. That is how rubbish C type designations are.
These type names were /added/ to the language - they did not replace the
existing types. People use different type names for different purposes.
I write "int" when "int" is appropriate, and "int32_t" when "int32_t"
is appropriate - it's not a case of one set of names being "better" than
the other.
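And if the long C spellings bother you, the <stdint.h> names make short
aliases a two-line job - a sketch, where the alias names are simply my
own choice:

    #include <stdint.h>

    typedef int32_t  i32;   /* explicit 32-bit signed integer */
    typedef uint64_t u64;   /* explicit 64-bit unsigned integer */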
>
>> That document has no mention anywhere of your personal short names for
>> size-specific types.
>
> It uses names of its own like 'unsigned eightbyte' which unequivocally
> describes the type. However you will see `u64` all over forums; you will
> never see `unsigned eightbyte`, and never 'unsigned long long int'
> outside of C forums or actual C code.
>
Standards documents are not everyday language. (I think I've mentioned
that before.) In everyday use, people tend to use shorter and more
convenient names - though how they balance brevity against explicitness
depends on the context. (Programs are not everyday language either.)
>> It has a table stating the type names and sizes.
>
> Yes, that helps too. What doesn't help is just using 'long'.
>
It works fine. You read the table of definitions and see that, in this
document, the word "long" means "64-bit integer".
Standards documents define all kinds of terms and expressions in a
particular manner that applies only within the document (or other formal
texts that refer to the document).
>> Think of it as just a definition of the technical terms used in the
>> document, no different from when one processor reference might define
>> "word" to mean 16 bits and another define "word" to mean 32 bits.
>
> So defining a dozen variations on `unsigned long long int` is better
> than just using `u64` or `uint64`?
>
Are you confusing the flexible syntax of C with the technical terms in
the ABI document? It sounds a lot like it.
> That must be the reason why a dozen different languages have all adopted
> those C designations because they work so well and are so succinct and
> unambiguous. Oh, hang on...
>
As I said - it might have been better to have names with explicit sizes.
That does not mean that the C terms are not good enough for the job,
regardless of what language you use. And since in the solid majority of
cases where ABIs are used between two languages, at least one of them is
C, it seems sensible to use C terms. Why should Rust users be forced to
learn Go's type names in order to use a C library - when they need to
know the C names anyway? Why should Go users need to learn the names
used by Rust?
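That is also how it works in practice - the foreign-function layer of
any language ends up describing C types. A sketch of a hypothetical C
header that bindings for Rust, Go, or anything else would read as-is:

    /* mylib.h - hypothetical library interface.  Bindings in other
       languages are generated from these C declarations. */
    #include <stddef.h>

    long   sum_values(const int *values, size_t count);
    double mean_value(const int *values, size_t count);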
Think of C like English - the spelling in English is horrible and
inconsistent, and differs depending on which side of the pond you live
on. Yet it works extremely well for international communication, and
lets Bulgarians talk to Koreans. Esperanto might hypothetically be a
better language, but it's not going to happen in practice.
>
>
>> How does Google manage case-insensitive searches with text in Unicode
>> in many languages? By being /very/ smart. I didn't say it was
>> impossible to be case-insensitive beyond plain English alphabet, I
>> said it was an "absolute nightmare". It is.
>
>
> No, it really isn't. Now you're making things up. You don't need to be
> very smart at all, it's actually very easy.
>
You can do Unicode case-folding based on the tables published by the
Unicode Consortium. But I think you'll find Google's search engine is a
touch more advanced than that.
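To be concrete, simple per-character folding is little more than this in
C - a sketch, with a function name of my own; full Unicode folding needs
the Consortium's CaseFolding.txt table, since some mappings are
one-to-many:

    #include <wchar.h>
    #include <wctype.h>

    /* Compare two wide strings, ignoring simple case differences. */
    int fold_compare(const wchar_t *a, const wchar_t *b) {
        while (*a && *b) {
            wint_t ca = towlower((wint_t) *a++);
            wint_t cb = towlower((wint_t) *b++);
            if (ca != cb)
                return (ca < cb) ? -1 : 1;
        }
        return (*a == *b) ? 0 : (*a ? 1 : -1);
    }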
>
>
>> It is done where it has to be done - you'll find all major databases
>> have support for doing sorting, searching, and case translation for
>> large numbers of languages and alphabets. It is a /huge/ undertaking
>> to handle it all. You don't do it if it is not important.
>
> Think about the 'absolute nightmare' if /everything/ was case sensitive
> and a database has 1000 variations of people called 'David Brown'.
> (There could be 130,000 with my name.)
>
> Now imagine talking over the phone to someone, they create an account in
> the name you give them, but they use or omit capitalisation you weren't
> aware of. How would you log in?
>
I have no idea what you are going on about.
Some things in life need to be flexible and deal with variations such as
spelling differences, capitalisation differences, etc.
Other things can and should be precise and unambiguous.
So when programming, you say /exactly/ what you mean. You don't write
"call fooo a few times" and expect it to be obvious to the computer how
many is "a few" and that you really meant "foo". You write "for i = 1
to 5 do foo()", or whatever the language in question expects.
I expect a compiler to demand precision from the code I write.
Accepting muddled letter case is setting the standard too low, IMHO - I
want a complaint if I write "foo" in one place and "Foo" in another. Of
course I can live with such weaknesses in a language, and set higher
standards for my own code than the language enforces - I do that in all
my coding, as I think most people do. But I see no advantage in weak
identifier matching in a programming language - it adds nothing to code
readability, lets poor coders make more of a mess, and generally invites
a totally unnecessary inconsistency.
I see /no/ advantage in being able to write "foo" when defining an
identifier and "Foo" or "FOO" when using it. It is utterly pointless.
(It is a different matter to say that if you have defined an identifier
"foo" then you may not define a separate one written "Foo", disallowing
identifiers that differ only in case. I could appreciate wanting that
as a feature.)
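That check is cheap to implement, by the way - a sketch in C for ASCII
identifiers, using POSIX strcasecmp, with a function name of my own:

    #include <string.h>
    #include <strings.h>   /* strcasecmp - POSIX */

    /* Nonzero if two identifiers match case-insensitively
       without being byte-for-byte identical. */
    int is_case_variant(const char *a, const char *b) {
        return strcmp(a, b) != 0 && strcasecmp(a, b) == 0;
    }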
And I cannot see any contradiction between wanting case sensitivity when
writing code and ignoring case when chatting to a human on the phone.
>
>> Name just /one/ real programming language that supports
>> case-insensitive identifiers
>
> I'm not talking about Unicode identifiers. I wouldn't go there because
> there are too many issues. For a start, which of the 1.1 million
> characters should be allowed at the beginning, and which within an
> identifier?
>
>> but is not restricted to ASCII. (Let's define "real programming
>> language" as a programming language that has its own Wikipedia entry.)
>>
>> There are countless languages that have case-sensitive Unicode
>> identifiers, because that's easy to implement and useful for programmers.
>
> And also a nightmare, since there are probably 20 distinct characters
> that share the same glyph as 'A'.
>
> Adding Unicode to identifiers is too easy to do badly.
>
It is another case of a feature that can be used or abused. You pick
the balance you want, accepting that either choice is a trade-off.
>
>>
>> Domain names are case insensitive if they are in ASCII.
>
> Because?
Who cares? They are domain names, not program code.
>
>> For other characters, it gets complicated.
>
> So, the same situation with language keywords and commands in CLIs.
>
No, these are case sensitive - except for systems that haven't grown up
since lower case letters were invented.
> But hey, think of the advantage of having Sort and sorT working in
> decreasing/increasing order; no need to specify that separately. Plus
> you have 14 more variations to apply meanings to. Isn't this the point
> of being case-sensitive?
>
> Because if it isn't, then I don't get it. On Windows, I can type 'sort'
> or `SORT`, it doesn't matter. I don't even need to look at the screen or
> have to double-check caps lock.
>
> BTW my languages (2 HLLs and one assembler) use case-insensitive
> identifiers and keywords, but allow case-sensitive names when needed,
> mainly for working with FFIs.
>
> It really isn't hard at all.
It really isn't hard to write "sort".
>
>> Programmers are not "most people". Programs are not "most things in
>> everyday life".
>>
>> Most people are quite tolerant of spelling mistakes in everyday life -
>> do you think programming languages should be too?
>
> Using case is not a spelling mistake; it's a style. In my languages,
> someone can write 'int', 'Int' or 'INT' according to preference.
>
No, it is a mess.
But of course, it is not a problem in your language - personal
preferences are entirely consistent there.
And in serious languages that are case-insensitive, such as Ada, people
stick strongly to the conventions and write their identifiers with
consistent casing. Which leaves everyone wondering what the point is of
being case-insensitive - it's just a historical mistake that can't be
changed.
> Or they can use CamelCase if they like that, but someone importing such
> a function can just write camelcase if they hate the style.
>
> I use upper case when writing debug code so that I can instantly
> identify it.
>
>
>> They do exist, yes. That does not make them a good idea.
>
> Yes, it does. How do you explain to somebody why using exact case is
> absolutely essential, when it clearly shouldn't matter?
>
>
> Look: I create my own languages, yes? And I could have chosen at any
> time to make them case sensitive, yes?
>
> So why do you think I would choose to make life an 'absolute nightmare'
> for myself?
>
You didn't use Unicode - which is where the implementation gets hard.
There's no difficulty in implementing case-insensitive keywords and
identifiers in plain ASCII - there's just no advantage to it (unless you
call being able to write an inconsistent mess an advantage).
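For the record, here is the whole of the "hard part" for plain ASCII - a
sketch of the case-insensitive comparison a keyword or identifier lookup
would use:

    #include <ctype.h>

    /* Case-insensitive comparison of two ASCII strings. */
    int ieq(const char *a, const char *b) {
        while (*a && *b) {
            if (tolower((unsigned char) *a) != tolower((unsigned char) *b))
                return 0;
            a++;
            b++;
        }
        return *a == *b;
    }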