caret vs and

rusi

Oct 28, 2013, 9:04:06 AM

I was browsing the idiom library (looks like quite a treasure, btw!)
and it seems that caret (^) is often used where it should be and (∧),

e.g. idioms 6 and 7

rusi

Oct 28, 2013, 10:09:31 AM

Not sure if links are being deleted??
I meant this one
aplwiki.com/FinnAplIdiomLibrary

Ellis Morgan

Oct 28, 2013, 1:12:45 PM

In article <0a1edb90-a99b-4b85...@googlegroups.com>, rusi
<rusto...@gmail.com> writes

>On Monday, October 28, 2013 6:34:06 PM UTC+5:30, rusi wrote:
>> I was browsing the idiom library (looks like quite a treasure btw!)
>> and it seems that caret (^) is often used where it should be and ( ∧ )
>>
>> eg idiom 6,7
>
>Not sure if links are being deleted??
>I meant this one
>aplwiki.com/FinnAplIdiomLibrary

Idiom 7 appears as X {and} {dot} {equals} {Grade-up} {Grade-up} X

It returns 1 if X is (say) 4 3 2 1, so {and} is what should be there.

The symbol for {and} looks a bit like "^". If you don't see it like that, it may
be a Unicode problem; there is help elsewhere in the wiki to get this right.

--
Ellis Morgan

rusi

Oct 29, 2013, 1:48:28 PM

On Monday, October 28, 2013 10:42:45 PM UTC+5:30, Ellis Morgan wrote:
> Idiom 7 appears as X {and} {dot} {equals} {Grade-up} {Grade-up} X
> returns 1 if X is (say) 4 3 2 1 so {and} is what should be there.
>
>
> The symbol for {and} looks a bit like "^" if you don't see it like that it may
> be a unicode problem, there is help elsewhere in the wiki to get this right.

Yes, it could be a Unicode problem.
However, it's strange that I see most of the other APL characters correctly, except 'and'!

rusi

Oct 29, 2013, 11:30:06 PM

On Monday, October 28, 2013 10:42:45 PM UTC+5:30, Ellis Morgan wrote:

> The symbol for {and} looks a bit like "^" if you don't see it like that it may

> be a unicode problem, there is help elsewhere in the wiki to get this right.

I went to http://aplwiki.com/AplCharacters (Is this the 'elsewhere'?)

Found the section Contained in the default ⎕AV
(sub link http://aplwiki.com/AplCharacters#Contained_in_the_default_.2BI5U-AV )

10th line (along with the other standard APL math symbols) there is ∧

Second-last line there is ^

They look sufficiently different, and cut-pasting them into emacs tells me that the first is "LOGICAL AND", charset Unicode, codepoint 0x2227.
The second is "CIRCUMFLEX ACCENT" (old name "Spacing Circumflex"), charset ASCII, codepoint 0x5E.

Go back to the idiom set http://aplwiki.com/FinnAplIdiomLibrary#Grade_Up_.2BI0s-
Cut-paste from 6th idiom shows a circumflex accent (not "AND")
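
For anyone without emacs to hand, the same check can be done in a few lines of Python. This is only a minimal sketch using the standard unicodedata module; the helper name describe is made up for illustration and is not part of any APL tooling.

```python
import unicodedata

def describe(ch):
    """Return (codepoint, official Unicode name) for a single character."""
    return f"U+{ord(ch):04X}", unicodedata.name(ch)

# The two look-alikes discussed above: the APL "and" and the ASCII caret.
for ch in ("\u2227", "^"):
    print(ch, *describe(ch))
```

Pasting a character copied from the wiki page in place of either literal shows which of the two the page actually contains.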

phil chastney

Oct 30, 2013, 5:34:23 AM

according to my browser, that page is "encoded" in UTF-8

that being so, the idiom should use codepoint U+2227 (rather than the
circumflex accent), which would then display properly

this is probably a minor editing error on that page

/phil

rusi

Oct 30, 2013, 6:55:15 AM

On Wednesday, October 30, 2013 3:04:23 PM UTC+5:30, phil chastney wrote:
> according to my browser, that page is "encoded" in UTF-8
>
> that being so, the idiom should use codepoint U+2227 (rather than the
> circumflex accent), which would then display properly
>
> this is probably a minor editing error on that page

I think there are at least a dozen other uses of ^ on that page that should be ∧

IOW it seems to be a minor search-replace error of some sort

Ellis Morgan

Nov 2, 2013, 3:07:25 AM

In article <21d5ae42-62cd-4b06...@googlegroups.com>, rusi
<rusto...@gmail.com> writes

>On Wednesday, October 30, 2013 3:04:23 PM UTC+5:30, phil chastney wrote:
>> according to my browser, that page is "encoded" in UTF-8
>>
>> that being so, the idiom should use codepoint U+2227 (rather than the
>> circumflex accent), which would then display properly
>>
>> this is probably a minor editing error on that page
>
>I think there are at least a dozen other uses of ^ on that page that
>should be ∧
>
>IOW it seems to be a minor search-replace error of some sort

From my session log ...

Dyalog APL/W Version 12.1.1
Serial No : 500486
Unicode Edition
Sat Nov 02 06:38:14 2013
clear ws

X^.=??X ? 4 3 2 1 ? code to left of arrow (it shows as one of the"?"s)
was pasted in from APLWIKI
1

At http://aplwiki.com/ConfiguringMailAndNewsReaders#APL_code_in_a_workspace it
says "Dyalog 12 is, so far as we know, fully unicode compliant."

So Dyalog protects me from this example of lazy (?) encoding, and copy and
paste from APLWIKI into an APL workspace works for me.

There is much more on this WIKI page about how to handle it but I expect you
have already seen it.

My browser is Chrome and my newsreader is Turnpike (not unicode compliant !)
hence the question marks in the APL idiom above. To view and send APL chars at
CLA I use Google Groups - which is not so friendly as Turnpike so I usually
manage without them.

--
Ellis Morgan

rusi

Nov 2, 2013, 5:05:49 AM

On Saturday, November 2, 2013 12:37:25 PM UTC+5:30, Ellis Morgan wrote:
> In article rusi writes

> >On Wednesday, October 30, 2013 3:04:23 PM UTC+5:30, phil chastney wrote:
> >> according to my browser, that page is "encoded" in UTF-8
> >> that being so, the idiom should use codepoint U+2227 (rather than the
> >> circumflex accent), which would then display properly
> >> this is probably a minor editing error on that page
> >I think there are at least a dozen other uses of ^ on that page that
> >should be ∧
> >IOW it seems to be a minor search-replace error of some sort

Not sure what your point is:

- my system is not working?
- your system is not working?
- some parts of APL are not unicode compliant?
- something else?

In particular:

> From my session log ...

> Dyalog APL/W Version 12.1.1
> Serial No : 500486
> Unicode Edition
> Sat Nov 02 06:38:14 2013
> clear ws

> X^.=??X ? 4 3 2 1 ? code to left of arrow (it shows as one of the"?"s)

I am seeing that as
- an X
- a caret (circumflex)
- dot
- an equal
- followed by '?'s and some other stuff interleaved

I am reading this in google-groups and checking the characters in emacs

My conjecture:
Dyalog is equivalencing ^ and ∧ (and)

Can you try that?
1. Copy-paste that from the wiki
2. Retype it using the proper ∧ (I guess next to the 0 key)

el...@ellismorgan.co.uk

Nov 2, 2013, 2:03:44 PM

rusi

I hoped you would infer my points rather than having to spell it all out.

I am sending this reply using google groups so the APL should travel to you OK. What you see below has been pasted from the finnAPL APLWIKI page into my note.
6.
Test if X and Y are permutations of each other
X←D1; Y←D1
Y[⍋Y]^.=X[⍋X]
7.
Test if X is a permutation vector
X←I1
X^.=⍋⍋X
8.
Grade up (⍋) for sorting subvectors of Y having lengths X
Y←D1; X←I1; (⍴Y) ←→ +/X
A[⍋(+\(⍳⍴Y)∊+\⎕IO,X)[A←⍋Y]]

What you see next is what I see in my APL session when I copy and paste idiom 7 into it
)clear
clear ws
⍝ X^.=⍋⍋X
X^.=⍋⍋X←4 3 2 1
1

As you can see, what I think you want to do is something I can do here using the WIKI as it is today. My points are something like this:

> Not all APL interpreters are the same. I use Dyalog, and it knows how to let me paste from the WIKI into my session.

> Both the WIKI and this newsgroup are self/mutual help sources. People have spent a fair amount of time putting info into the WIKI about how to paste into many different interpreters. Is yours included? If not, perhaps you can change the WIKI so that it is, once you have found out how to do it. Generally, if the WIKI works for me I don't try to change it.

> I know enough to run APL on my computer; why things work is beyond me. Phil Chastney has suggested there may be a problem with the encoding on the idiom page. As it does not seem to stop it working for me, I feel no incentive to spend time changing anything.

> I got the ability to copy/paste from WIKI to APL many years ago by following the mysterious but simple instructions on the WIKI. It worked, and I can't remember what I did.

> I have no idea what your problems are, or what causes them. If you need more help than you can get from the WIKI, you probably need to state things like what interpreter, which operating system, which browser, and if you know what you are doing ... then someone who does know what to do may offer to help.

Ellis Morgan

Nov 2, 2013, 2:21:24 PM

In article <879315bf-d41a-4ba8...@googlegroups.com>, rusi
<rusto...@gmail.com> writes

>Dyalog is equivalencing ^ and ∧ (and)
>
>Can you try that?
>1. Copy-paste that from the wiki
>2. Retype it using the proper ∧ (I guess next to the 0 key)

rusi

If you look at these you can see snapshots of the WIKI and my APL session as I
pasted idiom 7 across

http://www.mrtlfrm.demon.co.uk/public/wiki1102.jpg
http://www.mrtlfrm.demon.co.uk/public/apl1102.jpg

--
Ellis Morgan

Lobachevsky

Nov 5, 2013, 1:17:23 AM

What you see is often not what you get. Although some characters may look similar, and worse yet, look as if they should work, they do not. A case in point from years ago is the IBM 3278 where the ! glyph on the APL keyboard was not the same as the ! glyph on the standard, non-APL one. Today, with Unicode and other translation schemes, and what appears to be lots of duplicates, expect problems. Don't trust anything.

rusi

Nov 6, 2013, 8:18:16 AM

Thanks Lobachevsky for writing in. Here, however, the problem is the
converse: different-looking characters are being treated as equivalent.

-------------------------

On Saturday, November 2, 2013 11:33:44 PM UTC+5:30, Ellis Morgan wrote:
> I have no idea what your problems are, or what causes them. If you
> need more help than you can get from the WIKI you probably need to
> state things like what interpreter, which operating system, which
> browser, and if you know what you are doing ... then someone who
> does know what to do may offer to help.

1. Gnu APL
2. Debian (testing) Linux
3. Firefox 25
4. I guess not :-)

> I know enough to run APL on my computer why things work is beyond
> me. Phil Chastney has suggested there may be a problem with the
> encoding on the idiom page. As it does not seem to stop it working
> for me I feel no incentive to spend time changing anything.

Phil Chastney said:

> according to my browser, that page is "encoded" in UTF-8
>
> that being so, the idiom should use codepoint U+2227 (rather than the
> circumflex accent), which would then display properly
>
> this is probably a minor editing error on that page

If the encoding as UTF-8 is the error, then what should it be??
I believe he is saying that there is an editing error.
Whether that is the case I can hardly comment, since I am hardly an APL expert.
Basically Dyalog APL and GNU APL don't seem to agree.
Here is a session with GNU APL:

0 ∧ 1
0
0 ^ 1
SYNTAX ERROR

From what you are saying, the second one works for you.
Should this be so? I am not qualified to comment.

Bob Smith

Nov 6, 2013, 9:37:50 AM

On 11/6/2013 8:18 AM, rusi wrote:
[...]

> Phil Chastney said:
>
>> according to my browser, that page is "encoded" in UTF-8
>>
>> that being so, the idiom should use codepoint U+2227 (rather than the
>> circumflex accent), which would then display properly
>>
>> this is probably a minor editing error on that page
>
> If the encoding as UTF-8 is the error then what should it be??
> I believe he is saying that there is an editing error.
> Whether that is the case I can hardly comment since I am hardly an apl expert
> Basically Dyalog-apl and Gnu-apl dont seem to agree.
> Here a session with Gnu-apl
>
> 0 ∧ 1
> 0
> 0 ^ 1
> SYNTAX ERROR
>
> From what you are saying the second one works for you.
> Should this be so? I am not qualified to comment

There are several instances of reasonable symbols in Unicode from which
to choose the corresponding APL glyph:

Alpha: ⍺ U+237A or α U+03B1
Omega: ⍵ U+2375 or ω U+03C9
Stile: ∣ U+2223 or | U+007C
Tilde: ∼ U+223C or ~ U+007E
And: ∧ U+2227 or ^ U+005E
Nor: ⍱ U+2371 or ⊽ U+22BD
Nand: ⍲ U+2372 or ⊼ U+22BC
Diamond: ⋄ U+22C4 or ◊ U+25CA
Quad: ⎕ U+2395 or ▯ U+25AF

to name but a few. The vendor's choice made then becomes what is
displayed on *output*. Whether a particular implementation chooses to
accept on *input* any of the alternate glyphs as an alias varies from
vendor to vendor. It's entirely possible that the author of the page in
question copied the symbol from a more permissive implementation and
your system is less permissive. I don't see this as an encoding issue
or even an editing error, but instead a vendor to vendor implementation
choice.
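
To make the input-alias idea concrete, here is a rough Python sketch of what a permissive front end might do before parsing. It is not how any particular vendor implements it: the table ALIASES and the function normalize_apl are hypothetical, and treating the first symbol of each pair above as the canonical one is an assumption. Stile and tilde are left out because implementations differ on which form they treat as canonical.

```python
# Hypothetical alias table: alternate symbol -> assumed canonical APL symbol.
ALIASES = {
    "\u03B1": "\u237A",  # GREEK SMALL LETTER ALPHA -> APL alpha
    "\u03C9": "\u2375",  # GREEK SMALL LETTER OMEGA -> APL omega
    "^":      "\u2227",  # CIRCUMFLEX ACCENT        -> LOGICAL AND
    "\u22BD": "\u2371",  # NOR                      -> APL nor
    "\u22BC": "\u2372",  # NAND                     -> APL nand
    "\u25CA": "\u22C4",  # LOZENGE                  -> DIAMOND OPERATOR
    "\u25AF": "\u2395",  # WHITE VERTICAL RECTANGLE -> APL quad
}
_TABLE = str.maketrans(ALIASES)

def normalize_apl(source):
    """Replace known look-alike symbols with their assumed canonical form."""
    return source.translate(_TABLE)

# Idiom 7 as pasted from the wiki (with a caret) comes out with a proper "and".
print(normalize_apl("X^.=\u234B\u234BX"))
```

A less permissive system simply skips this step, which would explain a SYNTAX ERROR on the caret.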

--
_________________________________________
Bob Smith -- bsm...@sudleydeplacespam.com
http://www.sudleyplace.com - http://www.nars2000.org
To reply to me directly, delete "despam".

rusi

Nov 6, 2013, 1:02:34 PM

On Wednesday, November 6, 2013 8:07:50 PM UTC+5:30, Bob Smith wrote:

> I don't see this as an encoding issue
> or even an editing error, but instead a vendor to vendor implementation
> choice.

Yes that is my (ignoramus-layman) perception also

> There are several instances of reasonable symbols in Unicode from which
> to choose the corresponding APL glyph:

> Alpha: ⍺ U+237A or α U+03B1
> Omega: ⍵ U+2375 or ω U+03C9
> Stile: ∣ U+2223 or | U+007C
> Tilde: ∼ U+223C or ~ U+007E
> And: ∧ U+2227 or ^ U+005E
> Nor: ⍱ U+2371 or ⊽ U+22BD
> Nand: ⍲ U+2372 or ⊼ U+22BC
> Diamond: ⋄ U+22C4 or ◊ U+25CA
> Quad: ⎕ U+2395 or ▯ U+25AF

> to name but a few. The vendor's choice made then becomes what is
> displayed on *output*.
> Whether a particular implementation chooses to
> accept on *input* any of the alternate glyphs as an alias varies from
> vendor to vendor. It's entirely possible that the author of the page in
> question copied the symbol from a more permissive implementation and
> your system is less permissive.

Well ok… except that you are using the word 'glyph' where I
would use 'character.' See http://www.glyphsapp.com/tutorials/unicode

Of course I freely admit that Unicode can make a mess where there was
none:
With ASCII there is no argument that 'a' and 'A' differ.
With Unicode, 'i' and 'ı' are hard to distinguish; 'Α' and 'A' are even worse.

phil chastney

Nov 7, 2013, 5:23:16 PM

On 2013/Nov/06 13:18, rusi wrote:
> On Tuesday, November 5, 2013 11:47:23 AM UTC+5:30, Lobachevsky wrote:
>

> <snip>

>
>> I know enough to run APL on my computer why things work is beyond
>> me. Phil Chastney has suggested there may be a problem with the
>> encoding on the idiom page. As it does not seem to stop it working
>> for me I feel no incentive to spend time changing anything.
>
> Phil Chastney said:
>
>> according to my browser, that page is "encoded" in UTF-8
>>
>> that being so, the idiom should use codepoint U+2227 (rather than the
>> circumflex accent), which would then display properly
>>
>> this is probably a minor editing error on that page
>
> If the encoding as UTF-8 is the error then what should it be??
> I believe he is saying that there is an editing error.
> Whether that is the case I can hardly comment since I am hardly an apl expert
> Basically Dyalog-apl and Gnu-apl dont seem to agree.
> Here a session with Gnu-apl
>

> 0 âˆ§ 1

> 0
> 0 ^ 1
> SYNTAX ERROR
>
> From what you are saying the second one works for you.
> Should this be so? I am not qualified to comment

there is nothing wrong with UTF-8, and there is nothing wrong with using
UTF-8 as an "encoding" -- UTF-8 is one of the great inventions of the
20th century

UTF-8 is actually a "transformation format", which maps Unicode values
in the range 0 to 0x10FFFF (a little over 2*20, in APL notation) onto a
succession of 8-bit values

this means a sequence of Unicode values can be mapped onto an existing
8-bit communication system, so that the original Unicode values can be
reconstructed at the receiving end
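
As a small illustration of that mapping, the following Python sketch encodes the "and" symbol and the caret to UTF-8 and decodes the bytes back again. The helper name utf8_hex is made up for this example.

```python
def utf8_hex(ch):
    """Return the UTF-8 bytes of a string as spaced upper-case hex."""
    return " ".join(f"{b:02X}" for b in ch.encode("utf-8"))

AND = "\u2227"  # LOGICAL AND

print(utf8_hex(AND))   # three 8-bit values
print(utf8_hex("^"))   # plain ASCII stays a single byte

# The receiving end reconstructs the original value from the bytes.
print(bytes.fromhex("E288A7").decode("utf-8") == AND)
```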

that being so, the next question is "what Unicode values should be used
to represent the APL character set?"

the big problem is that word "should" -- there is no standard

the nearest approach to a standard is still, I guess,
http://std.dkuug.dk/jtc1/sc22/open/n3067.pdf
and, personally, I would have expected APLWiki to work to that, but
they're under no obligation to do so

among the vendors, there is a broad consensus of how APL maps into
Unicode, but the consensus is by no means complete

one thing they all seem to agree on, however, is that input should be
tolerant, so that pasting A^B (as code!) into a session, should be
interpreted as A∧B (which should read: A and B )

I believe copying from APLWiki and pasting into Dyalog works OK, but YMMV

The Unicode Standard itself used to have a brief entry on Compliance, to
the effect that any string intended for onward transmission, should be
passed on unchanged -- principally, this meant comms kit shouldn't go
round stripping nulls the way they used to

the word "compliance" no longer appears in the standard's index, but
there's a whole chapter on Conformance -- basically, you can do what
you like with a string of Unicode characters, but if your software
includes routines for BiDi, line-breaking, collation, etc, you have to
follow Unicode's specs if you want to claim your software "Conforms" --
clearly, custom collation routines will be installed all over the
place, but nobody claims they're conformant, so that's all OK

it's just possible that using a spacing version of an accent as
equivalent to a mathematical operator may infringe the standard's rules
on preserving Character Identity -- but who cares? none of this stuff
is enforceable anyway

in brief, if APLWiki chooses to use circumflex for "and", they are free
to do so -- maybe C&P from APLWiki into your favourite interpreter
will work, and maybe not

that may be inconvenient, it may be undesirable, but it's not "wrong"
-- there is no _requirement_ for suppliers to accept incoming APLWiki code

well, that's the way I see it . . . /phil

Bob Smith

Nov 7, 2013, 7:19:03 PM

On 11/7/2013 5:23 PM, phil chastney wrote:
[...]

> the nearest approach to a standard is still, I guess,
> http://std.dkuug.dk/jtc1/sc22/open/n3067.pdf

Many thanks for the above reference. I suspect that, among other
things, the "approved" symbol for the minus sign (U+2212) comes as a
surprise to many APL vendors, myself included.

rusi

Nov 7, 2013, 10:25:41 PM

On the whole Phil, thanks for your thoughts.
Some details below.

On Friday, November 8, 2013 3:53:16 AM UTC+5:30, phil chastney wrote:
> On 2013/Nov/06 13:18, rusi wrote:
> > On Tuesday, November 5, 2013 11:47:23 AM UTC+5:30, Lobachevsky wrote:
> >> I know enough to run APL on my computer why things work is beyond
> >> me. Phil Chastney has suggested there may be a problem with the
> >> encoding on the idiom page. As it does not seem to stop it working
> >> for me I feel no incentive to spend time changing anything.
> > Phil Chastney said:
> >> according to my browser, that page is "encoded" in UTF-8
> >> that being so, the idiom should use codepoint U+2227 (rather than the
> >> circumflex accent), which would then display properly
> >> this is probably a minor editing error on that page
> > If the encoding as UTF-8 is the error then what should it be??
> > I believe he is saying that there is an editing error.
> > Whether that is the case I can hardly comment since I am hardly an apl expert
> > Basically Dyalog-apl and Gnu-apl dont seem to agree.
> > Here a session with Gnu-apl

> > 0 â§ 1

Something untoward happened in my sending this line to your system and
yours returning it. Am I splitting hairs?? See below...

That 'something' did not happen here!

> I believe copying from APLWiki and pasting into Dyalog works OK, but YMMV

> The Unicode Standard itself used to have a brief entry on Compliance, to
> the effect that any string intended for onward transmission, should be
> passed on unchanged -- principally, this meant comms kit shouldn't go
> round stripping nulls the way they used to

> the word "compliance" no longer appears in the standard's index, but
> there's a whole chapter on Conformance -- basically, you can do what
> you like with a string of Unicode characters, but if your software
> includes routines for BiDi, line-breaking, collation, etc, you have to
> follow Unicode's specs if you want to claim your software "Conforms" --
> clearly, custom collation routines will be installed all over the
> place, but nobody claims they're conformant, so that's all OK

> it's just possible that using a spacing version of an accent as
> equivalent to a mathematical operator may infringe the standard's rules
> on preserving Character Identity -- but who cares? none of this stuff
> is enforceable anyway

> in brief, if APLWiki chooses to use circumflex for "and", they are free
> to do so -- maybe C&P from APLWiki into your favourite interpreter
> will work, and maybe not

> that may be inconvenient, it may be undesirable, but it's not "wrong"
> -- there is no _requirement_ for suppliers to accept incoming APLWiki code

> well, that's the way I see it . . . /phil

APL was one of the most fun languages I've ever used. It could never
become popular because something like unicode did not exist in
Iverson's time.

Today it does. And so reducing gratuitous incompatibilities will go
a long way to spreading APL

As for the wiki -- I think it's a great resource. I come to it as a
newbie certainly not to find fault. Just consider that dozens -- maybe
hundreds -- of others may also see it: "APL one of those zany
languages -- unfortunately does not work on modern machines" and move
on.

I would offer to help set it right if I had a little more of a clue!!

phil chastney

Nov 8, 2013, 5:51:53 AM

right -- well, I made sure my reply used UTF-8, and your reply
followed suit, so everything is OK in that respect

I highlighted the splodge in the second "A splodge B", copied and pasted
it into Word, did a quick Alt-X on the pasted character and got the
result 2227 -- bingo! everything looks OK as regards transmission: the
hexadecimal quantity 2227 has made it through God knows how many bits of
comms kit, and returned to base unharmed (it's a bit like the excitement
pigeon racers must feel when the little creature makes it back home)

so the most likely suspect is the display your end, and the most likely
part of the display is the font

do you have access to something like Charmap? my guess is that the font
you are using to display incoming emails does not have a glyph for the
codepoint 2227

that's the easy one taken care of

I see the AND sign in your earlier msg, and the mangled version in my
reply -- after mangling, the two characters are (hex) 00E2 and 00A7

check out http://www.fileformat.info/info/unicode/char/2227/index.htm
and you will see that the UTF-8 representation of U+2227 is 0xE2 0x88 0xA7

now 0x88 is (supposed to be) a control code, and therefore has no
visible representation, so "â§" is a three character sequence whose
middle character is invisible, which is the 8-bit display of the UTF-8
sequence which once represented U+2227

this is almost certainly my fault -- while drafting that earlier
reply, I was swapping between encodings, to see what happened, and one
of the encodings I swapped in and out of was Windows 1252

Windows 1252 is a bit of a minefield, principally because it has visible
characters for most of the byte range 0x80 to 0x9F -- the round trip UTF-8 ->
Windows 1252 -> UTF-8 should leave the text unaltered

however, the round trip UTF-8 -> Windows 1252 -> something else -> UTF-8
could quite easily have converted 0x88 into some other value, so that
the 3 character sequence was no longer a valid UTF-8 sequence (or
something along those lines) so the UTF encoder could no longer recover
the original value of 0x2227
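
The effect can be reproduced in Python. This is only a sketch of one possible path, not a reconstruction of what any particular mail client did: the "something else" step is assumed here to be Latin-1, and the variable names are made up. Reading the UTF-8 bytes as Windows-1252 or as Latin-1 gives the two mangled forms seen earlier in the thread.

```python
AND = "\u2227"                      # LOGICAL AND
raw = AND.encode("utf-8")           # the three bytes E2 88 A7

# The same bytes read as two different 8-bit encodings.
as_cp1252 = raw.decode("cp1252")    # a-circumflex, modifier circumflex, section sign
as_latin1 = raw.decode("latin-1")   # a-circumflex, invisible control U+0088, section sign

# Harmless round trip: Windows-1252 text re-encoded as Windows-1252.
round_trip = as_cp1252.encode("cp1252").decode("utf-8")

# Hypothetical lossy path: Latin-1 text pushed through Windows-1252,
# which has no slot for U+0088, so the middle byte becomes "?".
damaged = as_latin1.encode("cp1252", errors="replace")

print(round_trip == AND)
print(damaged)
try:
    damaged.decode("utf-8")
except UnicodeDecodeError as err:
    print("no longer valid UTF-8:", err.reason)
```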

for this, I apologise -- one thing I have learnt from this exercise is
to be more careful when swapping between encodings

I hope this makes some sort of sense to you -- similar problems arise
when dealing with almost any text requiring characters outside the
Latin-1 page -- it's an interesting subject area, only so long as you
don't need correctly rendered text in a hurry, when it can be truly
maddening

all the best . . . /phil

phil chastney

Nov 8, 2013, 6:31:14 AM

On 2013/Nov/08 00:19, Bob Smith wrote:
> On 11/7/2013 5:23 PM, phil chastney wrote:
> [...]
>> the nearest approach to a standard is still, I guess,
>> http://std.dkuug.dk/jtc1/sc22/open/n3067.pdf
>
> Many thanks for the above reference. I suspect that, among other things,
> the "approved" symbol for the minus sign (U+2212) comes as a surprise to
> many APL vendors, myself included.

the symbol at the top right of your average keyboard fulfills various
functions: hyphen, minus, and dash, among others

in wood and metal type, and photographic typesetting, the glyphs for
these characters differ in length, width of stroke and height above the
baseline

more importantly, you can break the line at a hyphen, but not (ideally)
at a minus sign

Unicode's solution was to rename the ASCII symbol as hyphen-minus
(thereby emphasising its ambiguity), and introduce separate characters
for hyphen, minus and a shedload of dashes, with different properties

problem solved
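
For reference, the separate characters can be listed with Python's standard unicodedata module. A minimal sketch; the choice of which dashes to show and the helper name names are arbitrary.

```python
import unicodedata

# The ASCII key plus some of the separate characters Unicode introduced.
CANDIDATES = ["\u002D", "\u2010", "\u2212", "\u2013", "\u2014"]

def names(chars):
    """Map 'U+XXXX' labels to official Unicode character names."""
    return {f"U+{ord(c):04X}": unicodedata.name(c) for c in chars}

for codepoint, name in names(CANDIDATES).items():
    print(codepoint, name)
```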

you can break a line at a period/full stop, but not (ideally) at a
decimal point

but what is a decimal point? and where is it placed, relative to the
baseline? what about paragraph numbers like 1.2.3? what about the use of
the dot to denote multiplication? what about dots for ellipsis?
what about national differences?

as I see it, Unicode (the organisation) has decided this problem is
unsolvable, so there are lots of dots defined, in various
configurations, but no visibly similar dots with differing properties

so, don't trust a machine to insert linebreaks for you, without checking
how well it's been done

it can be frustrating, but I have never, ever wished to go back to the
days when I handed a handwritten script to a typist, for final output

regards . . . /phil

rusi

Nov 8, 2013, 12:38:28 PM

> Windows 1252 -> UTF-8 should leave the text unaltered

> however, the round trip UTF-8 -> Windows 1252 -> something else -> UTF-8
> could quite easily have converted 0x88 into some other value, so that
> the 3 character sequence was no longer a valid UTF-8 sequence (or
> something along those lines) so the UTF encoder could no longer recover
> the original value of 0x2227

> for this, I apologise -- one thing I have learnt from this exercise is
> to be more careful when swapping between encodings

> I hope this makes some sort of sense to you -- similar problems arise
> when dealing with almost any text requiring characters outside the
> Latin-1 page -- it's an interesting subject area, only so long as you
> don't need correctly rendered text in a hurry, when it can be truly
> maddening

Thanks for taking the trouble to write that up!
Can't say I understood it all -- especially the round-tripping issues with Windows 1252

On the other hand the 2227 ∧ is showing alright except at that one 'splodge'
So I conjecture that my setup is ok (at least for this!!)

James J. Weinkam

Nov 8, 2013, 11:05:06 PM

phil chastney wrote:
>
> you can break a line at a period/full stop, but not (ideally) at a decimal point
>

Actually, lines can be broken at a blank or space. Some fonts have non breaking spaces of various widths for use in
setting equations and other matter that must not be broken.

phil chastney

Nov 9, 2013, 4:40:16 AM

that's true -- lines can (sometimes) be broken at spaces (and at lots
of other places as well)

Unicode's Line Breaking algorithm is quite complicated, but I have never
needed it, so I don't know the detail

there are lots of other spaces, U+2000 and onwards, for a start

the only non-breaking spaces I know of are U+00A0, U+202f and (arguably)
U+FEFF, but the official posn w.r.t that last one varies with time
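
The three code points mentioned can be inspected with Python's standard unicodedata module. A minimal sketch; the helper name info is made up. It shows names and general categories only, since the line-breaking property itself is not exposed by that module.

```python
import unicodedata

def info(codepoint):
    """Return (general category, official name) for a code point."""
    ch = chr(codepoint)
    return unicodedata.category(ch), unicodedata.name(ch)

for cp in (0x00A0, 0x202F, 0xFEFF):
    print(f"U+{cp:04X}", *info(cp))
```

The first two are classed as space separators (Zs), while U+FEFF is a format character (Cf), which fits the "arguably" above.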

J. Clarke

Dec 1, 2013, 1:25:49 PM

In article <fc20fd09-9c99-4d0d...@googlegroups.com>,
rusto...@gmail.com says...

>
> On Friday, November 8, 2013 4:21:53 PM UTC+5:30, phil chastney wrote:
> > On 2013/Nov/08 03:25, rusi wrote:
> > > On the whole Phil, thanks for your thoughts.
> > > Some details below.
> > > On Friday, November 8, 2013 3:53:16 AM UTC+5:30, phil chastney wrote:
> > >> On 2013/Nov/06 13:18, rusi wrote:
> > >>> On Tuesday, November 5, 2013 11:47:23 AM UTC+5:30, Lobachevsky wrote:
> > >>>> I know enough to run APL on my computer why things work is beyond
> > >>>> me. Phil Chastney has suggested there may be a problem with the
> > >>>> encoding on the idiom page. As it does not seem to stop it working
> > >>>> for me I feel no incentive to spend time changing anything.
> > >>> Phil Chastney said:
> > >>>> according to my browser, that page is "encoded" in UTF-8
> > >>>> that being so, the idiom should use codepoint U+2227 (rather than the
> > >>>> circumflex accent), which would then display properly
> > >>>> this is probably a minor editing error on that page
> > >>> If the encoding as UTF-8 is the error then what should it be??
> > >>> I believe he is saying that there is an editing error.
> > >>> Whether that is the case I can hardly comment since I am hardly an apl expert
> > >>> Basically Dyalog-apl and Gnu-apl dont seem to agree.
> > >>> Here a session with Gnu-apl

> > >>> 0 ï¿œ 1

> > >> interpreted as A?B (which should read: A and B )

> > > That 'something' did not happen here!
>
> > right -- well, I made sure my reply used UTF-8, and your reply
> > followed suit, so everything is OK in that respect
>
> > I highlighted the splodge in the second "A splodge B", copied and pasted
> > it into Word, did a quick Alt-X on the pasted character and got the
> > result 2227 -- bingo! everything looks OK as regards transmission: the
> > hexadecimal quantity 2227 has made it through God knows how many bits of
> > comms kit, and returned to base unharmed (it's a bit like the excitement
> > pigeon racers must feel when the little creature makes it back home)
>
> > so the most likely suspect is the display your end, and the most likely
> > part of the display is the font
>
> > do you have access to something like Charmap? my guess is that the font
> > you are using to display incoming emails does not have a glyph for the
> > codepoint 2227
>
> > that's the easy one taken care of
>
> > I see the AND sign in your earlier msg, and the mangled version in my
> > reply -- after mangling, the two characters are (hex) 00E2 and 00A7
>
> > check out http://www.fileformat.info/info/unicode/char/2227/index.htm
> > and you will see that the UTF-8 representation of U+2227 is 0xE2 0x88 0xA7
>
> > now 0x88 is (supposed to be) a control code, and therefore has no

> > visible representation, so "ï¿œ" is a three character sequence whose

> > middle character is invisible, which is the 8-bit display of the UTF-8
> > sequence which once represented U+2227
>
> > this is almost certainly my fault -- while drafting that earlier
> > reply, I was swapping between encodings, to see what happened, and one
> > of the encodings I swapped in and out of was Windows 1252
>
> > Windows 1252 is a bit of a minefield, principally because it has visible
> > Windows 1252 -> UTF-8 should leave the text unaltered
>
> > however, the round trip UTF-8 -> Windows 1252 -> something else -> UTF-8
> > could quite easily have converted 0x88 into some other value, so that
> > the 3 character sequence was no longer a valid UTF-8 sequence (or
> > something along those lines) so the UTF encoder could no longer recover
> > the original value of 0x2227
>
> > for this, I apologise -- one thing I have learnt from this exercise is
> > to be more careful when swapping between encodings
>
> > I hope this makes some sort of sense to you -- similar problems arise
> > when dealing with almost any text requiring characters outside the
> > Latin-1 page -- it's an interesting subject area, only so long as you
> > don't need correctly rendered text in a hurry, when it can be truly
> > maddening
>
> Thanks for taking the trouble to write that up!
> Can't say I understood it all -- especially the round-tripping issues with Windows 1252
>
> On the other hand the 2227 ? is showing alright except at that one 'splodge'
>
> So I conjecture that my setup is ok (at least for this!!)
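
For anyone who wants to try the byte-level story in the quoted explanation, here is a minimal Python sketch. It only illustrates the encodings named above (UTF-8, Latin-1, Windows 1252); the "damaged" case at the end is an assumed example of one way a round trip could lose the middle byte, not a reconstruction of what actually happened to the message.

```python
# U+2227 LOGICAL AND and its UTF-8 bytes.
and_sign = "\u2227"
utf8 = and_sign.encode("utf-8")
print(" ".join(f"{b:02x}" for b in utf8))  # e2 88 a7

# Read as Latin-1, the three bytes become three characters;
# the middle one, U+0088, is a C1 control code with no glyph.
as_latin1 = utf8.decode("latin-1")
print([f"U+{ord(c):04X}" for c in as_latin1])

# Read as Windows 1252, byte 0x88 is a visible character instead
# (U+02C6, modifier letter circumflex accent).
as_cp1252 = utf8.decode("cp1252")
print([f"U+{ord(c):04X}" for c in as_cp1252])

# A plain misreading is reversible: re-encode with the same
# 8-bit codec and the original UTF-8 bytes come back.
recovered = as_latin1.encode("latin-1").decode("utf-8")

# Assumed lossy step: something in the chain drops the 0x88 byte.
# What is left is no longer a valid UTF-8 sequence.
damaged = bytes([0xE2, 0xA7])
try:
    damaged.decode("utf-8")
    decodes_cleanly = True
except UnicodeDecodeError:
    decodes_cleanly = False
print(decodes_cleanly)  # False
```

The point of the last step is that once any byte of the sequence is altered or dropped, a UTF-8 decoder has nothing to recover U+2227 from.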

FWIW, I just tried some experiments.

Looking at the page I saw the caret. So I got curious--the machine I
normally use runs Vista. I checked with Chrome, IE, and Mozilla and all
showed the caret, so that rules out something browser-specific.

I have a Windows 8.1 machine sitting here, so I tried it and the "and"
displayed properly. My first reaction was "AHA!" but then I noted the
NARS2000 icon poking out of a corner of the screen and tried another
test. The Windows 8.1 machine did not have NARS2000, so I installed it.
After that, it also showed the caret rather than the "and".

So I installed the APL385.TTF font on it and the "and" was back. I then
installed APL385.TTF on the Vista machine and now it also displays the
"and" properly in all three browsers.

My conclusion is that this is an issue with a specific font or family of
fonts.

rusi

Dec 2, 2013, 7:10:10 AM

On Sunday, December 1, 2013 11:55:49 PM UTC+5:30, J. Clarke wrote:

>
> FWIW, I just tried some experiments.

Just giving some data points for your experiments. Dunno if they have any
value

Here's my line that you quoted

> >
> > On the other hand the 2227 ? is showing alright except at that one 'splodge'

And here the original (in google groups)
----------
On the other hand the 2227 ∧ is showing alright except at that one 'splodge'

----------

ie the character after the 2227 vanished and became a ? in your quote.
Whether you see it at all is another question?!

J. Clarke

Dec 3, 2013, 10:06:57 AM

In article <c0d2445d-026b-4cff...@googlegroups.com>,
rusto...@gmail.com says...

>
> On Sunday, December 1, 2013 11:55:49 PM UTC+5:30, J. Clarke wrote:
>
> >
> > FWIW, I just tried some experiments.
>
> Just giving some data points for your experiments. Dunno if they have any
> value
>
> Here's my line that you quoted
>
> > >
> > > On the other hand the 2227 ? is showing alright except at that one 'splodge'
>
>
> And here the original (in google groups)
> ----------
> On the other hand the 2227 ? is showing alright except at that one 'splodge'
>
> ----------
>
> ie the character after the 2227 vanished and became a ? in your quote.
> Whether you see it at all is another question?!

That's my newsreader at work. I use Gravity which is an old-school
straight-ASCII newsreader. Any character that it doesn't recognize it
replaces with a question mark. In Outlook Express, which supports HTML
and Unicode, the "and" in your first post displays properly.
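
The question-mark substitution is easy to reproduce. This is a small Python sketch of the general behaviour (encoding to ASCII with replacement); it is not a claim about how Gravity is implemented internally.

```python
# The line as it appeared in Google Groups, with U+2227 after "2227".
original = "On the other hand the 2227 \u2227 is showing alright"

# Forcing the text through ASCII replaces every character that
# ASCII cannot represent with "?".
ascii_view = original.encode("ascii", errors="replace").decode("ascii")
print(ascii_view)
# On the other hand the 2227 ? is showing alright
```

Once the substitution has happened the original character is gone, which is why the quoted copy shows "?" even in a Unicode-capable reader.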

phil chastney

Dec 3, 2013, 2:39:45 PM

On 2013/Dec/01 18:25, J. Clarke wrote:

> <snipped lots of stuff>

>
> FWIW, I just tried some experiments.
>
> Looking at the page I saw the caret. So I got curious--the machine I
> normally use runs Vista. I checked with Chrome, IE, and Mozilla and all
> showed the caret, so that rules out something browser-specific.
>
> I have a Windows 8.1 machine sitting here, so I tried it and the "and"
> displayed properly. My first reaction was "AHA!" but then I noted the
> NARS2000 icon poking out of a corner of the screen and tried another
> test. The Windows 8.1 machine did not have NARS2000, so I installed it.
> After that, it also showed the caret rather than the "and".
>
> So I installed the APL385.TTF font on it and the "and" was back. I then
> installed APL385.TTF on the Vista machine and now it also displays the
> "and" properly in all three browsers.
>
> My conclusion is that this is an issue with a specific font or family of
> fonts.

an interesting conclusion, but before accepting it as an explanation, we
need to know if the environments you're using recognise Unicode

we then need to know
--- what code was sent, to represent the character in question
--- what font was specified for the display of that character
--- does the font have a glyph defined for that character
--- if not, what does the environment do about the missing glyph
(basically, are there any fallback fonts specified)

this isn't always as easy as it sounds -- Unicode may or may not be
recognised at different levels of the software environment -- fallback
fonts may or may not apply at different levels -- if an encoding is
not specified, some software will assume a default encoding, while some
software will attempt to identify the encoding from the content

there are now 3 characters in this discussion:
U+005e circumflex accent
U+2227 logical and
U+2038 caret

my guess is that the reference to the caret is a red herring, so I'll
ignore it

what do you mean when you say 'it ... displays the "and" properly'?

if, perchance, you mean that the visual display coincides pretty closely
with what you'd expect to see, we need to take into account that, as far
as I can see, the glyphs in APL385 for circumflex and logical AND are
identical, except that the circumflex is vertically positioned at the
caps height, and the logical AND is at mid height (roughly on the math axis)

are you sure you're seeing the glyph for the logical AND?

indeed, are you sure you should be seeing logical AND? maybe the APL
vendor is using ASCII codepoints for non-ASCII characters?

also, are you sure what encoding is in force? the msg I'm replying to is
recognised by Thunderbird as ISO-8859-15, while your later reply to Rusi
is recognised as ISO-8859-1 -- you may have similar variations in your
other environments

in short, I would be reluctant to blame a font, without first
establishing precisely what the font has been asked to display

it may still be a font problem, of course -- what do I know?
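
One way to establish "what code was sent" without relying on the display at all is to ask for code points and their Unicode names. A minimal Python sketch, using only the standard unicodedata module; the helper name describe is made up for this illustration.

```python
import unicodedata

def describe(text):
    """Return (code point, Unicode name) for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
            for ch in text]

# The three characters in this discussion.
for code_point, name in describe("^\u2227\u2038"):
    print(code_point, name)
# U+005E CIRCUMFLEX ACCENT
# U+2227 LOGICAL AND
# U+2038 CARET
```

Pasting the suspect character into describe() sidesteps the font question entirely, since the answer does not depend on which glyph gets drawn.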

J. Clarke

Dec 4, 2013, 6:34:49 PM

In article <529E3381...@yahoo.com>, philip_...@yahoo.com
says...

Looking at the web page in a hex editor, I find that the character is
hex 5E, "circumflex accent". Looking at the SimPL Medium font included
with APL2000 I find that it shows character 5E as being a small
circumflex at a height that would put it above the regular letters.
Looking at the APL385 font, it shows the typical APL "and" mark, much of
a muchness with the one that the APL element on a 2741 interactive
terminal would type.

Comparing closely I see that the character displayed is slightly
elevated but not to an extent that stands out unless it is placed next
to 2227.

Examining the CSS I find that the page provides APL385 as the embedded
font--that should be used unless there is a local font that is identified
as an APL font. The "SimPL Medium" provided with NARS2000 would fill
the bill and rationally would yield the observed behavior.

So one can argue that it is an editing error and the page _should_ be
using 2227 instead of 005E, however since the person editing the page is
using APL385, which does not provide a sharp distinction between the
two, it is understandable that this would go unnoticed.

As for the encoding for USENET posts, that is set explicitly in my
newsreader and is unrelated to anything else in Windows as far as I
know.

The Windows 8.1 installation is fresh out of the box on a bare machine
and is running on the system defaults.

I also tried cutting and pasting the expression into NARS2000 and it
provides the expected result as pasted.

I do not have another APL readily available at this time so cannot test
further.
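
The hex-editor check can also be done in a few lines of Python. This is only a sketch: it assumes the saved page is UTF-8, the function name count_and_candidates is invented here, and the sample string is idiom 7 as it was spelled out earlier in the thread, typed with a circumflex, not the actual page source.

```python
# UTF-8 byte sequence for U+2227 LOGICAL AND.
AND_UTF8 = "\u2227".encode("utf-8")  # b"\xe2\x88\xa7"

def count_and_candidates(raw: bytes) -> dict:
    """Count circumflex bytes (0x5E) and UTF-8 logical-AND sequences."""
    return {
        "circumflex_5E": raw.count(b"\x5e"),
        "logical_and_2227": raw.count(AND_UTF8),
    }

# Idiom 7, typed with U+005E where U+2227 would be expected.
sample = "X^.=\u234b\u234bX".encode("utf-8")
counts = count_and_candidates(sample)
print(counts)
# {'circumflex_5E': 1, 'logical_and_2227': 0}
```

Counting raw bytes is safe here because 0x5E never occurs inside a multi-byte UTF-8 sequence.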

J. Clarke

Dec 4, 2013, 8:47:00 PM

In article <MPG.2d097d5fb...@news.newsguy.com>,
jclark...@cox.net says...

Also, it is my understanding that the page in question was ported from a
Finnish site. I have no idea what the character mappings on a Finnish
APL keyboard would be.

phil chastney

Dec 6, 2013, 6:34:36 AM

a very thorough investigation

certainly, I would argue that the page should be using U+2227 -- just
as I would argue that all APL implementations should conform with N3067,
regardless of its "official" status

but they don't, and that isn't going to change

the Industrial Revolution started in England, and English mills and
factories were the first to be equipped with the new machinery -- when
better machinery was invented, it was difficult for English mill- and
factory-owners to cost-justify replacing the existing machinery, so they
slowly lost ground to those who kitted up more recently

I believe that a similar situation obtains w.r.t APL -- the fact that
users of an APL site need a specifically APL font for the code to look
right(ish) raises the bar to acceptance (needlessly, IMHO) -- the fact
that this specifically APL font can no longer display ASCII text strings
which look "right" is another undesirable side-effect

> As for the encoding for USENET posts, that is set explicitly in my
> newsreader and is unrelated to anything else in Windows as far as I
> know.

I use Thunderbird, and it allows me to specify an encoding for new
posts, the option of replying in the sender's encoding or applying my
default encoding, or specifically apply any encoding I wish to
individual incoming and outgoing messages -- it's all very fluid

the recipient is still free to ignore my chosen encoding, though

> The Windows 8.1 installation is fresh out of the box on a bare machine
> and is running on the system defaults.
>
> I also tried cutting and pasting the expression into NARS2000 and it
> provides the expected result as pasted.
>
> I do not have another APL readily available at this time so cannot test
> further.

while your process tree consists of Windows and NARS only, things are
fairly straightforward -- the real nightmare is HTML: you can specify
default encodings to the browser, only to find that incoming documents
have a different default encoding set, and worse, it's incorrectly specified

it's estimated that over 50% of websites use Unicode nowadays, but
almost every day, I encounter pages which display with question marks
where I would expect diacritics or other common punctuation marks
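
The two usual failure modes of a wrongly declared encoding can be sketched in Python. The word "café" is just an arbitrary example, and the codecs are stand-ins for whatever the page and the browser each believe is in force.

```python
text = "caf\u00e9"  # "café"

# Page sent as UTF-8 but read as Latin-1: each accented letter
# turns into two characters.
sent_utf8 = text.encode("utf-8")
misread_as_latin1 = sent_utf8.decode("latin-1")
print(misread_as_latin1)  # cafÃ©

# Page sent as Latin-1 but read as UTF-8: the lone 0xE9 byte is
# not valid UTF-8, so it is replaced by U+FFFD, which many
# programs draw as a question mark.
sent_latin1 = text.encode("latin-1")
misread_as_utf8 = sent_latin1.decode("utf-8", errors="replace")
print([f"U+{ord(c):04X}" for c in misread_as_utf8])
```

The first case is recoverable if you know which wrong codec was applied; the second is not, since the original byte has been thrown away.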

users of APLWiki will be using an HTML interpreter of some sort -- we
need to bear in mind that the HTML page will see the APL symbols as
text, but what about the case which triggered this thread, where the
user wants to copy from text and paste it as code?

the best solution would be for all suppliers to agree on tolerant input
and strict output, to agree what that strict output should be, and also
to agree what is, and is not, acceptable tolerant input -- and then
implement that agreement uniformly

how they would enforce such an agreement I have no idea

the situation at the moment reminds me of CP/M files -- mostly pretty
similar, but with enough differences of detail to really spoil your day

Unix, too
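
To make "tolerant input, strict output" concrete, here is a minimal Python sketch. The mapping table is purely hypothetical (it contains only the one substitution this thread is about), the function name normalise is invented, and no APL vendor is known to do it this way.

```python
# Hypothetical table of tolerated look-alikes and their strict forms.
TOLERATED = {
    "^": "\u2227",  # U+005E circumflex accent -> U+2227 logical and
}
_TABLE = str.maketrans(TOLERATED)

def normalise(expr: str) -> str:
    """Accept tolerated input characters, emit only the strict ones."""
    return expr.translate(_TABLE)

pasted = "X^.=\u234b\u234bX"   # idiom 7 as copied from the page
strict = normalise(pasted)
print(strict)
# X∧.=⍋⍋X
```

A real implementation would have to leave quoted strings and comments alone, which is exactly the sort of detail the suppliers would need to agree on.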

phil chastney

Dec 6, 2013, 6:40:14 AM

On 2013/Dec/05 01:47, J. Clarke wrote:
> In article <MPG.2d097d5fb...@news.newsguy.com>,
> jclark...@cox.net says...

> <snipped lots more stuff>

>
> Also, it is my understanding that the page in question was ported from a
> Finnish site. I have no idea what the character mappings on a Finnish
> APL keyboard would be.

I don't know about Finnish APL keyboards, but I was always grateful to
the Finns for providing such a strong argument in favour of switching
from 8-bit encodings:
"if you're coding in APL and writing your comments in Finnish,
what 8-bit coding would you use?"

/phil
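
The Finnish argument can be checked mechanically. This Python sketch tries a handful of common 8-bit codecs on one line of APL with a Finnish comment; the line itself and the choice of codecs are made up for illustration, and vendor-specific APL code pages (which Python does not ship) are not covered.

```python
# One line of APL with a Finnish comment: ⍴X ⍝ määrä
line = "\u2374X \u235d m\u00e4\u00e4r\u00e4"

EIGHT_BIT = ["iso-8859-1", "iso-8859-15", "cp1252", "cp437", "cp850"]

def codecs_that_fit(text, codecs):
    """Return the codecs that can encode every character in text."""
    fits = []
    for name in codecs:
        try:
            text.encode(name)
        except UnicodeEncodeError:
            continue
        fits.append(name)
    return fits

print(codecs_that_fit(line, EIGHT_BIT))            # []
print(codecs_that_fit(line, ["utf-8", "utf-16"]))  # ['utf-8', 'utf-16']
```

Each 8-bit codec tried here can hold the Finnish letters, but none can hold the APL symbols as well.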

rusi

Dec 7, 2013, 11:10:56 PM

On Friday, December 6, 2013 5:04:36 PM UTC+5:30, phil chastney wrote:

> the best solution would be for all suppliers to agree on tolerant input
> and strict output, to agree what that strict output should be, and also
> to agree what is, and is not, acceptable tolerant input -- and then
> implement that agreement uniformly

Many industrial-strength compilers -- I can think of gcc and ghc
(Haskell) -- have a (set of) command-line options to strictify the
compiler. E.g. for gcc one can say -std=c89 to request compliance with
the c89 standard. Having such command-line option(s) allows a user
to choose a level of strictness that is appropriate.

J. Clarke

Dec 8, 2013, 11:37:37 AM

In article <caa20ed3-adc6-40d4...@googlegroups.com>,
rusto...@gmail.com says...

The thing is though, APL is not normally compiled, so there isn't any
"output" in the sense that gcc produces output. You can save your code
as a text file, but what encoding is used there depends on whatever text
editor you are using (some implementations include a GUI editor, others
just have the del editor). Particular implementations may have a
facility to dump program code to a text file or it may be necessary for
the user to write such a utility himself--ordinarily program code is
just saved in one's workspace and there is no generally accepted
standard workspace format of which I am aware.

rusi

Dec 8, 2013, 11:48:49 AM

On Sunday, December 8, 2013 10:07:37 PM UTC+5:30, J. Clarke wrote:
> Rusi says...

Dunno if the 'compiled' is relevant -- even interpreted languages can
have behavior modified with command-line flags.

I guess though, that what is relevant is that APL is not typically called
with command-line parameters like most Unix programs.

J. Clarke

Dec 9, 2013, 11:09:22 AM

In article <52b54297-2a50-4db2...@googlegroups.com>,
rusto...@gmail.com says...

They can, but the output is generally what the programmer wants
them to output--if it's not what he wanted then he needs to work on his
program until it is.

> I guess though, that what is relevant is that APL is not typically called
> with command-line parameters like most Unix programs.

Historically the APL system didn't get called by a user, it got started
at IPL and then was accessed by users, and the admins had source code so
any changes needed got made at that level rather than by switches.

If you want to see how an early version worked, there is an
implementation of APL\360 for the Hercules emulator:
http://hercules390.996247.n3.nabble.com/Running-APL-360-on-OS-360-MVT-21-8F-Large-Configuration-td40063.html
There's a link there to a download that has instructions and tells you
everything else you'll need.