
Bringing Unicode to Prolog (Dogelog Runtime)


Mostowski Collapse

Jul 10, 2021, 6:56:20 PM
I was checking out what Logtalk does. They
have Prolog texts here:

These Prolog texts contain facts like:

unicode_category_(0x00AD, 'Cf').
unicode_category_(0x0600, 'Cf').
unicode_category_(0x0601, 'Cf').
unicode_category_(0x0602, 'Cf').
unicode_category_(0x0603, 'Cf').
unicode_category_(0x0604, 'Cf').
unicode_category_(0x06DD, 'Cf').
unicode_category_(0x070F, 'Cf').

This might be feasible standalone, but
if we want to deliver JavaScript files
over the wire into the browser,

it's possibly not a good idea to have
facts with 1114111 entries. So we
made a little Unicode compressor:

compression took 109 ms

pool.size()*PARA_SIZE= 8384
pool2.size()*BLOCK_SIZE= 3072
Total= 12544

The compression ratio is 1%. The
1114111 entries boil down to 12544
entries. See also:

Preview: Unicode general categories Prolog API. (Jekejeke)

Mostowski Collapse

Jul 10, 2021, 7:28:13 PM
The compressed Unicode data is in JavaScript arrays,
two two-dimensional arrays and one one-dimensional array.
This could be further Prologified:

pool(+Integer, +Integer, -Integer)
pool2(+Integer, +Integer, -Integer)
buf3(+Integer, +Integer)

But since Dogelog runtime does not yet have first argument
indexing, there is no use in doing this. But code_type/2
could then be implemented as follows:

code_type(C, T) :-
    I is C>>10, buf3(I, J),
    K is (C>>4) /\ 0x3F, pool2(J, K, L),
    M is C /\ 0xF, pool(L, M, T).

Markus Triska would use (#=)/2 instead of (is)/2, and say
it's bidirectional. It can also be made bidirectional
manually as follows, by prepending a further clause:

code_type(C, T) :- var(C), !,
    pool(L, M, T),
    pool2(J, K, L),
    buf3(I, J),
    C is (I<<10)+(K<<4)+M.

Disclaimer: Didn't try, just paper work.
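
For readers who want to experiment, here is a minimal Python sketch of this kind of three-level table compression. It is not the Dogelog compressor: the names intern and compress are made up for illustration, the data comes from Python's unicodedata module, and the sizes 16 and 64 are read off the shifts and masks in code_type/2 above. The resulting totals depend on the Unicode version of the Python installation, so they will not necessarily match the figures quoted earlier.

```python
import unicodedata

PARA_SIZE = 16     # code points per paragraph (C /\ 0xF)
BLOCK_SIZE = 64    # paragraphs per block ((C >> 4) /\ 0x3F)
MAX_BLOCK = 0x110000 // (PARA_SIZE * BLOCK_SIZE)   # blocks in total (C >> 10)

def intern(pool, index, row):
    """Return the position of row in pool, appending it if it is new."""
    key = tuple(row)
    if key not in index:
        index[key] = len(pool)
        pool.append(key)
    return index[key]

def compress():
    pool, pool_index = [], {}      # distinct paragraphs of categories
    pool2, pool2_index = [], {}    # distinct blocks of paragraph numbers
    buf3 = []                      # one block number per 1024 code points
    for block in range(MAX_BLOCK):
        paras = []
        for para in range(BLOCK_SIZE):
            start = (block * BLOCK_SIZE + para) * PARA_SIZE
            cats = [unicodedata.category(chr(start + i)) for i in range(PARA_SIZE)]
            paras.append(intern(pool, pool_index, cats))
        buf3.append(intern(pool2, pool2_index, paras))
    return pool, pool2, buf3

def code_type(tables, c):
    """Three-step lookup, mirroring the code_type/2 clause above."""
    pool, pool2, buf3 = tables
    block_no = buf3[c >> 10]
    para_no = pool2[block_no][(c >> 4) & 0x3F]
    return pool[para_no][c & 0xF]

tables = compress()
pool, pool2, buf3 = tables
total = len(pool) * PARA_SIZE + len(pool2) * BLOCK_SIZE + len(buf3)
print(total, "entries instead of", 0x110000)
```

Identical paragraphs and identical blocks are stored only once, which is where the saving comes from: large unassigned or uniform ranges collapse into a handful of shared rows.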

Mostowski Collapse

Jul 12, 2021, 4:02:58 AM
More progress in the Unicode domain, so that we can
leave Unicode behind and move on to other things. Unicode
code points can also represent a numeric value.

Some exotic and not so exotic examples are seen here:

unicode_numerical_value(0x0D75, 0.75, 3/4). % No MALAYALAM FRACTION THREE QUARTERS
unicode_numerical_value(0x096C, 6.0, 6). % Nd DEVANAGARI DIGIT SIX
unicode_numerical_value(0x216E, 500.0, 500). % Nl ROMAN NUMERAL FIVE HUNDRED

We decided to replicate the Character.digit() API from Java. This API
only returns numeric values that are integer and that are between 0 and 35.
Otherwise the API returns -1.
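
The rule as described here (integer value between 0 and 35, otherwise -1) can be sketched in Python on top of unicodedata.numeric. The function name digit_value is made up for this illustration, and the sketch follows only the description in this post: it does not handle radix arguments or Latin letters the way Java's Character.digit() does.

```python
import unicodedata

def digit_value(ch):
    """Return the numeric value of ch if it is an integer in 0..35, else -1."""
    value = unicodedata.numeric(ch, None)   # None when ch has no numeric value
    if value is None or value != int(value) or not 0 <= value <= 35:
        return -1
    return int(value)

print(digit_value("\u096C"))   # DEVANAGARI DIGIT SIX
print(digit_value("\u216E"))   # ROMAN NUMERAL FIVE HUNDRED
print(digit_value("\u0D75"))   # MALAYALAM FRACTION THREE QUARTERS
```

Only the Devanagari digit survives the filter; the Roman numeral is out of range and the fraction is not an integer.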

Can we compress like before? Yes!

                          Without Numeric   With Numeric
pool.size()*PARA_SIZE=    8384              8416
pool2.size()*BLOCK_SIZE=  3072              3072
MAX_BLOCK=                1088              1088
Total=                    12544             12576

See also:

Preview: Unicode numeric values Prolog API. (Jekejeke)

Or if you are not on Twitter, but on Facebook:

Preview: Unicode numeric values Prolog API. (Jekejeke)

Mostowski Collapse

Jul 17, 2021, 4:57:28 AM
Now using the compressed Unicode data for:
- detecting whether an atom needs quotes
- detecting whether tokens alias during writing

We do it slightly differently than in Jekejeke Prolog, without a
two-dimensional word break matrix. The word break matrix
seems to be no longer needed, only a few cases are relevant,

which can be directly coded as Prolog predicates:

is_name is_name
is_symbol is_symbol
\' \'

After a journey into enhancing write_term/1, the compact
writing of operators now does not give anymore:

:- X = 1 rem 2, write(X), nl.
:- X = (1+2) rem 3, write(X), nl.
:- X = ** rem **, write(X), nl.

We inject spaces when there is the danger of token
aliasing, so that we now get:

1 rem 2
(1+2)rem 3
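
A rough Python sketch of the space injection idea, based on the three pairs listed above (name next to name, symbol next to symbol, quote next to quote). The character classes and the helper names needs_space and join_tokens are assumptions for illustration, not the Dogelog predicates.

```python
SYMBOL_CHARS = set("+-*/\\^<>=~:.?@#&$")   # assumed ISO-style symbol characters

def is_name(ch):
    return ch.isalnum() or ch == "_"

def is_symbol(ch):
    return ch in SYMBOL_CHARS

def needs_space(prev, nxt):
    """True if writing nxt directly after prev would fuse the two tokens."""
    a, b = prev[-1], nxt[0]
    return ((is_name(a) and is_name(b))
            or (is_symbol(a) and is_symbol(b))
            or (a == "'" and b == "'"))

def join_tokens(tokens):
    """Concatenate tokens, inserting a space only where they would alias."""
    out = tokens[0]
    for prev, nxt in zip(tokens, tokens[1:]):
        out += (" " if needs_space(prev, nxt) else "") + nxt
    return out

print(join_tokens(["1", "rem", "2"]))
print(join_tokens(["(", "1", "+", "2", ")", "rem", "3"]))
```

Only the last character of the previous token and the first character of the next one are inspected, so no two-dimensional table is needed.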

You can try yourself. You might need to clear your browser
cache, so that you catch the newest version:

Mostowski Collapse

Jul 17, 2021, 5:03:30 AM
But then the journey was not over, we also had to
handle these cases, which are not handled through
token aliasing detection, but through context flags:

:- X = (-) - -, write(X), nl.
:- X = - - -, write(X), nl.
:- X = - (a,b,c), write(X), nl.
:- X = -(a,b,c), write(X), nl.
:- X = - 1, write(X), nl.
:- X = -1, write(X), nl.
(-)- -
- - -
- (a, b, c)
-(a, b, c)
- 1

The relevant context flags are now also implemented
by Dogelog Runtime directly in Prolog, formerly found
in Jekejeke Prolog implemented in Java:

- SPEZ_OPER: 1 for (-)- - versus - - -
- SPEZ_FUNC: 2 for - (a, b, c) versus -(a, b, c)
- SPEZ_MINS: 4 for - 1 versus -1

Mostowski Collapse

Jul 17, 2021, 5:09:44 AM
Currently working on the final context flag that we
need in Dogelog Runtime. We need to port the SPEZ_LEFT
context flag from Jekejeke Prolog to the Dogelog Runtime.

It's for a test case that doesn't work correctly in SWI-Prolog:
Shouldn’t two structurally different Prolog terms
give a different output?

/* SWI-Prolog 8.3.26 */
?- op(400,xfy,***).

?- X=(a***b)*c, Y=a***(b*c).
X = a***b*c,
Y = a***b*c.

?- X=(a***b)*c, Y=a***(b*c), X=Y.

TauProlog does it correctly, but is a little too cautious: it always
places parentheses. GNU Prolog and Jekejeke Prolog are a little
less cautious, and place parentheses only in one case:

/* GNU Prolog 1.4.5 (64 bits) */
?- X=(a***b)*c, Y=a***(b*c).
X = (a***b)*c
Y = a***b*c

We now need to reimplement for Dogelog Runtime:

if (arity == 2 && (oper = Pl_Lookup_Oper(functor, INFIX)))
    if (oper->prec > prec || (context == INSIDE_LEFT_ASSOC_OP &&
                              (oper->prec == oper->right
                               && oper->prec == prec)))
    {   /* prevent also the case: T xfy U yf(x) */
        bracket = TRUE;

Mostowski Collapse

Jul 17, 2021, 9:08:35 AM
Ok I got a prototype running for Dogelog. It now
produces the following:

/* Dogelog Runtime */
:- op(400,xfy,***).
:- op(400,fy,***).
:- X = (a***b)*c, write(X), nl.
:- X = a***b*c, write(X), nl.
:- X = (***b)*c, write(X), nl.
:- X = ***b*c, write(X), nl.

Will upload it later. Was also testing the prefix case in
TauProlog and found a bug:

/* TauProlog */
?- X = (***b)*c.
X = *** b*c.
?- X = (***b)*c, Y = *** b*c, X = Y.

So it shows me something written, which I cannot re-read
to get something structurally equivalent.

The fy yfx case isn't written correctly, i.e. not in a way that it can be re-read.

Mostowski Collapse

Jul 17, 2021, 9:10:04 AM
On the other hand, the new GNU Prolog seems to work
as it always did. Everything seems to be fine:

/* GNU Prolog 1.5.0 (64 bits) */
?- X = (***b)*c.
X = (***b)*c

Mostowski Collapse

Jul 17, 2021, 12:09:39 PM
The closest to what I am testing is possibly Ulrich Neumerkel's
#152 and #153. But these test only the writing of fy yfx, and not
the writing of xfy yfx. I don’t find tests for the latter.

Conformity Testing I: Syntax

Interestingly the TauProlog bug is #152 and #153. One can
verify that we now have as expected output:

:- op(9,fy,fy).
:- op(9,yfx,yfx).
:- writeq(fy(yfx(1,2))), nl.
:- writeq(yfx(fy(1),2)), nl.
fy 1 yfx 2
(fy 1)yfx 2

But for Tau Prolog I get:

fy (1 yfx 2)
fy 1 yfx 2

Mostowski Collapse

Jul 17, 2021, 12:13:41 PM
The solution is rather trivial, but can be annoying to implement. Whether
this has something to do with the ISO core standard, I doubt. The ISO
core standard didn’t invent the Prolog operator mechanism, it only roughly
documented it decades later. We only require:

/* re-read what was written */
read(writeq(T)) = T

The solution à la GNU Prolog is to extend a needs parenthesis rule:

% write_rparen(+Stream, +WritePrio, +OperPrio)
write_rparen(S, L, R) :- L < R, !,
    put_code(S, 0')).
write_rparen(_, _, _).

Into a little more complicated needs parenthesis rule:

% write_rparen(+Stream, +OperAssocR, +WritePrio, +OperPrio, +CtxtAssocL)
write_rparen(S, _, L, R, _) :- L < R, !,
    put_code(S, 0')).
write_rparen(S, 0, L, L, F) :- F/\8 =\= 0, !,
    put_code(S, 0')).
write_rparen(_, _, _, _, _).

Since Dogelog Runtime has write_term/2 and read_term/2 completely
written in Prolog itself, you can look up how this embeds into
write_term/2 on GitHub. It's open source:

Open Source: Dogelog Runtime
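
To see the extended rule at work outside of Prolog, here is a small Python sketch of an infix writer. It is a toy under stated assumptions, not the Dogelog or GNU Prolog code: terms are atoms (strings) or (op, left, right) tuples, the operator table INFIX holds just the two operators from the test case, and the names INFIX and write_term are chosen for this illustration.

```python
# Hypothetical operator table: name -> (priority, type), as in the test case.
INFIX = {"***": (400, "xfy"), "*": (400, "yfx")}

def write_term(term, max_prio=1200, left_of_yfx=False):
    """Render an atom (str) or an infix compound (op, left, right)."""
    if isinstance(term, str):
        return term
    op, left, right = term
    prio, kind = INFIX[op]
    left_max = prio if kind == "yfx" else prio - 1
    right_max = prio if kind == "xfy" else prio - 1
    text = (write_term(left, left_max, kind == "yfx")
            + op
            + write_term(right, right_max, False))
    # Classic rule: bracket when the operator is too strong for its slot.
    # Extra rule: also bracket a right-associative operator of equal
    # priority that sits in the left argument of a left-associative one.
    if prio > max_prio or (left_of_yfx and kind == "xfy" and prio == max_prio):
        return "(" + text + ")"
    return text

print(write_term(("*", ("***", "a", "b"), "c")))
print(write_term(("***", "a", ("*", "b", "c"))))
```

With only the classic rule both terms would print as a***b*c; the extra condition is what keeps the first one distinguishable, matching the GNU Prolog output quoted earlier in the thread.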

Mostowski Collapse

Jul 20, 2021, 8:19:43 AM
Now I am busy with completely aligning the Jekejeke Prolog
Unicode classification and the Dogelog runtime Unicode
classification. We have to solve the riddle of why Jekejeke Prolog
uses the invalid classification somewhere. This was a long time
ago, so memory is weak. We didn't yet align invalid, but we found
that the Jekejeke Prolog code types also support is_other/1:


This is then used in is_name/1. We could upgrade Dogelog runtime, the
new is_name/1 classification now allows the following input. A similar
input also works for SWI-Prolog:

:- X = åström, atom_codes(X, L), write(L), nl.
:- X = åström, atom_codes(X, L), write(L), nl.
[229, 115, 116, 114, 246, 109]
[97, 778, 115, 116, 114, 111, 776, 109]
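
The two code lists are the precomposed and the decomposed spelling of the same name. A short Python check with the standard unicodedata module reproduces both lists; the third line prints the general categories of the decomposed form, which shows the combining marks (category Mn) that a name classification has to accept for the second spelling to read as one atom.

```python
import unicodedata

name = "\u00e5str\u00f6m"    # "åström" with precomposed å and ö
nfc = unicodedata.normalize("NFC", name)
nfd = unicodedata.normalize("NFD", name)
print([ord(ch) for ch in nfc])
print([ord(ch) for ch in nfd])
print([unicodedata.category(ch) for ch in nfd])
```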

Mostowski Collapse

Jul 23, 2021, 9:00:03 AM
Dogelog runtime currently quarantines the '.'
operator, unlike Jekejeke Prolog, which had this
solution for the '.' operator:

:- op(200, xfy, '.').
:- set_oper_property(infix('.'), sys_alias(sys_dot)).
:- op(200, xfy, sys_dot).
:- set_oper_property(infix(sys_dot), sys_portray('.')).

But since we have now adopted most of the Dogelog runtime
parsing and unparsing strategies also in Jekejeke Prolog,
we now get in Jekejeke Prolog:

?- X =
X = foo'.'bar

The old expected output is We should find a
heuristic to make writing of the '.' operator more compact.
Picat seems to use the '.' operator for message sending.

How do they solve it?


Mostowski Collapse

Jul 23, 2021, 9:09:08 AM
Some crucial problems are number aliasing or symbol aliasing.
The following test cases would go wrong without the quotes.
We now find for example:

?- X = (1).(2).
X = 1'.'2

?- X = (===).(===).
X = ==='.'===

These test cases would also go wrong via safe_space/3, since
inserting a space would turn the '.' operator into a terminating
period. And not inserting a space gives a wrong result.

So maybe an approach would be inserting parentheses?

Mostowski Collapse

Jul 23, 2021, 9:10:24 AM
The same test cases go wrong in SWI-Prolog:

/* SWI-Prolog 8.3.26 */
?- read(X), write_canonical(X), nl.
|: (1).(2).
X = 1.2.

?- read(X), write_canonical(X), nl.
|: (===).(===).
X = === . === .

Re-reading 1.2. or === . === . gives a different result
than the Prolog term that was written.

Mostowski Collapse

Jul 23, 2021, 10:19:01 AM
Now we extended the Dogelog runtime parser by character
constants. The following test cases now work:

:- X = 0'a, write(X), nl.
:- X = 0'\n, write(X), nl.
:- X = 0''', write(X), nl.
:- X = 0'', write(X), nl.
error(syntax_error(doubling_missing), [])

But there is a great discrepancy among existing Prolog systems.
The ISO core standard requires the above behaviour. I picked
4 Prolog systems, none of them implemented the above behaviour.

SWI-Prolog, TauProlog:
They accept both 0''' and 0'', so no error is thrown for
the missing doubling.

YAP, ECLiPSe Prolog:
They accept only 0'', so no error is thrown for the missing
doubling, and worst of all the ISO core standard compliant
character constant is not accepted.

Here are the irritating results from YAP and ECLiPSe Prolog:

ECLiPSe Version 7.0 #54 (x86_64_nt)
[eclipse 1]: X = 0''' .
syntax error: postfix/infix operator expected
| '.
| ^ here

YAP 6.3.3 (i686-mingw32)
?- X = 0''', write(X), nl.
X = 39
<==== HERE ====>
, write(X), nl.
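
For comparison, the behaviour that this post attributes to the ISO core standard (0''' is the quote character, 0'' is an error) can be written down as a small Python sketch. The function name char_constant and the escape table are assumptions for illustration; only a few escapes are covered and the input is expected to be exactly one character constant.

```python
ESCAPES = {"n": 10, "t": 9, "\\": 92, "'": 39, '"': 34, "`": 96}

def char_constant(text):
    """Parse a 0'c character constant and return its character code."""
    if not text.startswith("0'"):
        raise ValueError("not a character constant")
    rest = text[2:]
    if rest == "''":                      # doubled quote, as in 0'''
        return ord("'")
    if rest.startswith("'"):              # quote without doubling, as in 0''
        raise SyntaxError("doubling_missing")
    if rest.startswith("\\"):             # escape sequence, as in 0'\n
        if rest[1:] in ESCAPES:
            return ESCAPES[rest[1:]]
        raise SyntaxError("illegal escape")
    if len(rest) == 1:                    # plain character, as in 0'a
        return ord(rest)
    raise SyntaxError("illegal character constant")

print(char_constant("0'a"))
print(char_constant("0'\\n"))
print(char_constant("0'''"))
```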


Mostowski Collapse

Jul 24, 2021, 5:43:56 PM
That's an interesting find. Most Prolog systems
don't implement the ISO terminating period. For
example TauProlog 0.3.1 (beta) accepts this Prolog text:


When I run a query I get:

?- foo(X), write(X), nl, fail; true.

On the other hand ECLiPSe Prolog doesn't accept it, and
similarly GNU Prolog, which also emits the operator error:

[eclipse 1]: [user].
syntax error: postfix/infix operator expected
| foo(bar).foo(baz).
| ^ here

SWI-Prolog accepts it, but barks otherwise:

?- [user].
|: foo(bar).foo(baz).

ERROR: user://1:8:
ERROR: Arguments are not sufficiently instantiated

YAP Prolog silently swallows it:

?- [user].
% consulting user_output...
| foo(bar).foo(baz).
% consulted user_output in module user, 15 msec 0 bytes
?- foo(X), write(X), nl, fail; true.
EXISTENCE ERROR- procedure foo/1 is undefined, called from context
Goal was user:foo(_131502)

Mostowski Collapse

Jul 24, 2021, 5:49:00 PM
But the TauProlog behaviour leads me to a
new implementation of the period, one that I have
never done before.

The tokenizer would recognize ., but not consume
it. This is different from the ISO core specification,
which says to recognize and consume ., and then
recognize white or a % line comment (without the
eoln) or eof. Only recognizing white or a % line
comment allows an implementation without the
requirement that the input stream allows push
back. The new approach, which only recognizes .,
also does not require an input stream that allows
push back, but we will need peek code.


Mostowski Collapse

Jul 24, 2021, 5:50:20 PM
I will try this new behaviour with Dogelog
runtime, and later try to bring it to
Jekejeke Prolog.

Currently Dogelog runtime implements the ISO
core specific behaviour:

% read_optional(-Term, +Quad, -Quad)
read_optional(end_of_file) --> current_token(end_of_file), !.
read_optional(X) --> read(X,1200), read_end('.'), reach_notoken.

% reach_notoken
reach_notoken --> reach_code(C), {is_white(C)}, !.
reach_notoken --> reach_code(0'%), !, next_code, skip_token_line.
reach_notoken --> reach_code(-1), !.
reach_notoken --> {throw(error(syntax_error(superflous_token),_))}.

% read_end(+Atom, +Quad, -Quad)
read_end(A) --> current_token(A), !.
read_end('.') --> !, {throw(error(syntax_error(end_of_clause_expected),_))}.

But reach_notoken will go down the drain
when the tokenizer handles '.' the new
way, only recognizing it, not consuming it.
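
The check that reach_notoken performs can be sketched in Python over a plain string. This covers only the ISO-style rule from the DCG above (a period counts as a clause end when it is followed by layout, a % or the end of input), with look-ahead by indexing instead of a stream; the function name ends_clause is made up for this illustration.

```python
def ends_clause(text, pos):
    """ISO-style check: '.' at pos followed by layout, '%' or end of input."""
    if pos >= len(text) or text[pos] != ".":
        return False
    if pos + 1 == len(text):          # end of input, like reach_code(-1)
        return True
    nxt = text[pos + 1]               # peek only, nothing is consumed
    return nxt.isspace() or nxt == "%"

text = "foo(bar).foo(baz)."
print([i for i, ch in enumerate(text) if ends_clause(text, i)])
```

On the text from the earlier test only the final period qualifies, which is why systems following the ISO rule report an error instead of reading two clauses.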


Mostowski Collapse

Jul 24, 2021, 6:04:49 PM
One advantage of the ISO core standard terminating
period over the TauProlog period is that the
ISO core standard terminating period is a little
more terminal friendly. Although this is also
not 100% true. For example a terminal input
such as:

?- X = 123.___

With 3 spaces ___ after the period, it can also
profit from additional skipping after the
period, to synchronize the terminal.

In the TauProlog sandbox, which has an input field
web interface, such problems are not seen. But
these problems were solved in Jekejeke Prolog
by some additional skipping and are probably
also solved in other Prolog systems with a
terminal this way or some other way.


Mostowski Collapse

Jul 28, 2021, 5:38:37 PM
Looks like we misinterpreted how arithmetic comparison
works in JavaScript. It's not possible to use (==) from JavaScript
and (<) from JavaScript directly for Prolog (=:=) and Prolog (<).

The last two test cases are wrong:

:- 1 == 1.0, write(yes), nl; write(no), nl.
:- 1 =:= 1.0, write(yes), nl; write(no), nl.
:- 30000000000000000000000000 < 30000000000000000000000000.1,
write(yes), nl; write(no), nl.

The explanation is that JavaScript promotes to BigInt if one
of the arguments is BigInt. But the Prolog semantics would be
rather to promote to float when one of the arguments is float.

Here is how the last test case can be explained:

console.log(30000000000000000000000000n < BigInt(30000000000000000000000000.1));
> true
So we need to fix that as well.
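
The difference between the two semantics can be tried in Python, whose built-in comparison of an int with a float is exact, much like the JavaScript result above. The helpers num_less and num_equal are hypothetical names for a Prolog-style comparison that promotes to float as soon as one argument is a float; they sketch the idea and are not the Dogelog fix itself.

```python
def num_less(x, y):
    """Prolog-style (<): promote to float as soon as one argument is a float."""
    if isinstance(x, float) or isinstance(y, float):
        return float(x) < float(y)
    return x < y

def num_equal(x, y):
    """Prolog-style (=:=) with the same promotion rule."""
    if isinstance(x, float) or isinstance(y, float):
        return float(x) == float(y)
    return x == y

big = 30000000000000000000000000
print(big < 30000000000000000000000000.1)            # exact mixed comparison
print(num_less(big, 30000000000000000000000000.1))   # float promotion first
print(num_equal(1, 1.0))
```

The exact comparison sees the integer as smaller than the float, whereas with promotion both sides become the same float and the comparison fails.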

Mostowski Collapse

Jul 28, 2021, 5:45:38 PM
We could fix the arithmetic comparison.
The results are now correct:

:- 1 == 1.0, write(yes), nl; write(no), nl.
:- 1 =:= 1.0, write(yes), nl; write(no), nl.
:- 30000000000000000000000000 < 30000000000000000000000000.1,
write(yes), nl; write(no), nl.

But not all Prolog systems agree. One Prolog system that agrees
is, for example, Jekejeke Prolog. In Jekejeke Prolog I find:

/* Jekejeke Prolog 1.5.1 */
?- 30000000000000000000000000 < 30000000000000000000000000.1.

But SWI-Prolog and ECLiPSe Prolog do not agree:

/* SWI-Prolog 8.3.26 */
?- 30000000000000000000000000 < 30000000000000000000000000.1.
/* ECLiPSe Version 7.0 #54 */
[eclipse 2]: 30000000000000000000000000 < 30000000000000000000000000.1.
Yes (0.00s cpu)

Mostowski Collapse

Jul 28, 2021, 5:50:37 PM
The discrepancy in SWI-Prolog and ECLiPSe Prolog is
due to low fidelity float/1 conversion. One can try the
following in SWI-Prolog and ECLiPSe Prolog:

/* SWI-Prolog 8.3.26 */
?- X is float(30000000000000000000000000).
X = 2.9999999999999996e+25.
?- X is integer(float(30000000000000000000000000))-30000000000000000000000000.
X = -3724541952.
/* ECLiPSe Version 7.0 #54 */
[eclipse 1]: X is float(30000000000000000000000000).
X = 2.9999999999999996e+25

So they propose to approximate the integer value by
the float 2.9999999999999996e+25, which is -3724541952
away from the integer.

Let's see what JavaScript does. It proposes a different
float number, which is closer to the integer: abs(-3724541952)
is larger than abs(570425344):

> 570425344n

So JavaScript has high fidelity float/1, like Jekejeke Prolog.
This explains the different outcome in Prolog (<). Dogelog
runtime now uses the JavaScript high fidelity float/1 as

well and therefore has the same outcome as Jekejeke Prolog.
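
The two candidate floats can be inspected in Python, whose int to float conversion rounds to the nearest float. The sketch below prints the distance of the nearest float and of its neighbour towards zero from the integer; math.nextafter needs Python 3.9 or newer.

```python
import math

big = 30000000000000000000000000
nearest = float(big)                      # correctly rounded conversion
below = math.nextafter(nearest, 0.0)      # adjacent float towards zero
print(int(nearest) - big)
print(int(below) - big)
```

The two distances add up to one unit in the last place, 2^32 at this magnitude, so the two values are adjacent floats and the conversion only has to pick the closer one.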

Mostowski Collapse

Jul 28, 2021, 5:52:52 PM
Disclaimer: Tested only on Windows Chrome. Dunno what
Safari, Edge, etc. do. Or what happens on Mac or Linux.

Mostowski Collapse

Jul 28, 2021, 6:51:48 PM
The results are now:
:- 1 == 1.0, write(yes), nl; write(no), nl.
:- 1 =:= 1.0, write(yes), nl; write(no), nl.
:- 30000000000000000000000000 < 30000000000000000000000000.1,
write(yes), nl; write(no), nl.

Mostowski Collapse

Jul 29, 2021, 7:46:31 AM
I just tried:

Picat 3.1, (C), 2013-2021.
Picat> X is float(30000000000000000000000000).
*** Undefined procedure: float/1

So float/1 doesn't exist. What am I supposed to do?
This here seems to be a work around though?

Picat> X is 30000000000000000000000000+0.0.
X = 30000000000000000570425344.0

It is even correct! (Tested on Mac)

Mostowski Collapse

Jul 29, 2021, 7:47:49 AM
But I guess it would be legit to show it automatically
in scientific form, since there is no ambiguity in re-read
when shown as follows (Dogelog runtime output):

:- X is 30000000000000000000000000+0.0, write(X), nl.

Yes, such an output would also work in Picat:

Picat> X is 3.0E25.
X = 30000000000000000570425344.0

It's the same float.

Mostowski Collapse

Jul 29, 2021, 7:56:44 AM
TauProlog 0.3.1 (beta)

?- number_chars(X,"30000000000000000000000000").
X = 3.
?- X is 30000000000000000000000000.
X = 3.

Was rather expecting an overflow error, because of the flag value:

?- current_prolog_flag(bounded, X).
X = true

Mostowski Collapse

Jul 29, 2021, 8:14:42 AM
Even Python disagrees:

/* Python: */
>>> int(float(30000000000000000000000000))

/* SWI-Prolog: */
?- X is integer(float(30000000000000000000000000)).
X = 29999999999999996275458048

Mostowski Collapse

Jul 29, 2021, 2:52:35 PM
Maybe I should do more testing. JavaScript describes the method to find a
float as HALFEVEN. I guess this method is applied when one calls the conversion
Number(), i.e. the constructor without the new keyword, which is the JavaScript
equivalent of the float/1 evaluable function. The conversion Number() can also be
called for bigint arguments. So JavaScript has float/1 from a bigint, just as
Prolog has. So how is this conversion described:

" In this specification, the phrase “the Number value for x” where x
represents an exact real mathematical quantity (which might even be
an irrational number such as π) means a Number value chosen in the
following manner. Consider the set of all finite values of the Number type,
with -0𝔽 removed and with two additional values added to it that are not
representable in the Number type, namely 2^1024 (which is +1 × 2^53 × 2^971)
and -2^1024 (which is -1 × 2^53 × 2^971). Choose the member of this set that
is closest in value to x. If two values of the set are equally close, then the
one with an even significand is chosen; for this purpose, the two extra values
2^1024 and -2^1024 are considered to have even significands. Finally, if 2^1024
was chosen, replace it with +∞𝔽; if -2^1024 was chosen, replace it with -∞𝔽;
if +0𝔽 was chosen, replace it with -0𝔽 if and only if x < 0; any other chosen
value is used unchanged. The result is the Number value for x. (This procedure
corresponds exactly to the behaviour of the IEEE 754-2019 roundTiesToEven mode.)"

I am not sure whether I tested this already. The current test doesn’t test a tie.
Creating a tie test case is a tick more work. Also the above describes a
choice of negative zero. I should convert negative zero into positive zero
in my system, since I do not want to support that; it's currently not in the
ISO core standard. But SWI-Prolog might support it:

/* SWI-Prolog 8.3.26 */
?- X is -0.0.
X = -0.0.
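
A tie test case is not hard to construct: just above 2^53 the floats are spaced 2 apart, so every odd integer there lies exactly halfway between two floats. The Python sketch below relies on Python's int to float conversion, which rounds ties to even; the same integers could be fed to Number() with bigint arguments or to float/1 in a Prolog system to compare.

```python
low = 2**53              # representable, even significand
tie_down = low + 1       # halfway between low and low + 2
tie_up = low + 3         # halfway between low + 2 and low + 4

print(int(float(tie_down)) - low)   # distance of the chosen float from low
print(int(float(tie_up)) - low)
```

With round-ties-to-even the first tie goes down to 2^53 and the second goes up to 2^53 + 4, since both targets have an even significand.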

Mostowski Collapse

Jul 29, 2021, 2:54:15 PM
Ok, there is no danger of negative zero, thanks to a funny side effect
of mapping small floats to bigint. I have to do that, otherwise I cannot
distinguish them from small int, and small float and small int would
both have the same type. Currently the result is (Dogelog runtime):

:- X is -1.0*0.0, write(X), nl.
:- 0.0 == -0.0, write(yes), nl; write(no), nl.

Mostowski Collapse

Jul 30, 2021, 9:51:28 AM
Oho, Scryer Prolog (v0.8.127) isn't that lucky either. I get:

?- X is float(30000000000000000000000000).
X = 29999999999999996000000000.0.
?- 30000000000000000000000000 < 30000000000000000000000000.1.

Expectation would be rather a different float number,
and subsequently the comparison false.

Mostowski Collapse

Jun 24, 2022, 4:24:47 AM
Now there is some interesting news, we might soon
see SWI-Prolog performing the following feat:

?- A = ४२.
A = 42.

This works already for a while in Dogelog, although
it had a bug, which was only fixed yesterday.
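
As a point of comparison outside of Prolog, Python's int() already accepts decimal digits from other scripts, and the same conversion can be spelled out with unicodedata.decimal. The helper name parse_decimal is made up for this sketch.

```python
import unicodedata

def parse_decimal(text):
    """Convert a string of Unicode decimal digits (category Nd) to an int."""
    value = 0
    for ch in text:
        # unicodedata.decimal raises ValueError for non-digit characters
        value = value * 10 + unicodedata.decimal(ch)
    return value

devanagari = "\u096A\u0968"   # DEVANAGARI DIGIT FOUR, DEVANAGARI DIGIT TWO
print(parse_decimal(devanagari))
print(int(devanagari))
```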

You can try:

Mostowski Collapse

Jun 24, 2022, 4:32:13 AM
One could further extend it to Hex. Like for example:

?- A = 0xBABE.
A = 47806.

Works in Dogelog Player and formerly Jekejeke Prolog.
Again I only did it because Java can already do it.

Take this Java code:

int val = Integer.parseInt("BABE", 16);

It gives me:


Mostowski Collapse

Jun 24, 2022, 5:31:11 AM
I could also imagine that this feature is banned from
consulting Prolog texts or reading Prolog terms, and
only works for number_codes/2. But somehow there is

a conflict in the ISO Prolog core standard, number_codes/2
refers to the Prolog text syntax. In Java the feature is also
banned from Java texts. This here doesn’t compile:

int val = 0xBABE;

I get this Java compiler error:

java: illegal non-ASCII digit

So it's only an Integer.parseInt() thingy and not a Java text
thingy. But I don’t see for number_codes/2 that the ISO
Prolog core standard would be aware of such a difference.

So at the moment it's easier to make a solution that conflates
the two, Prolog texts and some built-ins like number_codes/2.

Mostowski Collapse

Jun 25, 2022, 5:22:01 AM
Now I am picking up new vibes for Dogelog Player, it's still
a young Prolog system, and the streams are also still pretty
primitive. Maybe some new ideas might even spill back
to formerly Jekejeke Prolog. So what's on the todo list next?
I find that the following goodies concerning open/4 are missing,
in fact we only have an open/3 so far:

- bom(Bool)
Specify detecting or writing a BOM.
- encoding(Atom)
Specify a file encoding.

BTW: This is an interesting read:

> Microsoft compilers and interpreters, and many pieces of software
> on Microsoft Windows such as Notepad treat the BOM as a required
> magic number rather than use heuristics. These tools add a BOM when
> saving text as UTF-8, and cannot interpret UTF-8 unless the BOM is
> present or the file contains only ASCII.

So you might find a BOM also for UTF-8 files, not only UTF-16 and other
encodings. The usual heuristic is to choose UTF-8 if there is no BOM, but
the above suggests throwing an error if there is no BOM
and some non-ASCII, i.e. > 7 bit.

Mostowski Collapse

Jun 25, 2022, 5:26:11 AM
Don’t know if any Prolog system throws such an error. One could
make the input stream aware that there was no BOM, and then bark
on a > 7 bit character coming from the stream. Could give more security.

Makes me curious to check what Java, JavaScript and Python do,
whether they have some reader object with such a feature. In the
worst case one could realize this feature in the Prolog system's
streams themselves. Again bom(Bool) is not mentioned in the ISO Prolog
core standard, similar to encoding(Atom), but it already has some
support across Prolog systems. For example in formerly Jekejeke
Prolog I have the same, and SWI-Prolog has it as well. Not sure
what newer Prolog systems such as Scryer Prolog provide.
Need to check.
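
The strict policy sketched above (UTF-8 only with a BOM, otherwise plain ASCII) is easy to prototype in Python on a byte string. The function name decode_strict_bom is made up, and this is only an illustration of the idea, not a feature of any Prolog system or of Python's standard readers.

```python
import codecs

def decode_strict_bom(data: bytes) -> str:
    """Decode as UTF-8 only if a UTF-8 BOM is present, else insist on ASCII."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):].decode("utf-8")
    # No BOM: any byte above 0x7F raises UnicodeDecodeError
    return data.decode("ascii")

print(decode_strict_bom(codecs.BOM_UTF8 + "\u00e5".encode("utf-8")))
print(decode_strict_bom(b"foo(bar)."))
```

A file without BOM that contains a non-ASCII byte is rejected instead of being guessed as UTF-8, which is the extra safety the post is after.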

Mostowski Collapse

Jan 30, 2023, 11:17:32 AM
Q: Why did Prolog miss the OpenAI/ChatGPT bandwagon?

A: It's in the genetics, Prolog is developing backwards:

1) Quintus: library(charsio)

2) SICStus: Na, library(codesio), characters that are atoms
of length 1 could pollute the atom table.

3.1) Scryer Prolog: Na, library(charsio), we have to go backwards,
create a vintage Prolog.

Mostowski Collapse

Jan 30, 2023, 11:18:56 AM
Meanwhile I am seriously considering renaming my
library(charsio) into library(atomsio), since that is
more what it does.

3.2) Dogelog Player: Na, library(atomsio),
what atom table?

Mostowski Collapse

Mar 16, 2023, 2:54:02 PM
Just noticed I made a big mistake, or maybe not? I used the following
syntax for the reference data type in Prolog:

reference :== "0" "r" name .

Isn’t this in conflict with the rational number syntax:

rational :== integer "r" integer .

Not really: a rational number would have a digit after “r”, whereas
a reference data type wouldn’t have a digit after the “r”.
Now I can input and output the beasts on JavaScript and Python,
and even sorting them now works:

?- sort([1,0rFalse,3.14,0rNone], L).
L = [3.14, 1, 0rNone, 0rFalse].

?- compound(0rTrue).

?- reference(0rTrue).

Mostowski Collapse

Mar 16, 2023, 2:55:46 PM
Will also bring them to Java. Although JPL proposes @(null),
@(false) and @(true)? But the advantage of the 0r syntax is that
it gives a data type which is not a compound.

Motivation to introduce these constants: JavaScript wanted
a boolean attribute value from me for the attribute name “disabled”
on a DOM element. But I guess another application area
would be JSON parsing and unparsing.

Mild Shock

Jul 31, 2023, 4:00:48 AM
Note: Do not confuse multilingual strings, as introduced
by recent Dogelog Player, with Unicode encoding.
I am using the phrase "multilingual strings" for a text
database. Interestingly I managed to make the text
database declarative. This means the strings/3 entries
are not order dependent. You can place languages into
the multifile predicate in any order,
and it will pick the most specific string independent
of the order of the strings/3 facts. I was replicating
the Java Script Resource Bundle lookup on a finer
grained level, by this simple Prolog code:

get_string(Key, Locale, Value) :-
    sys_locale_ancestor(Locale, Parent),
    strings(Key, Parent, Res), !,
    Value = Res.

% sys_locale_ancestor(+Atom, -Atom)
sys_locale_ancestor(L, L).
sys_locale_ancestor(L, M) :-
    last_sub_atom(L, P, _, _, '_'),
    sub_atom(L, 0, P, _, M).
sys_locale_ancestor(_, '').

The above assumes that locale identifiers use
underscore as separator. It also assumes the last_sub_atom/5
predicate from Novacore, unfortunately ISO Core has
only sub_atom/5, but no last_sub_atom/5.
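
The fallback idea carries over to other languages. Here is a small Python sketch with a dictionary as a made-up text database; the names STRINGS, locale_ancestors and get_string are chosen for illustration. Note one deliberate difference: the sketch walks through every shorter prefix of the locale, whereas the Prolog clauses above try the locale itself, the part before its last underscore, and then the root locale ''.

```python
STRINGS = {                       # hypothetical text database: (key, locale) -> string
    ("greeting", ""): "Hello",
    ("greeting", "de"): "Hallo",
    ("greeting", "de_CH"): "Gruezi",
}

def locale_ancestors(locale):
    """Yield the locale, then each shorter prefix, then the root locale ''."""
    while locale:
        yield locale
        locale = locale.rpartition("_")[0]
    yield ""

def get_string(key, locale):
    """Return the most specific string for key, independent of entry order."""
    for parent in locale_ancestors(locale):
        if (key, parent) in STRINGS:
            return STRINGS[(key, parent)]
    raise KeyError(key)

print(get_string("greeting", "de_CH"))
print(get_string("greeting", "de_AT"))
print(get_string("greeting", "fr"))
```

Because the search order comes from the locale and not from the database, the entries can be listed in any order, which is the declarative property described above.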

Mild Shock

Jul 31, 2023, 4:06:18 AM
For locales, see also:

Creating global software: Text handling and localization
in Taligent’s CommonPoint application system
Mark Davis et al. - 1996

I wasn't sure whether the get_string/3 implementation
will perform. In Jekejeke Prolog, since it has
multi-argument indexing, each strings/3 lookup with
mode (+,+,-) will be a little bit faster, whereas
Dogelog Player, which has only first argument indexing,
will use a little bit more time, since it needs
a scan: it can only look up the Key via an index,
but will scan for the Parent. If the text database
isn't extremely large and/or if the locales do
not have extremely many segments, this is a
non-issue I guess. And Dogelog Player might get
multi-argument indexing in the future.
