Development snapshots available again, with UTF-8 support

Thomas Nilefalk

Jun 4, 2021, 9:03:35 AM
to ala...@googlegroups.com
Hi Everyone!

I'm happy to announce that the Alan Continuous Integration pipeline has been rebuilt and now continuously delivers new builds of the command-line SDKs for Linux and Windows, as well as the Windows installers for WinArun and the Alan SDK. As usual, they can be downloaded from https://www.alanif.se/download-alan-v3/development-snapshots, but also directly from the CI at https://ci.alanif.se.

The latest alpha/snapshot contains substantial improvements for those of you who work with non-ASCII languages: Alan now supports UTF-8, which is the predominant encoding on most systems today. The advantage is that you (normally) don't have to do anything special to get Alan to correctly interpret your ñ, ä, ß and other characters. Previously this required special setup of editors and consoles to work right.

You can read more about this feature in the alpha-level documentation, under Compiler Switches ('-encoding UTF-8') and Interpreter Switches ('-u').

For the upcoming beta8 you will have to tell the Alan compiler explicitly that the source files are in UTF-8, but for beta9 this will be the default. If your text file happens to be encoded in UTF-8 with a "BOM" (a special marker at the start of the file), Alan already respects that automatically. (Some environments add this marker, some don't.)

The interpreter also accepts a UTF-8 option, which of course controls input and output for the command-line interpreters. It is also useful for the GLK-based interpreters (WinArun, Gargoyle, ...), since logs and canned command input have to be read with the correct encoding.

So after beta9 (the beta release after the upcoming one), Alan authoring and running will be more natural for those of us who work in languages other than English.

/Thomas

Tristano Ajmone

Jun 4, 2021, 9:34:12 PM
to Thomas Nilefalk
Ciao Thomas!

Sorry for the late reply, I was caught up in some work I had to finish quickly and didn't check my
emails in the meantime!

> Hi Everyone!

> I'm happy to announce that the Alan Continuous Integration pipeline has been rebuilt and now continuously delivers new builds of the command-line SDKs for Linux and Windows, as well as the Windows installers for WinArun and the Alan SDK. As usual, they can be downloaded from https://www.alanif.se/download-alan-v3/development-snapshots, but also directly from the CI at https://ci.alanif.se.

That's great news. I'll definitely look into the latest Alpha SDKs this weekend, and give you some feedback.

> The latest alpha/snapshot contains substantial improvements for those of you who work with non-ASCII languages: Alan now supports UTF-8, which is the predominant encoding on most systems today. The advantage is that you (normally) don't have to do anything special to get Alan to correctly interpret your ñ, ä, ß and other characters. Previously this required special setup of editors and consoles to work right.

> You can read more about this feature in the alpha-level documentation, under Compiler Switches ('-encoding UTF-8') and Interpreter Switches ('-u').

> For the upcoming beta8 you will have to tell the Alan compiler explicitly that the source files are in UTF-8, but for beta9 this will be the default. If your text file happens to be encoded in UTF-8 with a "BOM" (a special marker at the start of the file), Alan already respects that automatically. (Some environments add this marker, some don't.)

I was really looking forward to this. I'm planning to start using UTF-8 in the StdLib development branch
immediately. It might take some time and testing on the dev-dev branch, since I'll have to adapt all the
various scripts accordingly, but it's going to be a good test bed, since the test suite is quite big and the
project uses Asciidoctor inclusion of tagged regions from the library and examples' sources, as well as
dynamically generated transcripts. The switch to UTF-8 is going to improve build times, and also disentangle
all the various encoding conversion operations, which add complexity.

> The interpreter also accepts a UTF-8 option, which of course controls input and output for the command-line interpreters. It is also useful for the GLK-based interpreters (WinArun, Gargoyle, ...), since logs and canned command input have to be read with the correct encoding.

I'll definitely be testing the CLI functionality, since the StdLib repo automates all the tests and transcripts.
With all the EditorConfig checks in place, and Asciidoctor also validating externally included files and snippets,
any issue would be spotted during the shift to UTF-8.

> So after beta9 (the beta release after the upcoming one), Alan authoring and running will be more natural for those of us who work in languages other than English.

I'll be updating Sublime Alan accordingly, making UTF-8 the default (I don't think Sublime Text can be made to handle
multiple encodings without trouble, so I'll just drop ISO support in the package). The new UTF-8 SDK comes at the
right time, since I updated to Sublime Text 4 a couple of weeks ago, so I'll just make the package work with the
newer ST4 only (it's worth it, the new editor and syntax features are quite powerful).

As usual, I can't thank you enough for all your hard work.

Best regards,

Tristano Ajmone

PS: I've begun studying the Ruby language, since all the ALAN projects rely on Asciidoctor for their documentation.
I always knew (from past brushes) that Ruby was a beautiful language, but now that I've started working on the
Rouge syntax for ALAN, and read some books on Ruby, I'm starting to understand why this language has been so successful.
I'm usually not a great fan of scripting languages, especially those that update too often (like Node.js) with the
risk of breaking packages; but I have to admit that Ruby shines: it's not scripting for the sake of laziness, it's
scripting in all its glory. It's like Python, but with more freedom of expression (Ruby allows multiple ways to
do the same thing, and doesn't have benevolent dictators).


Tristano Ajmone (Italy)

Tristano Ajmone

Jun 9, 2021, 7:05:07 PM
to Thomas Nilefalk
Ciao @Thomas,

> Hi Everyone!

> I'm happy to announce that the Alan Continuous Integration pipeline has been rebuilt and now continuously delivers new builds of the command-line SDKs for Linux and Windows, as well as the Windows installers for WinArun and the Alan SDK. As usual, they can be downloaded from https://www.alanif.se/download-alan-v3/development-snapshots, but also directly from the CI at https://ci.alanif.se.


I apologize for the delayed feedback, but the weekend didn't go as planned, and I didn't manage to check the new SDK until tonight.

So, first things first: I can confirm that the Alpha SDK for Windows seems to work fine (i.e. the previous DLL-related errors are now gone).
I haven't tried installing WinArun yet, for the reasons explained below...

> The latest alpha/snapshot contains substantial improvements for those of you who work with non-ASCII languages: Alan now supports UTF-8, which is the predominant encoding on most systems today. The advantage is that you (normally) don't have to do anything special to get Alan to correctly interpret your ñ, ä, ß and other characters. Previously this required special setup of editors and consoles to work right.


Now I'm trying to work out how to set up a dev branch for the StdLib where I can test switching the whole project to UTF-8 sources.
My main concern right now is how I'm going to handle this in my editor, Sublime Text, for which I've created the Sublime Alan
package. I'm trying to figure out how to tweak the package so that it defaults to UTF-8 encoded Alan sources (and transcripts,
solutions) but can fall back to ISO encoding if old project files are opened.

Since ST in most cases won't be able to determine if an English adventure is encoded in UTF-8 or ISO, due to the text containing
only chars from the ASCII range, all I can safely rely on is the presence of a BOM in the UTF-8 source.

I remember discussing the BOM in some repository Issue or Discussion, but can't remember where it was or how it ultimately
turned out. Is a UTF-8 BOM in ALAN sources now allowed, mandatory or not allowed?

If adding a BOM to UTF-8 sources is permitted, my best option is to use that to tell Sublime Alan when to switch encoding, which
would mean that I could still leave ISO as the default encoding, and let ST auto-switch to UTF-8 when a BOM is found.

In the case of sources/transcripts/solutions which are the same in both encodings, end users will have to manually switch to UTF-8
if they want to start working in that encoding (and, if the BOM is supported, switch to UTF-8 with BOM).

How is the AlanIDE going to approach this new feature?

Right now I'm unsure about my next steps, because on the one hand I want to be able to work with both old projects in ISO and
newer ones in UTF-8, without going bonkers. On the other hand, I would prefer not to update the Sublime Alan package to
adopt UTF-8 as the default Alan encoding until the next Beta is out, i.e. I would implement these changes in a local
dev branch of the package, and not push them to the Sublime Alan repository; but I would also like to plan how to roll
out this feature.

What's your advice in this respect?

I believe that right now attempting to switch the whole StdLib repo to UTF-8, in a dedicated branch, is an essential
step for testing the new feature, and for understanding how editor plugins for Alan should approach the upcoming
switch to UTF-8 encoding as the default, in Beta9.

Another hindrance that has delayed my experimentation with StdLib UTF-8 is that I'll now need to add to the repository
a script that determines which SDK version to use based on the branch, and invokes it with the required options
to enable UTF-8 (until it becomes the default). Since the master branch will keep using an older version of the SDK
(i.e. the latest Beta that was out when the StdLib was last released), having a toolchain that is branch-aware now
becomes mandatory, and relying on the ALAN binaries on the system PATH might no longer be a viable solution in this
transition period from one encoding to the other.
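
A branch-aware wrapper of the kind described above might look roughly like the following Python sketch. Everything specific in it is an assumption made for illustration: the branch names, the `sdk/beta` and `sdk/alpha` folders, and the executable name `alan` (on Windows it would presumably be `alan.exe`). The only detail taken from the announcement is the compiler switch `-encoding UTF-8`.

```python
import subprocess
from pathlib import Path

# Hypothetical layout: one SDK folder per branch, kept next to the project.
# The second item holds the extra compiler switches that branch needs.
SDK_BY_BRANCH = {
    "master": (Path("sdk/beta"), []),
    "utf8": (Path("sdk/alpha"), ["-encoding", "UTF-8"]),
}


def current_branch() -> str:
    """Ask git for the name of the currently checked-out branch."""
    result = subprocess.run(
        ["git", "rev-parse", "--abbrev-ref", "HEAD"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def compiler_command(branch: str, source: str) -> list[str]:
    """Build the compiler invocation for a branch.

    Branches that are not listed fall back to the settings used for master.
    """
    sdk_dir, extra_args = SDK_BY_BRANCH.get(branch, SDK_BY_BRANCH["master"])
    return [str(sdk_dir / "alan"), *extra_args, source]
```

A build script could then call `subprocess.run(compiler_command(current_branch(), "mygame.alan"), check=True)` instead of relying on whichever `alan` happens to be first on the PATH.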

Thanks

Tristano

Tristano Ajmone (Italy)

Thomas Nilefalk

Jun 10, 2021, 3:55:15 AM
to Tristano Ajmone, ala...@googlegroups.com


On 10 June 2021 at 01:05:08, Tristano Ajmone (taj...@gmail.com) wrote:

> Ciao @Thomas,
>
> I apologize for the delayed feedback, but the weekend didn't go as planned, and I didn't manage to check the new SDK until tonight.

No need to apologize, we all do this in our spare time, in the time we have.


> So, first things first: I can confirm that the Alpha SDK for Windows seems to work fine (i.e. the previous DLL-related errors are now gone).
> I haven't tried installing WinArun yet, for the reasons explained below...
>
> Now I'm trying to work out how to set up a dev branch for the StdLib where I can test switching the whole project to UTF-8 sources.
> My main concern right now is how I'm going to handle this in my editor, Sublime Text, for which I've created the Sublime Alan
> package. I'm trying to figure out how to tweak the package so that it defaults to UTF-8 encoded Alan sources (and transcripts,
> solutions) but can fall back to ISO encoding if old project files are opened.
>
> Since ST in most cases won't be able to determine if an English adventure is encoded in UTF-8 or ISO, due to the text containing
> only chars from the ASCII range, all I can safely rely on is the presence of a BOM in the UTF-8 source.
>
> I remember discussing the BOM in some repository Issue or Discussion, but can't remember where it was or how it ultimately
> turned out. Is a UTF-8 BOM in ALAN sources now allowed, mandatory or not allowed?

At the time of that discussion it seemed to me that you knew a lot more than I did on the subject; now it sounds like you have forgotten some of that knowledge as you transferred it to me ;-)

AFAIK, there is no difference between an ISO-encoded file and a UTF-8 encoded file if they only contain ASCII-range characters. So in that respect a file could be either, and it does not matter. It will not matter to the Alan compiler either. So you don't actually have to know whether an ASCII-only file is UTF-8 or ISO; they are the same.
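
The point about ASCII-only files can be checked with a few lines of Python (used here purely for illustration, it is not part of the Alan toolchain). The sketch assumes that "ISO" means ISO-8859-1; the sample strings are made up.

```python
def same_bytes_in_both_encodings(text: str) -> bool:
    """Return True if text encodes to identical bytes in ISO-8859-1 and UTF-8."""
    try:
        return text.encode("iso-8859-1") == text.encode("utf-8")
    except UnicodeEncodeError:
        # The text contains characters that ISO-8859-1 cannot represent at all.
        return False


ascii_only = "You are in the kitchen."
with_accent = "Sei in città."

print(same_bytes_in_both_encodings(ascii_only))   # only ASCII-range characters
print(same_bytes_in_both_encodings(with_accent))  # contains 'à'
```

The first call reports identical bytes and the second does not, because 'à' is a single byte in ISO-8859-1 but two bytes in UTF-8.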

What I decided from that discussion in #12 was that the Alan compiler can be instructed to assume UTF-8, but even if it is not, a file that has a BOM will be converted (as described in A.3. Encodings and character sets in the alpha manual).
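
For editor or tooling authors, the rule "a BOM wins, otherwise use the configured default" can be sketched as below. This is only an illustration of the idea in Python, not the Alan compiler's actual implementation; the function names and the ISO-8859-1 default are assumptions.

```python
import codecs


def guess_encoding(raw: bytes, default: str = "iso-8859-1") -> str:
    """Pick a codec for raw source bytes: a UTF-8 BOM wins, else the default."""
    if raw.startswith(codecs.BOM_UTF8):  # the three bytes EF BB BF
        return "utf-8-sig"  # Python codec that decodes UTF-8 and drops the BOM
    return default


def decode_source(raw: bytes, default: str = "iso-8859-1") -> str:
    """Decode source bytes using the encoding chosen by guess_encoding."""
    return raw.decode(guess_encoding(raw, default))
```

With this rule a project can mix ISO files and UTF-8-with-BOM files, and each one is decoded correctly regardless of the default.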


> If adding a BOM to UTF-8 sources is permitted, my best option is to use that to tell Sublime Alan when to switch encoding, which
> would mean that I could still leave ISO as the default encoding, and let ST auto-switch to UTF-8 when a BOM is found.

Yes, given the above description, this is what I would suggest. It even allows gradual conversion, one file at a time, while still having a completely working game. (I just realized that there is no test for cross-file references: say, a location with a non-ASCII character in its identifier declared in one file with one encoding, and a reference to it using the same characters in another file with the other encoding. I see no reason why that should not work; just a note to self to add that test...)

> In the case of sources/transcripts/solutions which are the same in both encodings, end users will have to manually switch to UTF-8
> if they want to start working in that encoding (and, if the BOM is supported, switch to UTF-8 with BOM).

Users in ASCII-only environments don't actually have to do anything at all ever, I think. All files will be identical if they do not contain non-ASCII characters. And even after Alan has gone all-UTF, the files will still work and be the same.

The only snag might be if a particular environment/editor opens a file and assumes some random (non-UTF-8) encoding when the BOM is missing, and then only if the user goes on to add non-ASCII characters to that file. But that is a rather unlikely scenario, I think.
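
What that snag looks like in practice can be shown with a short Python sketch (again only an illustration, with ISO-8859-1 assumed as the "other" encoding). It writes text in one encoding and reads the bytes back in another.

```python
def reinterpret(text: str, written_as: str, read_as: str) -> str:
    """Encode text with one encoding and decode the bytes with another."""
    return text.encode(written_as).decode(read_as, errors="replace")


print(reinterpret("north", "iso-8859-1", "utf-8"))  # ASCII only: unchanged
print(reinterpret("ñ", "utf-8", "iso-8859-1"))      # shows two wrong characters
print(reinterpret("ñ", "iso-8859-1", "utf-8"))      # shows a replacement character
```

Only the non-ASCII character is affected, which matches the observation that ASCII-only files are safe either way.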

(Another note to self: add check for BOM on "solution files" when they are opened, I don't know if opening a text file with a BOM will botch the command reading, but at least we could do an automatic encoding switch for that file as for source files.)


> How is the AlanIDE going to approach this new feature?

AlanIDE has not been updated to address this, or even at all for a long time (so that is another project to take on...). I'm not even sure how the underlying Eclipse functions and editors handle UTF-8. But again the safe route (for a user) would be to use UTF-8 with BOM. I need to investigate this and also think about what needs to be done in terms of user experience here.


> Right now I'm unsure about my next steps, because on the one hand I want to be able to work with both old projects in ISO and
> newer ones in UTF-8, without going bonkers. On the other hand, I would prefer not to update the Sublime Alan package to
> adopt UTF-8 as the default Alan encoding until the next Beta is out, i.e. I would implement these changes in a local
> dev branch of the package, and not push them to the Sublime Alan repository; but I would also like to plan how to roll
> out this feature.
>
> What's your advice in this respect?

I also think that restricting the impact of this to after beta8 is important. I'm not exactly sure what the Sublime package does with the encoding. Would it make sense to let it

  • create new files as UTF-8 with BOM (given the automatic identification of single-file encodings discussed above)
  • open existing files with their existing encoding, and maybe at some point offer to change the encoding to UTF-8 with BOM?

I haven't explicitly thought about it like this, but it boils down to:

  • for beta8:
    • ensure use of UTF-8 with BOM for automatic identification
    • convert file by file
  • after beta9:
    • use the "normal" UTF-8 encoding of your environment/editor; both with and without BOM is fine
    • if you still have ISO-encoded source files in the project, use "-encoding ISO" when compiling, which will work if the UTF-8 encoded files have a BOM

So, that's the conversion strategy I would recommend for users. I'm not sure how the Sublime package influences this for the StdLib.
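
The "convert file by file" step of the strategy above could be sketched like this in Python. It is a minimal illustration under stated assumptions: the legacy files are taken to be ISO-8859-1, and the function name is made up. It works on raw bytes so that line endings are left exactly as they were.

```python
import codecs
from pathlib import Path


def convert_to_utf8_bom(path: Path, legacy_encoding: str = "iso-8859-1") -> bool:
    """Rewrite path as UTF-8 with BOM. Return False if it already had a BOM."""
    raw = path.read_bytes()
    if raw.startswith(codecs.BOM_UTF8):
        return False  # already converted, leave the file untouched
    text = raw.decode(legacy_encoding)
    path.write_bytes(codecs.BOM_UTF8 + text.encode("utf-8"))
    return True
```

One caveat: a file that is already UTF-8 but has no BOM would be misread as ISO-8859-1 by this sketch, so it should only be run on files known to be in the legacy encoding.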


> I believe that right now attempting to switch the whole StdLib repo to UTF-8, in a dedicated branch, is an essential
> step for testing the new feature, and for understanding how editor plugins for Alan should approach the upcoming
> switch to UTF-8 encoding as the default, in Beta9.

A bulk conversion of the StdLib would be a valuable (repeatable) experiment, but I'm not sure there is value in keeping that as a separate branch. *I* think that gradual conversion would be a better route for the StdLib. But that then has to wait for beta8, so that's the drawback.


> Another hindrance that has delayed my experimentation with StdLib UTF-8 is that I'll now need to add to the repository
> a script that determines which SDK version to use based on the branch, and invokes it with the required options
> to enable UTF-8 (until it becomes the default). Since the master branch will keep using an older version of the SDK
> (i.e. the latest Beta that was out when the StdLib was last released), having a toolchain that is branch-aware now
> becomes mandatory, and relying on the ALAN binaries on the system PATH might no longer be a viable solution in this
> transition period from one encoding to the other.

Again, I *personally* think this is too much effort for a temporary situation. I would, as outlined above, experiment with converting one or a couple of files...

WAIT, isn't the StdLib in English!? Why is this a problem then? See the discussion about ASCII-only environments above.

This should only be a problem for the Italian library. Or is that what you are referring to?

If so, then after converting a few files and running all the tests, take a step back and report any errors to me ;-) so we can discuss how to solve them, then reiterate until all non-ASCII files are in UTF-8.

Still a bit confused about how the configuration/function of the Sublime package fits into this. Maybe a user scenario would help me....

/Thomas



