Document Serialization


Gilad Bracha

Oct 17, 2023, 5:26:07 PM
to Newspeak Programming Language
Hello,

Milan asked for more details about document serialization. Below are some thoughts on the matter. They are preliminary. I haven't implemented the scheme below, so it is subject to change.

The context here is that I would like a solution that allows me to save and load documents, including their dependencies, which include resources like images (or audio or video media etc), classes they use in their amplets, and transcluded documents.

Moreover, if we are to fulfill the vision of documents as full-blown applications, we need to support scenarios where the app should run with specific data. For example, Telescreen is a presentation manager written in Newspeak, and one wants to save and load not just the Telescreen application, but one or more specific presentations. The plan is to revise Telescreen to be itself a document. Today the Telescreen app has its own scheme for saving presentations as zip files, and this does not cope with dependencies.

Now let me explain how I've been looking at this problem.

We want a serialization solution that preserves a document as completely as possible. There is a spectrum here, from fully preserving document state, to preserving just document "source" with or without its dependencies. Currently, we only do the latter. If object serialization worked properly, it would subsume all of these - but I need something I can implement quickly.

As noted above, my focus here is on an intermediate case - preserving the document source, its dependencies, and the data needed to start it up. 

One perspective is that we have a player and content (think a media player and an audio/image/video file). The player is a program that may in fact have dependencies.
Therefore, we shift perspective and consider a program, its dependencies and the data it needs to start up. We can represent it as a main program (the entry point of which is standardized), a list of dependencies (programs again) and some data the program will read at launch. The dependencies may be named symbolically within the program. This leads us to a representation that is a map of names to code (which embeds names symbolically referencing code or programs) paired with data.

We can also view the program as an interpreter of its initial data. Especially if the program is indeed an interpreter, in which case the data or content is itself a program. In the case where the data itself is an interpreter, we have a recursion.

Now we can think about a document as a program. Its dependencies include resources, classes, but also transcluded documents. So we can use a map of named entities, that are either classes or documents or resources (media), paired with data (which may be empty). In the case of Telescreen, the data includes the Telescreen slides, each of which is itself a document, which may have its own dependencies such as classes or transcluded documents.
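A minimal sketch of that representation, written in Python purely for illustration. The type names (ClassSource, Resource, DocuApp) and the helper dependency_names are hypothetical and do not come from the Newspeak code base; the sketch also assumes that start-up data can be modeled as a list of nested documents, as in the Telescreen slides case.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class ClassSource:
    """Source text of a class the document uses (hypothetical type)."""
    source: str


@dataclass
class Resource:
    """A media resource such as an image, audio or video file."""
    media_type: str
    payload: bytes


@dataclass
class DocuApp:
    """A program: a main entry, a map of named dependencies, and data."""
    main: str  # name of the entry point inside `entries`
    entries: Dict[str, "Entry"] = field(default_factory=dict)
    data: List["DocuApp"] = field(default_factory=list)  # may be empty


# A named entity is a class, a resource, or another (transcluded) document.
Entry = Union[ClassSource, Resource, DocuApp]


def dependency_names(app: DocuApp, prefix: str = "") -> List[str]:
    """List every named entry, descending into nested documents and data."""
    names: List[str] = []
    for name, entry in sorted(app.entries.items()):
        path = prefix + name
        names.append(path)
        if isinstance(entry, DocuApp):
            names.extend(dependency_names(entry, path + "/"))
    for index, item in enumerate(app.data):
        names.extend(dependency_names(item, f"{prefix}data[{index}]/"))
    return names


# One slide with its own class and image, used as Telescreen's start-up data.
slide = DocuApp(
    main="Slide1",
    entries={
        "Slide1": ClassSource("(source of Slide1)"),
        "logo.png": Resource("image/png", b""),
    },
)
telescreen = DocuApp(
    main="Telescreen",
    entries={"Telescreen": ClassSource("(source of Telescreen)")},
    data=[slide],
)
print(dependency_names(telescreen))
```

Here the structure is fully recursive: each slide carries its own map. Whether nested maps should instead share entries with the top level is a separate design question.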

This is actually general enough to describe apps that are not documents, as these would consist of the application class and its dependencies - with empty data or not.

A delicate question we've glossed over is whether the map is flat or not. That is, when we name a dependency which is a program (say, a transcluded document) is it a recursive instance of the same structure, or are we sharing dependencies in the top level map? Obviously, sharing is important both to prevent redundancy and to ensure correct semantics. Yet one must account for namespacing (name conflicts); reusing the recursive structure is also elegant and attractive.

One idea is to identify and remove redundant dependencies when serializing. This is what one would do with object serialization. Assuming we have a consistent IDE Root namespace when saving, we should be fine. If we load documents, we may get problems when a loaded document includes classes that are already in the system. In that case, we must decide on a policy: load subsumes existing or vice versa. The former is consistent with current practice. If the result is undesirable, one can load the correct piece I suppose. Likewise with the latter - one can always load the desired version. Might be good to warn of overwrites and give options.
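Both halves of this idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the actual implementation: a document is modeled as a plain dict with an "entries" map (name to definition text) and an optional "documents" map of nested documents, and the function names flatten and load_into_namespace are hypothetical.

```python
from typing import Dict, List


def flatten(document: dict, shared: Dict[str, str]) -> List[str]:
    """Hoist all entries into one shared map, dropping exact duplicates.

    Returns the names that appear with two different definitions, which is
    where a flat map runs into namespacing trouble.
    """
    conflicts: List[str] = []
    for name, definition in document["entries"].items():
        if name not in shared:
            shared[name] = definition
        elif shared[name] != definition:
            conflicts.append(name)
    for child in document.get("documents", {}).values():
        conflicts.extend(flatten(child, shared))
    return conflicts


def load_into_namespace(
    namespace: Dict[str, str], loaded: Dict[str, str], loaded_wins: bool = True
) -> List[str]:
    """Merge loaded entries into an existing namespace.

    loaded_wins=True means "load subsumes existing"; False keeps what is
    already there. Returns the conflicting names so a UI could warn.
    """
    conflicts: List[str] = []
    for name, definition in loaded.items():
        if name in namespace and namespace[name] != definition:
            conflicts.append(name)
            if not loaded_wins:
                continue  # keep the existing definition
        namespace[name] = definition
    return conflicts


chapter = {"entries": {"Chart": "chart-v1", "Axis": "axis-v1"}}
book = {
    "entries": {"Book": "book-v1", "Chart": "chart-v1"},
    "documents": {"Chapter1": chapter},
}

shared: Dict[str, str] = {}
save_conflicts = flatten(book, shared)      # Chart is stored only once

ide = {"Chart": "chart-v0"}                 # an older Chart is already loaded
overwritten = load_into_namespace(ide, shared)
print(save_conflicts, overwritten, ide["Chart"])
```

Returning the conflicting names, rather than silently overwriting, leaves room for the warning and the choice of policy mentioned above.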

Now we can consider how to realize these ideas. Extending the Telescreen scheme of zip files seems the most direct possibility. It can handle media files (unlike our object serializers) and is likely to be efficient and easy to implement. It naturally consists of a dictionary of names, and we use conventions to handle the main entry point and the contents as a subfile.
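To make the zip idea concrete, here is a small Python sketch using only the standard zipfile module. The file names manifest.json and data.bin and the entries/ folder are conventions invented for this example; the actual Telescreen and document formats may use entirely different names and layout.

```python
import io
import json
import zipfile
from typing import Dict, Tuple

MANIFEST = "manifest.json"  # hypothetical convention: names the entry point
DATA = "data.bin"           # hypothetical convention: the start-up data subfile
ENTRIES = "entries/"        # hypothetical folder for classes, documents, media


def save_bundle(main: str, entries: Dict[str, bytes], data: bytes = b"") -> bytes:
    """Write the entry point name, the named entries and the data to a zip."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(MANIFEST, json.dumps({"main": main}))
        archive.writestr(DATA, data)
        for name, payload in entries.items():
            archive.writestr(ENTRIES + name, payload)
    return buffer.getvalue()


def load_bundle(blob: bytes) -> Tuple[str, Dict[str, bytes], bytes]:
    """Read back what save_bundle wrote."""
    with zipfile.ZipFile(io.BytesIO(blob)) as archive:
        main = json.loads(archive.read(MANIFEST))["main"]
        data = archive.read(DATA)
        entries = {
            name[len(ENTRIES):]: archive.read(name)
            for name in archive.namelist()
            if name.startswith(ENTRIES)
        }
    return main, entries, data


blob = save_bundle(
    "BlogDoc",
    {"BlogDoc.ns": b"(class source)", "photo.png": b"\x89PNG"},
    data=b"post-3",
)
main, entries, data = load_bundle(blob)
print(main, sorted(entries), data)
```

The zip directory plays the role of the dictionary of names, and binary media goes in unchanged alongside source text.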

This should be enough to make Ampleforth a universal app-builder that allows you to construct apps as simple as pure text documents or as complex as Telescreen or more, and to save and load them as standalone docu-apps or in the context of the general editor.




Milan Zimmermann

Oct 18, 2023, 11:55:28 PM
to newspeak...@googlegroups.com
Gilad,

Thanks for describing this path to saving and loading documents (that are also apps).  I am attempting to come to an understanding of the thoughts, and organizing them in my picture of the world.  That generated some scattered wandering which I captured inline.  It's mostly meta-talk, and meta-questions, feel free to ignore if it's hard to comprehend.  At the end there is a somewhat concrete section where I try to tie this to a micro-project I am working on.  

On Tue, Oct 17, 2023 at 5:26 PM Gilad Bracha <gbr...@gmail.com> wrote:
Hello,

Milan asked for more details about document serialization. Below are some thoughts on the matter. They are preliminary. I haven't implemented the scheme below, so it is subject to change.

The context here is that I would like a solution that allows me to save and load documents, including their dependencies, which include resources like images (or audio or video media etc), classes they use in their amplets, and transcluded documents. 

Moreover, if we are to fulfill the vision of documents as full-blown applications, we need to support scenarios where the app should run with specific data. For example, Telescreen is a presentation manager written in Newspeak, and one wants to save and load not just the Telescreen application, but one or more specific presentations. The plan is to revise Telescreen to be itself a document. Today the Telescreen app has its own scheme for saving presentations as zip files, and this does not cope with dependencies.

Now let me explain how I've been looking at this problem.

We want a serialization solution that preserves a document as completely as possible. There is a spectrum here, from fully preserving document state, to preserving just document "source" with or without its dependencies. Currently, we only do the latter.

The Newspeak IDE (not using an "image" - like concept) stores only class code changes ("source") in the browser-local storage under "backup" and "lastSaved" keys. Is that what you mean by "we only do the latter"? Or do you also/instead mean that a serialized app (serialized in .vfuel) only stores "source" (class declarations)? 

[I thought that .vfuel is able to store serialized objects ... Subnote: although in my understanding an app can only serialize objects it can create in #packageUsing:manifest - because an object that needs platform for its creation cannot be created in #packageUsing:manifest, hence cannot be serialized ... but I feel I am missing something in all this.]
 
If object serialization worked properly,

I would be curious to hear in more detail what is not working, or why, at a high level, but anyway...
 
it would subsume all of these - but I need something I can implement quickly.

Ok
 

As noted above, my focus here is on an intermediate case - preserving the document source, its dependencies, and the data needed to start it up. 

I have a question about the "data needed to start it up" section but will ask it below in context.
 

One perspective is that we have a player and content (think a media player and an audio/image/video file). The player is a program that may in fact have dependencies.

Yes

Therefore, we shift perspective and consider a program, its dependencies and the data it needs to start up. We can represent it as a main program (the entry point of which is standardized), a list of dependencies (programs again) and some data the program will read at launch.
The dependencies may be named symbolically within the program. This leads us to a representation that is a map of names to code (which embeds names symbolically referencing code or programs) paired with data.

I understand this so far (I think) although still holding on to my "start up data" question.
 

We can also view the program as an interpreter of its initial data. Especially if the program is indeed an interpreter, in which case the data or content is itself a program. In the case where the data itself is an interpreter, we have a recursion.

Now we can think about a document as a program.

Basically, "document carries along the program that opens itself (the document)"? Sort of a "self-runnable document"?

Its dependencies include resources, classes, but also transcluded documents. So we can use a map of named entities, that are either classes or documents or resources (media), paired with data (which may be empty).

Four entity types in the document dependencies map: classes, other documents, resources, data. 
  - classes: I assume class sources
  - other documents: the recursive structure
  - resources: what makes them special compared to data? Is it that resources are not needed at start up?
  - data
      - Would the "data" be the "start up data" you mentioned above? What would they be in a simplified example, perhaps settings such as background=dark/white? 
      - Or would it be "user data" (what we normally think of as the document, for example a Word document). Below you mentioned that "data" may include the Telescreen slides, which would make it "user data", the document. This confusion is surely just me, when lacking clarity of understanding, getting into speculations :)

In the case of Telescreen, the data includes the Telescreen slides, each of which is itself a document, which may have its own dependencies such as classes or transcluded documents.

This is actually general enough to describe apps that are not documents, as these would consist of the application class and its dependencies - with empty data or not.

By the above, do you mean that every document is an app (so a document must understand the convention method "#packageUsing: manifest"), and also, that there are apps which are not documents?  Would  the convention defining such "true document" / "docu-app" be the presence of the "dependencies map" (in addition to understanding "#packageUsing: manifest")?  (If I am right in the previous), is there an intended message name(s) related to the access of the "dependencies map"?


A delicate question we've glossed over is whether the map is flat or not. That is, when we name a dependency which is a program (say, a transcluded document) is it a recursive instance of the same structure, or are we sharing dependencies in the top level map? Obviously, sharing is important both to prevent redundancy and to ensure correct semantics. Yet one must account for namespacing (name conflicts); reusing the recursive structure is also elegant and attractive.

FWIW, my uneducated feeling is that using the recursive structure would be nice, with repetitions expunged as you describe below.


One idea is to identify and remove redundant dependencies when serializing. This is what one would do with object serialization. Assuming we have a consistent IDE Root namespace when saving, we should be fine. If we load documents, we may get problems when a loaded document includes classes that are already in the system. In that case, we must decide on a policy: load subsumes existing or vice versa. The former is consistent with current practice. If the result is undesirable, one can load the correct piece I suppose. Likewise with the latter - one can always load the desired version. Might be good to warn of overwrites and give options.

That is good.
 

Now we can consider how to realize these ideas. Extending the Telescreen scheme of zip files seems the most direct possibility. It can handle media files (unlike our object serializers) and is likely to be efficient and easy to implement. It naturally consists of a dictionary of names, and we use conventions to handle the main entry point and the contents as a subfile.

This should be enough to make Ampleforth a universal app-builder that allows you to construct apps as simple as pure text documents or as complex as Telescreen or more, and to save and load them as standalone docu-apps or in the context of the general editor.

 Sounds great. And docu-apps seems like a precise name for those artifacts - did the Ministry approve? :)

This post on serializing apps helps me in a blog micro-project I started last week: I am building a simple blog (extremely simple), with 3 goals: a) use the Newspeak IDE to actually build something simple but useful b) get hands-on with Hopscotch(ForHTML5) - which has been a delight but slow progress, having to get each answer from the code of the single example I know, the Newspeak IDE :) c) hopefully use the resulting blog myself - eventually.

It looks as primitive as this:

image.png

I am only describing the above to give some context.  

The App has two modules, SimpleBlog and SimpleBlogApp (which allows the blog to run from the IDE, and eventually to be distributed as a vfuel). Currently, I run the blog app from the IDE; in the blog UI, I can create several Post instances (for now, a Post instance only has a String title and String content), and select a current post by clicking on the posts list.

Of course, the Post instances only live as long as I run the App in the IDE. Clicking the IDE Save button, or running "deploy" does not save the instances. This is understood, but I am looking for solutions of how to save the Post instances. 

If I were doing this in (say) Squeak, everything could be serialized in the image, including the Post instances; The same image could serve the blog on HTTP, and I could be adding a Post instance from a UI, or a workspace and save the image; clients would see the newly added post on reload or something similar.

So my core but high level question would be: What are my potential solutions in Newspeak for saving instances of Posts (and presenting them to a user when the blog is visited)? (Streaming then reloading the Post title and contents as a file or database is not straightforward, and probably violates some side effects principles as well - and I do not want to do that for a simple blog anyway.)

So I hope the described future serialization of the docu-apps could help me in saving and restoring the Post instances, although I am not sure about the mechanism and details, for example:

   - Could the Post instances somehow become the "start up data" you were describing? (and would be serialized/deserialized with the App)
   - Or (more likely) the Post instances would actually be the "dependencies  documents" in the recursive structure?
   - Or (even more likely) something entirely different and obvious :)

I made this way too long again. In case you read this end part, no need for detailed comments, but some high level comments would be helpful.

Thanks
Milan






--
You received this message because you are subscribed to the Google Groups "Newspeak Programming Language" group.
To unsubscribe from this group and stop receiving emails from it, send an email to newspeaklangua...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/newspeaklanguage/9d4becf7-3dc3-475f-a89b-6703c7ac9320n%40googlegroups.com.

Gilad Bracha

Oct 19, 2023, 12:35:04 AM
to newspeak...@googlegroups.com
Milan,

I may not answer all your questions until the weekend. I'll point out one thing: I too want to make a blogging tool in Newspeak. Now the thing is, there isn't much difference between a blogging tool and Telescreen.  Each manages a series of documents - blogs in one case, slides in another. As another example, a book is a sequence of chapters rather than blog posts or slides. In short, my view is that Telescreen will be generalized to cover all these cases.  Orthogonally, I'll point out again, that Telescreen itself should ideally be defined as a document, rather than as a standalone app. 

When will all this happen? I don't know. 

It seems my earlier post was not clear enough on some points, especially what we mean by start-up data. Think of Telescreen. If you want to share a presentation, you may often want to share it as a document (either by sending the document or by giving a URL) that opens up at the title slide of that specific presentation, as opposed to pointing people at Telescreen itself and asking them to load a specific presentation. So the specific presentation in question is the start-up data in this case.

The same might be true of a blog post. You might want them to start reading the post, rather than opening the blogging tool and loading the post later. In fact, you probably want the tool to have all blogs loaded in order, but open a specific one. So the full set of blogs, plus the name/index of the specific post in question, would be the start-up data.
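As a rough illustration of start-up data for the blog case, here is a Python sketch. The names Post, StartUpData and post_to_open are made up for the example (Post mirrors the title/content pair Milan described), and clamping an out-of-range index is just one possible choice.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Post:
    title: str
    content: str


@dataclass
class StartUpData:
    """The full set of posts, in order, plus which one to open first."""
    posts: List[Post]
    opening_index: int = 0


def post_to_open(data: StartUpData) -> Post:
    """Pick the post the tool should show at launch."""
    if not data.posts:
        raise ValueError("start-up data contains no posts")
    # Clamp rather than fail if the stored index is out of range.
    index = min(max(data.opening_index, 0), len(data.posts) - 1)
    return data.posts[index]


startup = StartUpData(
    posts=[Post("First", "Hello"), Post("Second", "More")],
    opening_index=1,
)
print(post_to_open(startup).title)
```

The tool still has every post loaded; only the initial view depends on the index.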

I hope this helps. More later.




--
Cheers, Gilad

Milan Zimmermann

Oct 19, 2023, 11:32:25 PM
to newspeak...@googlegroups.com
Gilad,

This response, and in particular

"So the full set of blogs, plus the name/index of the specific post in question would be the start up data."

is definitely clarifying one (most) important question I had about what is meant by the start up data - sounds like it could be the content of the document (maybe other things in other situations).

I actually read in the past that you wanted to build a blog. My goals are to evaluate if I am able to use the IDE for something more real than the simple sample mini-apps I wrote almost 2 years ago, and also see if I am able to grok and use some small subset of Hopscotch - and I thought building a blog would be a good tool for that.

No need to answer my questions in detail; I appreciate the help but realize I get into obsessive details :) and this response progressed my understanding.

Thanks
Milan

PS: I will look at the Telescreen code; it compiled but failed on [run] for me.
PPS: I also have a future question regarding the files serving the PWA on https://newspeaklanguage.org/webIDE/. I was able to build a localhost PWA for the HopscotchWebIDE.vfuel, but with a flat structure completely different from https://github.com/newspeaklanguage/newspeak/tree/master/platforms/webIDE . But I do not need it for a while, so I will defer that.



Gilad Bracha

Dec 17, 2023, 9:30:30 PM
to Newspeak Programming Language
An update. The latest version uses the zip-file based format for documents. This is still a work in progress. While all documents are saved to/loaded from zip files, we still don't preserve all document dependencies. The only dependencies we preserve so far are images. You can drag and drop images into a document, and when you save they will be kept in the resulting zip file. If your document does not use any other resources (other media types, transcluded documents, classes) then you can actually unzip the saved document and find an HTML file that you can use directly - and it will be able to access the images in the zip file. More interesting documents that use Newspeak will always need to be run via a Newspeak engine, using something like AmpleforthViewer or the IDE. I expect that such live documents will be distributed as PWAs, just like the IDE.
Next steps:

a. Getting all this (i.e., drag and drop, and saving to/loading from zip files that work standalone) to work for audio and video as well.
b. Saving classes, documents or icons from the IDE namespace that the document references, so that the document is fully self contained.

I'm hoping we'll have (a) and (b) by 2024.

Of course, there's much more that should be done, but this is a significant step forward. 

Milan Zimmermann

Dec 18, 2023, 3:27:35 AM
to newspeak...@googlegroups.com
Thanks for the update Gilad.  I am reading this and looking at the commit (way too quickly).  

Is there a simple workflow you could describe to play with those changes?  Maybe it's still too early though, but let me describe what I tried:

I created a document with a heading (Root + -> Add document)

Then I was able to drag an image into the document - I mean the image showed up. But then I did not know how to 'save' the document or what to do with it; either way the image disappears after some actions I am not sure I can describe atm.

Maybe it's too early but more likely I am misunderstanding something :) as I definitely need to give documents more time


Thanks
Milan



Gilad Bracha

Dec 18, 2023, 10:54:30 AM
to newspeak...@googlegroups.com
If you're running the latest release, then you can save the document using the download icon:
 downloadImage.png
which you can find in the document control bar, as in the screenshot below:

Screenshot 2023-12-18 at 07.38.52.png

Remember, that bar is toggleable, so initially you will see a view where it is collapsed:

Screenshot 2023-12-18 at 07.41.59.png

You can expand it by clicking on the toggle button just below where it says [Document AnDocument] (where AnDocument is the name of the current Document).

This will save a zip file as described in my prior message. 

As for the image disappearing, that was a bug in an earlier putback; I suspect you are running an older version. To make sure, check to see if you have a method named #processImagesFromDom:usingFolder: in the system (it will be in class DocumentSubject).



--
Cheers, Gilad

Milan Zimmermann

Dec 18, 2023, 9:27:10 PM
to newspeak...@googlegroups.com
Thanks Gilad for the clarification on how to save a document and the current status.

My problem was that I was using the installed PWA from newspeaklanguage.org, and I assumed it was updated. Did newspeaklanguage.org not deploy the new version? (Or the PWA on my system does not auto-update as I assumed - I do not know which of these is the problem.) Also, indeed this newspeaklanguage.org based PWA does not have the method processImagesFromDom:usingFolder:

Once I pulled the latest Newspeak from GitHub, and built my localhost server from it, things work as advertised: I can download the document as a zip, it contains the image, etc. (also, the method processImagesFromDom:usingFolder: is present).

One related minor note: When I create a document, save and reload, the document class is gone (even the top level document class) - I assume this is part of your b?
     b. Saving classes, documents or icons from the IDE namespace that the document references, so that the document is fully self contained.

Now I need to dig into Documents.ns more 

Thanks,
Milan



Gilad Bracha

Dec 18, 2023, 11:06:19 PM
to newspeak...@googlegroups.com
Hi Milan,

The Newspeak site was updated, so I'm not sure what went wrong in your case. Perhaps the browser cached the old one. The business of PWA update sometimes confuses me as well. You need to explicitly click the PWA update icon (shown below) that shows up at the top right of the web browser's search bar (at least in Chrome; in other browsers it might be a bit different, or not show up at all) to update the installed PWA. And the one run from localhost is actually different; for localhost I found it's often best to uninstall it and reinstall it, which is annoying.
Screenshot 2023-12-18 at 19.51.35.png
As for classes: the document's own class should always get loaded with the document. If this is not working for you, try and create a reproducible scenario and file an issue. If the document relies on classes (or other documents) that aren't in the IDE namespace, they will be missing and things will go awry. For now, you must load such dependencies manually. As I said, I expect to fix that pretty soon.



--
Cheers, Gilad

Milan Zimmermann

Dec 19, 2023, 3:15:56 AM
to newspeak...@googlegroups.com
Gilad, thanks for describing that you actually updated the files serving https://newspeaklanguage.org/webIDE/. A few notes inline


On Mon, Dec 18, 2023 at 11:06 PM Gilad Bracha <gi...@bracha.org> wrote:
Hi Milan,

The Newspeak site was updated, so I'm not sure what went wrong in your case. Perhaps the browser cached the old one. The business of PWA update sometimes confuses me as well. 

Yeah I know.
 
You need to explicitly click the PWA update icon (shown below) that shows up at the top right of the web browser's search bar (at least in Chrome; in other browsers it might be a bit different, or not show up at all) to update the installed PWA.

If I understand correctly what you mean: I did click the PWA update icon for  newspeaklanguage.org

My experience so far is the other way around: local updates on changes, the remote newspeaklanguage.org does not. I am using the presence or absence of the method processImagesFromDom:usingFolder: to find if the PWA OR the browser version did or did not update. This is what I see

1) Both non-local


2) Both local


I tried to find any notable reason for the local / non-local behavior differences, but could not so far.  

If I can reproduce it well, and find a way to describe it well, I will report it as a bug on updates not reflecting to clients (updates not reflecting both for PWA users and for users of the application running from the web on  https://newspeaklanguage.org/webIDE/ ).
 
And the one run from localhost is actually different; for localhost I found it's often best to uninstall it and reinstall it, which is annoying.
Screenshot 2023-12-18 at 19.51.35.png
As for classes: the document's own class should always get loaded with the document. If this is not working for you, try and create a reproducible scenario and file an issue.

Maybe I did not explain it well. The top-level document class that I created "does not get saved in the vfuel". To describe "does not get saved in the vfuel" more explicitly: If I create a document, save, exit and restart, the top-level document class is gone - maybe that is expected though. If I create the document, download the document zip, exit, restart, then "Load Document(s)" from the downloaded zip, the document is restored correctly (with the top-level class, HTML, and image) as expected.

So I think there is no bug based on re-reading your response.

Thanks,
Milan


Gilad Bracha

Dec 20, 2023, 10:14:40 PM
to Newspeak Programming Language
Hi Milan,

Yes, the vfuel does not track documents. This is sort of on purpose, until documents are more solid. They are progressing - the latest version will track classes and transcluded docs - except if the code is literally in the HTML. That's one more step I need to deal with. Also need to refactor and clean up a bit. There's also a buglet where the UI doesn't immediately show the transcluded docs in the namespace. Hayley has also contributed some improvements to the actual editing/DOM manipulation.

As for the odd PWA behavior - I expect it has something to do with the caching settings in the PWA manifest, but I really don't know. At some point we'll figure it out.

Milan Zimmermann

Dec 21, 2023, 1:51:42 AM
to newspeak...@googlegroups.com
Hi Gilad,

Sounds good on the vfuel not yet tracking documents, thanks for the update.

On the PWA potentially not updating (at least I experience it that way): I am reading the documentation, and it appears that a PWA local update must be forced by the server changing some bits in the service worker (sw.js). For example, https://developer.mozilla.org/en-US/docs/Web/Progressive_web_apps/Tutorials/CycleTracker/Service_workers says: "Once the PWA is installed on the user's machine, the only way to inform the browser that there are updated files to be retrieved is for there to be a change in the service worker." They usually use a version variable in the service worker, which both provides the changed bits and is used to name the cache, so the old cache can be deleted after the update. I am trying to put together an alternative version of the sw.js / index.html that would do that, but this is new to me, so I am not sure I will get it to work - so far it has some issues. I will update on any progress (or non-progress) here.

Later,
Milan


Gilad Bracha

Dec 21, 2023, 12:15:16 PM
to newspeak...@googlegroups.com
Hi Milan,

Thanks for looking into this. I hope you can figure it out and we can adopt that solution. Most of the PWA work was done by Mircea, but it had some issues and then Mircea moved on to other things. I eventually got round to making it work, but did not really get deeply into the topic; obviously there are still glitches.



--
Cheers, Gilad

Milan Zimmermann

Dec 21, 2023, 2:01:33 PM
to newspeak...@googlegroups.com
Gilad, after more experiments and research last night, I think I actually understand it much better, and I am getting somewhere, although I should not speak too early. The process would include (apart from index.html and sw.js changes) a change in the build.sh process that would change the PWA version number in sw.js on each rebuild - hope that makes sense. I will need to find a Mac and a Windows machine to confirm, as there is something specific to Mac - although I am not touching that part at all. But again, I will try not to speculate on future potential successes :)
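The build-time version bump could look roughly like the following. This is a sketch in Python rather than the shell of build.sh, and it assumes sw.js contains a line of the form const VERSION = "..."; that constant name is hypothetical, and the real sw.js may be organized differently.

```python
import re

# Matches a line such as:  const VERSION = "v1";
# The constant name VERSION is an assumption made for this sketch.
VERSION_LINE = re.compile(r'^(const\s+VERSION\s*=\s*)"[^"]*"(;?)', re.MULTILINE)


def stamp_version(sw_source: str, version: str) -> str:
    """Return the service worker source with its version string replaced."""
    stamped, count = VERSION_LINE.subn(
        lambda match: f'{match.group(1)}"{version}"{match.group(2)}',
        sw_source,
        count=1,
    )
    if count == 0:
        raise ValueError("no VERSION constant found in service worker source")
    return stamped


source = 'const VERSION = "v1";\nconst CACHE_NAME = "newspeak-" + VERSION;\n'
stamped = stamp_version(source, "v2")
print(stamped)
```

Because the bytes of the service worker file change on every build, the browser has a reason to fetch the new files, and a cache name derived from the version lets the old cache be identified and deleted.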

Will follow up on it,

Milan
