Prototype and its purpose

66 katselukertaa
Siirry ensimmäiseen lukemattomaan viestiin

vitalije

lukematon,
11.6.2018 klo 11.45.2511.6.2018
vastaanottaja leo-editor
As I have written in this post, my primary goals were not speed related. As a matter of fact I expected that new model would be in some areas even slower than Leo. I did hope however, that  overall performance would be much better than present Leo. But I've never dreamed that it would be so much faster. The speed came as a by-product, as a gift. 

In another thread Edward wrote that the prototype served its purpose, but the way I see it, its main purpose is totally missed. Right now, I don't feel much desire to continue developing the prototype. Somehow I feel it won't be accepted and it is useless to try. However, I am certain that it won't be too long before I continue some experiments in this direction.

The first experiments concerning the new data model I have made in July 2017, right after I wrote functions for reading and writing external files. All these experiments can be found in leo-snippets repository. 

Later I have made an experiment using ClojureScript and electron to make a Leo using electron as GUI. Even later I have made the same experiment but this time using CoffeeScript. After that I have made some experiments using rust language to write a python extension that reads external files. 

So, I have written new data model several times, using several different languages and each time I wrote it I learned something new. Each model was better than the one before. 

I would like to share what I have learned so far. I am even considering an idea to write a book about it. Or, more likely I will write about it in my blog. Writing a book seems too big project at the moment, but who knows. I feel that this new data model has great potential and that it can be used in many other fields as well. I feel that it would be a great idea to rewrite it once again - this time completely in rust language and make it available as both the Python extension and Node extension. I have even installed a Windows 10 as a second OS on my new machine so that I can compile both for Windows and Linux. As the prototype shows new model is quite easily attached to any GUI. 

Edward seems to be not very fond of this new model. It may seem odd or complicated at first, but it is not.

More than once in this forum it was pointed out that Leo's outline is like a database. Let us accept this analogy for now and let me use some database terminology.

At the moment Leo keeps its data in the form of interconnected VNode instances. The result of reading outline is an array of VNodes which represent the top-level nodes in the outline. Their parent node is hiddenRootNode. To change the outline it is quite easy, just insert/append or remove VNode instance from the array of v.children or v.parents. That is all that you have to do in order to change the outline. That is very efficient operation that can't be any faster.

In the database world all data is kept in tables. Appending to the table is quite fast. Inserting and deleting from the table is also very fast operation. However, retrieving the data from database table can be sometimes slow. To speed up queries one usually needs to add some index to the database. Imagine searching for the owner of the specific phone number in a phone book, or for a specific word in a book. To find all occurrences one must read whole book. But if the book has word index at the end, it is very easy to look for a specific word. Also looking for a someones phone number in a phone book is easy if you know first and last name of the person. Adding suitable index to the database makes some queries that can use this index quite faster. On the other hand, every index added to database, slows down all inserts and deletes. The job of database designer is to make an optimal set of indexes so that all critical queries can run fast and that inserts and deletes are not too slow. Sometimes it is faster to drop the index, then perform bulk of insertions and/or deletions and then rebuild the index again.

Currently Leo is like a database with no indexes at all. We can modify the outline very fast, but when we need to query tree we have no other choice but to go from the start to the end. That makes drawing slow and very dependent on the outline size and number of expanded nodes. To speed drawing we have to add some indexes to Leo. My new data-model is nothing more than a database with just enough indexes to make all outline operations fast enough and traversing, searching, drawing very fast.

Edward can come up with another set of indexes, to achieve same performance as the prototype demonstrates. Perhaps, he can find even better suited set of indexes so that Leo becomes even faster than my prototype. The one thing I am sure is that speed can't be improved without some kind of indexes. And those indexes whatever they may be like, need to be updated every time outline changes. Currently Leo's data is totally unprotected. There is no way to find out about a change in the outline unless it is explicitly reported. And many scripts do not report change. The one possible solution is to replace both children and parents fields with user defined list instances that override all methods that modify list. Overridden methods should update both the list and indexes to keep them in sync. However, there may exist some scripts that set directly parents and children fields of vnode instances. Those scripts would broke Leo. In that case it would be necessary to turn fields 'children' and 'parents' into properties, to protect Leo's data from this kind of scripts.

There is a better solution. Let's use internally new data-model instead of vodes and positions. All of the operations that modify tree are already implemented in the prototype and all of them run at least ten times faster than current Leo commands. Undo/redo code can be substantially simpler and shorter, drawing can be very fast and the only thing that remain to be done is making scripts work as before. 

The easiest thing to do is to change execute-script command so that it first sync data from new model to c.hiddenRootNode, then execute script, and after executing script, resync data from c.hiddenRootNode back to new model. With this approach all scripts will continue to work just as they used to, and Leo's internals can be greatly simplified. 

I intend to write more about new model and to make some pictures to make it more understandable. That would also help me when (and if) I start rewriting it in Rust.

Vitalije

Edward K. Ream

lukematon,
11.6.2018 klo 12.48.4511.6.2018
vastaanottaja leo-editor
On Mon, Jun 11, 2018 at 10:45 AM, vitalije <vita...@gmail.com> wrote:

​> ​
I have written new data model several times, using several different languages and each time I wrote it I learned something new. Each model was better than the one before.
​​
​> ​
I would like to share what I have learned so far.

​Please do.
 I'm not sure why you think my reaction might be negative.  Afaik, all I have said is that I would like Leo's primary scripting interface to remain (largely) unchanged.


But compatibility is not a requirement for prototypes.  They need only demonstrate that something is feasible.  A new data model would be extremely interesting in its own right.

​> I feel that this new data model has great potential and that it can be used in many other fields as well.


I am interested in anything you do, Vitalije.

> I feel that it would be a great idea to rewrite it once again - this time completely in rust language and make it available as both the Python extension and Node extension. I have even installed a Windows 10 as a second OS on my new machine so that I can compile both for Windows and Linux. As the prototype shows new model is quite easily attached to any GUI.

Cool!

​> ​
Edward seems to be not very fond of this new model. It may seem odd or complicated at first, but it is not.

​Really, I have no idea why you say this. I have no opinion of the new model.  In fact, I didn't understand it at all, until I read this letter.  I am completely open to learning about it. ​

​> ​
More than once in this forum it was pointed out that Leo's outline is like a database. Let us accept this analogy for now and let me use some database terminology.

​Sounds perfectly reasonable to me.​

​> ​
Currently Leo is like a database with no indexes at all.

​Very interesting change in point of view.


​> ​
We can modify the outline very fast, but when we need to query tree we have no other choice but to go from the start to the end. That makes drawing slow and very dependent on the outline size and number of expanded nodes. To speed drawing we have to add some indexes to Leo.

​Good summary.


> ​
My new data-model is nothing more than a database with just enough indexes to make all outline operations fast enough and traversing, searching, drawing very fast.

​This is great stuff.  Very clear. I never understood what you proposed ​doing until now.

I'm all for exploring this. 

​> ​
Edward can come up with another set of indexes, to achieve same performance as the prototype demonstrates.
​ ​
Perhaps, he can find even better suited set of indexes so that Leo becomes even faster than my prototype. The one thing I am sure is that speed can't be improved without some kind of indexes.

​Are you talking about screen redraws? That might be made faster by drawing only the visible part of the screen. Anyway, the data model is interesting on its own.  ​

​> ​
And those indexes whatever they may be like, need to be updated every time outline changes.

​Yes.


> ​
Currently Leo's data is totally unprotected. There is no way to find out about a change in the outline unless it is explicitly reported.
And many scripts do not report change. ​

​Another clear concept, and a big change in point of view.

> ​
​The one possible solution is to replace both children and parents fields with user defined list instances that override all methods that modify list. Overridden methods should update both the list and indexes to keep them in sync. However, there may exist some scripts that set directly parents and children fields of vnode instances. 


Imo, we don't necessarily have to maintain compatibility with such low-level scripts.  Certainly we can imagine excluding those scripts while discussing your ideas.

> Those scripts would broke Leo. In that case it would be necessary to turn fields 'children' and 'parents' into properties, to protect Leo's data from this kind of scripts.


Let's not worry about such things now.

​> ​
There is a better solution. Let's use internally new data-model instead of vodes and positions. All of the operations that modify tree are already implemented in the prototype and all of them run at least ten times faster than current Leo commands. Undo/redo code can be substantially simpler and shorter, drawing can be very fast and the only thing that remain to be done is making scripts work as before.

​Sounds like an excellent plan.​

​> ​
The easiest thing to do is to change execute-script command so that it first sync data from new model to c.hiddenRootNode, then execute script, and after executing script, resync data from c.hiddenRootNode back to new model. With this approach all scripts will continue to work just as they used to, and Leo's internals can be greatly simplified.

​Very clever.  I would never have thought of doing this.
​> ​
I intend to write more about new model and to make some pictures to make it more understandable. That would also help me when (and if) I start rewriting it in Rust.

​Excellent!​

​Summary​

This is outstanding work.  I am eager to study it further.

This letter is like a Rosetta stone.  You have clearly summarized why a new data model might be useful. Perhaps now we can understand your prototype code.  Please tell us where we can find the latest version.

Edward

vitalije

lukematon,
12.6.2018 klo 6.30.5212.6.2018
vastaanottaja leo-e...@googlegroups.com
 Please tell us where we can find the latest version.

Edward

I am glad that you find new data model still interesting and that you will give it a chance. Latest version is not very far from the version I have sent to you before. As I remember I have just fixed one small issue since. It caused tests to fail when gnx in the outline is not the same as the gnx in external file. That was a two lines fix. Since then I haven't worked on the model. There are lot of small things that I have put on hold until after this weekend, that now require my attention. As soon as I can work again on model, I plan to work on the undo/redo functionality next. It may cause some changes in the model. That is why I think that documenting and explaining model have to wait a bit longer. But as soon as I am satisfied with the shape of model, I'll do my best to explain it in full detail. Then we can discuss it further.

Are you talking about screen redraws? That might be made faster by drawing only the visible part of the screen. 

Yes, I had mostly drawing in mind. Drawing tree in Leo perhaps can be made faster using the same algorithm as in my prototype and using positions for traversing movements. We can say perhaps that position is a temporary index in database. Your comment made me think about this possibility a lot. I can't say exactly why I doubt that it would be very effective or point where potential problem will arise. Here are just a few thoughts:
  • it is not enough to have a sequence of visible items. We also need to link visible items with the positions. In the prototype this is achieved by using the fact that positions are immutable which is not true for p instances
  • traversing tree using p instances involves list manipulations and instantiations which are more expensive than integer arithmetic that is used in the prototype
  • in new model it is well defined where is the next visible node relative to the current one. In case of v instances it is hard if not impossible to figure out what will be the next visible v node
Anyway, these are only my guesses. Maybe some experimenting will show more precisely what is missing and whether it can be easily added to the p and v classes.

I remember that I have tried few months ago in 768-new-tree branch to make some improvements in tree drawing, but with no real success. At that time I have already implemented drawing tree in leo-el-vue and leo-cljs using very similar algorithms as the one in the prototype. Yet, I haven't found the way to apply the same algorithm using p and v. There may be a way, I am not sure.


Vitalije

rengel

lukematon,
12.6.2018 klo 7.58.1612.6.2018
vastaanottaja leo-editor
...
 
More than once in this forum it was pointed out that Leo's outline is like a database. Let us accept this analogy for now and let me use some database terminology.

At the moment Leo keeps its data in the form of interconnected VNode instances. The result of reading outline is an array of VNodes which represent the top-level nodes in the outline. Their parent node is hiddenRootNode. To change the outline it is quite easy, just insert/append or remove VNode instance from the array of v.children or v.parents. That is all that you have to do in order to change the outline. That is very efficient operation that can't be any faster.

In the database world all data is kept in tables. Appending to the table is quite fast. Inserting and deleting from the table is also very fast operation. However, retrieving the data from database table can be sometimes slow. To speed up queries one usually needs to add some index to the database. Imagine searching for the owner of the specific phone number in a phone book, or for a specific word in a book. To find all occurrences one must read whole book. But if the book has word index at the end, it is very easy to look for a specific word. Also looking for a someones phone number in a phone book is easy if you know first and last name of the person. Adding suitable index to the database makes some queries that can use this index quite faster. On the other hand, every index added to database, slows down all inserts and deletes. The job of database designer is to make an optimal set of indexes so that all critical queries can

Just in case you didn't know I would like to point out that there are other database alternatives, among others one that I only discovered some weeks ago (there is so much going on in the software world). It would be pretty easy to implement a DAG in a graph database:


And the very readable tutorial:


You might want to have a look before moving forward.

Reinhard
Vastaa kaikille
Vastaa kirjoittajalle
Välitä
0 uutta viestiä