$ python p.py
setting leoID from os.getenv('USER'): 'vitalije'
f_new average: 30.429ms
f_old average: 58.055ms
For the past few days I've been working on the reusable functions for both parsing content of external files and writing external files. In the attached Leo document there are two new scripts. One is for generating the test data, and the other is for testing these two new functions. All tests are passing and round trip (text -> outline -> text) confirms that these functions have almost the same effect as Leo's FastAtFile reading and atFile writing methods.
Thinking about the format of external files and looking at them, I've come to the conclusion that this format contains some redundant information. This is not a big problem, but since I am currently working on this part of Leo's code base, I wish to propose some improvements to this format. Having redundant information means that different files may produce the same outline. This can cause problems when testing round trip transformations.
First of all, I should say that I wrote two simple scripts that can automatically convert current external file content to the new format and back to the original format.
Also, the so-called "dangerous directives" (@comment and @delims) are never used in Leo's code base. Personally, I can't think of a use case for those directives. If anyone knows of a specific use case where these directives solve a real-life problem that can't be solved without them, please share it here. I wish to understand why anyone would want to use these directives. If no such use case can be found, I would strongly suggest dropping support for those dangerous directives. It would allow us to further simplify both the reading and the writing code.
[snip]
Fewer sentinel lines mean less parsing, less ambiguity, and less work, which leads to both simpler code and faster execution.
Your thoughts, please.
I just used @delims the other day for a Windows command file. In cmd files I use "::" as a comment marker. I didn't find a Leo file type for cmd files, so I just went ahead and used the directive.
This has bothered me five or ten times when for unusual reasons I wanted
to @file one external file from two Leo-Editor files. In most cases
this problem caused me to do something else. In one or two cases I
lived with this problem.
--
Segundo Bob
Segun...@gmail.com
For the past few days I've been working on the reusable functions for both parsing content of external files and writing external files. In the attached Leo document there are two new scripts. One is for generating the test data, and the other is for testing these two new functions. All tests are passing and round trip (text -> outline -> text) confirms that these functions have almost the same effect as Leo's FastAtFile reading and atFile writing methods.
Thinking about the format of external files and looking at them, I've come to the conclusion that this format contains some redundant information. This is not a big problem, but since I am currently working on this part of Leo's code base, I wish to propose some improvements to this format. Having redundant information means that different files may produce the same outline. This can cause problems when testing round trip transformations.
- top level node gnx and its headline are not necessary. Both headline and gnx are present in the xml, so they don't provide any useful information. This can also cause problems when two different outlines contain the same external file. If the top level node has a different path or a different gnx in those outlines, then they would produce different files even if they have the same content.
- @+<< sentinels are redundant too. When we encounter the node whose headline is a section reference, we know that the section reference was just before the opening node line.
- @-<< sentinel and @afterref can be merged into one. The section name is not necessary because opening and closing sections must be properly nested. We know for sure that the closing section has the same headline as the last open one. The closing @-<< sentinel can give a clue whether the following line is an @afterref line or an ordinary line. For example, @-<<[ means the same as a closing section sentinel followed by an @afterref line, while @-<<] means there is no @afterref line after this closing sentinel.
- @+others is not necessary because when we hit the first open node without a section reference in its headline, we know for sure that an @others directive came just before this node. Also, when we encounter a new open node with a different indentation, we can be sure that an @others directive came just before this node. When reading an external file, this line is used just to push the current node data on the stack. But this signal can be added to the opening node sentinel as a single character.
- format of @+node sentinel can be changed so that headline comes first and gnx and level at the end of the line, for example:
    #@ at.findFilesToRead :ekr.20190108054317.1:6
  instead of
    #@+node:ekr.20190108054317.1: *6* at.findFilesToRead
  It would be nicer to read source code using other editors.
- closing @-leo line is not necessary and there is no need for @last directives either. Last lines are just last lines of the top level node.
- @first directive can be present in the body, but it doesn't need to be written in the external file, because we know that all lines coming before `@+leo` sentinel are first lines.
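The reordering of the @+node sentinel proposed in the list above can be sketched as a pair of line-level converters. This is only an illustration, not the conversion scripts mentioned in the post: the function names `to_new_node_line` and `to_old_node_line` are hypothetical, and the regular expressions assume the level marker is written as `*`, `**`, or `*N*`, as in the example line.

```python
import re

# Assumed shape of a current node sentinel:
#   <delim>@+node:<gnx>: <level marker> <headline>
OLD_NODE = re.compile(
    r'^(?P<indent>\s*)(?P<delim>.*?)@\+node:(?P<gnx>[^:]+): '
    r'(?P<stars>\*\d+\*|\*{1,2}) (?P<head>.*)$')

# Proposed shape: <delim>@ <headline> :<gnx>:<level>
NEW_NODE = re.compile(
    r'^(?P<indent>\s*)(?P<delim>.*?)@ (?P<head>.*) '
    r':(?P<gnx>[^:]+):(?P<level>\d+)$')


def to_new_node_line(line):
    """Rewrite a current node sentinel with the headline first (hypothetical helper)."""
    m = OLD_NODE.match(line)
    if not m:
        return line  # not a node sentinel: leave untouched
    stars = m.group('stars')
    # "*" and "**" encode levels 1 and 2; "*N*" encodes level N.
    level = len(stars) if stars.strip('*') == '' else int(stars.strip('*'))
    return '%s%s@ %s :%s:%d' % (
        m.group('indent'), m.group('delim'), m.group('head'),
        m.group('gnx'), level)


def to_old_node_line(line):
    """Inverse of to_new_node_line (hypothetical helper)."""
    m = NEW_NODE.match(line)
    if not m:
        return line
    level = int(m.group('level'))
    stars = '*' * level if level < 3 else '*%d*' % level
    return '%s%s@+node:%s: %s %s' % (
        m.group('indent'), m.group('delim'), m.group('gnx'),
        stars, m.group('head'))


old = '#@+node:ekr.20190108054317.1: *6* at.findFilesToRead'
new = to_new_node_line(old)
print(new)
print(to_old_node_line(new) == old)
```

Because both directions are pure functions of a single line, the round trip can be checked line by line without loading an outline.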
Also, the so-called "dangerous directives" (@comment and @delims) are never used in Leo's code base. Personally, I can't think of a use case for those directives.
6. Changing Leo's file format to make your new code easier to test would be letting the tail wag the dog. I am confident that you can find a robust testing strategy that does not depend on a new file format.
I understand your unease about making this kind of change. There is nothing urgent in my proposal. If we change the write code so that it outputs the starting sentinel @+leo-ver=6, we can use two different functions for parsing the rest of the file content. Old files having @+leo-ver=5 will be loaded using the old reading code. So there won't be any inconvenience for users, developers, or future maintainers.
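A minimal sketch of that version dispatch, assuming the version number can be read from the first line containing `@+leo-ver=`. The names `detect_format_version` and `read_external_file`, and the `readers` mapping, are hypothetical and are not part of Leo's API.

```python
import re

# Matches the version number in a starting sentinel such as "#@+leo-ver=5-thin".
LEO_HEADER = re.compile(r'@\+leo-ver=(\d+)')


def detect_format_version(text):
    """Return the format version from the first @+leo sentinel, or None."""
    for line in text.splitlines():
        m = LEO_HEADER.search(line)
        if m:
            return int(m.group(1))
    return None


def read_external_file(text, readers):
    """Dispatch to the reader registered for the file's format version.

    `readers` maps a version number to a parsing function (assumed interface).
    """
    version = detect_format_version(text)
    if version not in readers:
        raise ValueError('unsupported external file format: %r' % version)
    return readers[version](text)


# Stand-in readers, only to show the dispatch.
readers = {5: lambda text: 'old reader', 6: lambda text: 'new reader'}
print(read_external_file('#!/usr/bin/python\n#@+leo-ver=5-thin\n', readers))
print(read_external_file('#@+leo-ver=6\n', readers))
```

Lines that precede the starting sentinel, such as a shebang line, are skipped by the search, so they do not affect which reader is chosen.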
Edward also mentioned redundancy. IMO, redundancy that helps in error recovery is good. Remember, there are going to be tens of thousands of files in the new format eventually. Some of them will have mis-used directives, some of them will have some kind of corruption. We need to have a good chance of recovering those files anyway.
You wonder why the speed of reading and writing matters. Perhaps when you use Leo it doesn't matter to you whether it loads 200ms faster or not. But if a developer wants to run a thousand tests, then 20ms less per test actually means 20 seconds less in total. Waiting 20 seconds more for tests to finish might break the developer's flow of thought. Keeping that flow leads to better code. So in the end users will benefit even if they don't care about these micro-optimizations.
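The arithmetic behind that estimate, together with one possible way to obtain a per-call average, can be sketched as follows. Both helper names are hypothetical, and this is not the script that produced the timings quoted in this thread.

```python
import time


def average_ms(func, arg, repeats=100):
    """Average wall-clock time of func(arg) in milliseconds (hypothetical helper)."""
    start = time.perf_counter()
    for _ in range(repeats):
        func(arg)
    return (time.perf_counter() - start) * 1000.0 / repeats


def total_saving_seconds(saving_per_run_ms, runs):
    """Cumulative saving in seconds when each run is faster by saving_per_run_ms."""
    return saving_per_run_ms * runs / 1000.0


# 20 ms saved per test over a thousand tests:
print(total_saving_seconds(20, 1000))  # 20.0
```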
Anyway, I won't insist on changing the format, but if we are changing something it would be better to make all changes at once.
Regarding the first node start sentinel, perhaps the new read code can just skip this sentinel and use the values from the xml for gnx and headline. When writing a file, Leo can check whether this sentinel is present in the external file and, if it is, keep this sentinel line unchanged. Leo always reads the existing file to check whether there is a change or not, so this check won't be too expensive. This way a single external file can be opened using different paths in different outlines without generating unnecessary file changes.
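The write-side check could look roughly like this. It is a sketch under the assumption that the first line containing `@+node:` in the existing file is the top level node sentinel; `first_node_sentinel` and `choose_top_sentinel` are hypothetical names, not Leo functions.

```python
def first_node_sentinel(existing_text):
    """Return the first node sentinel line found in the existing file, or None."""
    for line in existing_text.splitlines():
        if '@+node:' in line:
            return line
    return None


def choose_top_sentinel(existing_text, generated_sentinel):
    """Keep the sentinel already on disk; otherwise use the freshly generated one."""
    kept = first_node_sentinel(existing_text) if existing_text else None
    return kept if kept is not None else generated_sentinel


on_disk = (
    '#@+leo-ver=5-thin\n'
    '#@+node:aaa.1: * @file a.py\n'
    'print("hello")\n'
    '#@+node:aaa.2: ** child\n'
)
# A second outline reaches the same file through another path and gnx.
generated = '#@+node:bbb.9: * @file src/a.py'
print(choose_top_sentinel(on_disk, generated))
print(choose_top_sentinel('', generated))
```

With this choice, the second outline rewrites the file with the sentinel that is already there, so identical content yields an identical file.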