Improved 'extract' command


vitalije

Sep 16, 2017, 4:11:58 AM
to leo-editor
Revision 0b4223f1d contains an improved version of the extract command. It is a very useful command, especially when manually importing/organizing source files. I guess it wasn't used very often for languages other than python. It can now deal with javascript, coffeescript and clojure/clojurescript, and it can easily be extended to support any other language.

Well, while writing this I have just realized that it is easy to extend for developers, not for users. I have hard-coded regex patterns for the languages I use. It would be much more usable if those patterns were read from settings.

The default binding for this command is Ctrl+Shift+D. Here is the docstring for this command:

Create child node from the selected body text.

  1. If the selection starts with a section reference, the section name becomes the child's headline. All following lines become the child's body text. The section reference line remains in the original body text.
  2. If the selection looks like a Python class or definition line, the class/function/method name becomes the child's headline and all selected lines become the child's body text.
  3. Otherwise, the first line becomes the child's headline, and all selected lines become the child's body text.
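The three rules above can be modeled in a few lines of Python. This is a simplified sketch, not Leo's code: the `extract` function, the `SECTION_REF` and `PY_DEF` patterns, and the tuple it returns are all hypothetical. For rule 3 it drops the headline line from the body, following vitalije's description of the command's behavior later in this thread rather than the docstring's wording, so treat that detail as uncertain.

```python
import re

# Hypothetical patterns; Leo's actual regexes may differ.
SECTION_REF = re.compile(r"\s*(<<.+?>>)")
PY_DEF = re.compile(r"\s*(?:def|class)\s+(\w+)")


def extract(lines):
    """Model the three docstring rules for a list of selected lines.

    Returns (headline, child_body, lines_left_in_parent).
    """
    first = lines[0]
    m = SECTION_REF.match(first)
    if m:
        # Rule 1: the section name is the headline; the reference stays in the parent.
        return m.group(1), lines[1:], [first]
    m = PY_DEF.match(first)
    if m:
        # Rule 2: the class/def name is the headline; every selected line moves.
        return m.group(1), lines, []
    # Rule 3: the first line is the headline (and, in this sketch, is consumed).
    return first.strip(), lines[1:], []


print(extract(["<< helpers >>", "x = 1"]))
# ('<< helpers >>', ['x = 1'], ['<< helpers >>'])
print(extract(["def foo():", "    return 1"])[0])  # foo
```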

Vitalije

vitalije

Sep 16, 2017, 4:50:41 AM
to leo-editor
Done at 41ebaba.

Added support for `@data extract-patterns`.

The body of this setting should contain regex pattern definitions, one per line. Each pattern should capture the preferred headline in group(1) from the first line of the extracted text. For example:

A line containing '\s*(?:def|class)\s+(\w+)' (without the apostrophes) will match python definitions of functions/methods and classes.

User-defined regex patterns are prepended to the default list, so even if this setting is empty, Leo will extract definitions in python, javascript, clojure and coffeescript.
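A sketch of the lookup order described above, assuming a hypothetical `DEFAULT_PATTERNS` list that stands in for Leo's built-in patterns: user patterns are compiled first and tried before the defaults, and the first pattern that matches supplies the headline from group(1). `build_patterns` and `headline_for` are illustrative names, not Leo functions.

```python
import re

# Hypothetical stand-ins for Leo's built-in patterns (not the real list).
DEFAULT_PATTERNS = [
    r"\s*(?:def|class)\s+(\w+)",
    r"\s*function\s+(\w+)",
]


def build_patterns(user_patterns):
    """User patterns are placed first, so they are tried before the defaults."""
    return [re.compile(p) for p in list(user_patterns) + DEFAULT_PATTERNS]


def headline_for(line, patterns):
    """Return group(1) of the first pattern that matches, or None."""
    for pat in patterns:
        m = pat.match(line)
        if m:
            return m.group(1)
    return None


line = "export function getData(url) {"
# An empty @data node leaves the defaults in effect...
print(headline_for("def spam():", build_patterns([])))  # spam
print(headline_for(line, build_patterns([])))  # None
# ...and a user pattern extends them.
print(headline_for(line, build_patterns([r"\s*export\s+function\s+(\w+)"])))  # getData
```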

Note: if you need groups in a pattern, write them with the (?: ....) syntax so that the headline is captured in group(1). In the example above, the first group, which matches the 'def' or 'class' keyword, is surrounded with '(?:' and ')' so that it is excluded from the output groups and the second group is captured at index 1. The 'extract' command relies on the convention that the headline is captured in the group at index 1. It is also possible to write several regex patterns and avoid all groups except the one that captures the headline. For example, the regex above can be written as two lines like so:
\s*def\s+(\w+)
\s*class\s+(\w+)
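The group(1) convention is easy to check with Python's `re` module. This sketch (the sample line is made up) shows that the non-capturing `(?:...)` form and the two single-group patterns yield the same headline, while a capturing keyword group would push the name into group(2).

```python
import re

line = "    class Widget(Base):"

# The keyword group is non-capturing, so the name lands in group(1).
combined = re.match(r"\s*(?:def|class)\s+(\w+)", line)
print(combined.group(1))  # Widget

# With a capturing keyword group, group(1) would be the keyword instead.
capturing = re.match(r"\s*(def|class)\s+(\w+)", line)
print(capturing.group(1))  # class

# Two single-group patterns give the same headline without (?:...).
split = [r"\s*def\s+(\w+)", r"\s*class\s+(\w+)"]
matches = [re.match(p, line) for p in split]
names = [m.group(1) for m in matches if m]
print(names)  # ['Widget']
```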

Vitalije

Edward K. Ream

Sep 18, 2017, 11:58:32 AM
to leo-editor
On Sat, Sep 16, 2017 at 3:50 AM, vitalije <vita...@gmail.com> wrote:
> Done at 41ebaba.
>
> Added support for `@data extract-patterns`.

Many thanks for this work. Rev 449d675 revises the docstring for the extract command. As I write this, I see that the docstring should mention the new @data node. Vitalije, could you do that please?

Edward

Terry Brown

Oct 6, 2017, 10:22:17 AM
to leo-e...@googlegroups.com
On Sat, 16 Sep 2017 01:50:41 -0700 (PDT)
vitalije <vita...@gmail.com> wrote:

> Done at 41ebaba.
>
> Added support for `@data extract-patterns`.

This is great. So now I'm wondering how we can extend extract to
handle defs like this:

// set viewer time to time of selected node
goTo = () => {
this.drifterViz.gotoTime(new Date(this.drifterViz.currentNode().gpstime))
}

i.e. the name I want to use for the node, `goTo`, is on the *second*
line. I just glanced at the code and see that only the first line of the
selection is passed to the pattern matching....

It looks like (?ms) in a regex will put re into multi-line, dot-matches-all
mode. I guess just (?m) would be sufficient, so I *think* passing the
whole selection to extractDef() and letting people set (?m) in their
patterns would be sufficient and, more importantly, backwards compatible?
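One detail worth knowing if whole-selection matching is adopted: in Python, `re.match` only tries position 0 of the string even with `(?m)`, so a `^`-anchored pattern only reaches the second line through `re.search`. The sketch below uses Terry's snippet with a hypothetical pattern for arrow-function assignments; it is not one of Leo's patterns.

```python
import re

selection = """// set viewer time to time of selected node
goTo = () => {
this.drifterViz.gotoTime(new Date(this.drifterViz.currentNode().gpstime))
}"""

# Hypothetical pattern for assignments like `goTo = () => {`.
# (?m) only changes ^ and $; it does not change where re.match may start.
pattern = r"(?m)^\s*(\w+)\s*=\s*\("

# re.match is pinned to position 0, where the comment line fails.
print(re.match(pattern, selection))  # None

# re.search tries every position, so ^ can match at the start of line 2.
print(re.search(pattern, selection).group(1))  # goTo
```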

Thoughts?

Cheers -Terry

vitalije

Oct 6, 2017, 4:48:14 PM
to leo-editor
I would not expect multi-line patterns to work (at least not without some changes in the code). AFAIR, the extractDef function tries only the first line, not the whole selected region. Perhaps it can be changed to send the whole selection to the pattern checking.
I can't check it until some day next week.
Vitalije

vitalije

Oct 6, 2017, 4:54:55 PM
to leo-editor
> i.e. the name I want to use for the node, `goTo`, is on the *second*
> line. I just glanced at the code and see that only the first line of the
> selection is passed to the pattern matching....
>
> It looks like (?ms) in a regex will put re into multi-line, dot-matches-all
> mode. I guess just (?m) would be sufficient, so I *think* passing the
> whole selection to extractDef() and letting people set (?m) in their
> patterns would be sufficient and, more importantly, backwards compatible?

I am sorry, I hadn't read your whole message (it seems I am also in the mode of taking only the first line into account :-) )

Yes, you are right. I was trying to change as little as possible, but I'd love it if extractDef allowed multi-line regex patterns.
Vitalije

Terry Brown

Oct 13, 2017, 1:08:42 PM
to leo-e...@googlegroups.com
On Fri, 6 Oct 2017 13:54:55 -0700 (PDT)
vitalije <vita...@gmail.com> wrote:

> I am sorry, I hadn't read your whole message (it seems I am
> also in the mode of taking only the first line into account :-) )

Ok, that was very funny :-)

> Yes, you are right. I was trying to change as little as possible, but
> I'd love it if extractDef allowed multi-line regex patterns. Vitalije

I've pushed a version
https://github.com/leo-editor/leo-editor/commit/4eb94ac
which feeds the whole selection to the headline seeking patterns.

I can't see how it could break anything, but let me know, anyone, if
the extract command starts acting odd. Ran unit tests ok.

Cheers -Terry

Terry Brown

Oct 13, 2017, 2:53:10 PM
to leo-e...@googlegroups.com
Hmm, some unforeseen challenges.

I have this code:

---cut here---
// Promise wrapper for simple XMLHttpRequest GET
export function getData(url) {
return new Promise(function(resolve, reject) {
var req = new XMLHttpRequest()
req.open('GET', url)
req.onload = function () {
if (req.status == 200) resolve(JSON.parse(req.response))
else reject({code: req.status, message: req.statusText})
}
req.send()
})
}
---cut here---

I want to extract it to a node called 'getData'. But I don't have a
pattern for `export function....`. So it gets put in a node called
`onload` because there is a pattern that matches that.

Well, ok, so let me just use the manual 'first line of selection is
headline' mode, where I enter `getData` in the body text on its own
line ahead of the block and then select from that down to `}`.

Nope, `onload` still takes priority. Ok, I'll add a pattern to force
use of the first line, e.g. a pattern starting with '|', so instead of
entering `getData` in the body text I enter `|getData`. OK, right
headline, but it leaves `|getData` at the top of the body text.

It seems like the simplest thing might be hard-coding a mechanism like
the leading `|` to force first-line-is-headline mode. Of course,
having a pattern for `export function...` would work too, but there
will always be missing patterns.
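Terry's leading-`|` idea could look roughly like this. Everything here is hypothetical: `FORCE_MARKER`, `PATTERNS` and `pick_headline` are illustrative names, the single pattern is an invented one that happens to find `onload`, and the marker is a proposal, not an existing Leo feature. The marker line supplies the headline and is dropped from the body, so `|getData` is not left behind.

```python
import re

# Proposed marker (Terry's suggestion, not an existing Leo feature).
FORCE_MARKER = "|"

# Invented pattern that matches assignments like `req.onload = function`.
PATTERNS = [re.compile(r"(?m)^\s*(?:[\w.]*\.)?(\w+)\s*=\s*function")]


def pick_headline(lines):
    """Return (headline, body_lines) for a list of selected lines."""
    first = lines[0].lstrip()
    if first.startswith(FORCE_MARKER):
        # Forced mode: the marker line names the node and is not kept in the body.
        return first[len(FORCE_MARKER):].strip(), lines[1:]
    text = "\n".join(lines)
    for pat in PATTERNS:
        m = pat.search(text)
        if m:
            # A pattern matched somewhere in the selection: keep every line.
            return m.group(1), lines
    # No match: fall back to using (and consuming) the first line.
    return lines[0].strip(), lines[1:]


block = [
    "export function getData(url) {",
    "req.onload = function () {",
    "}",
    "}",
]
print(pick_headline(block)[0])  # onload (the unwanted headline)
print(pick_headline(["|getData"] + block)[0])  # getData
```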

Thoughts?

Cheers -Terry

Edward K. Ream

Oct 18, 2017, 7:32:52 AM
to leo-editor
On Mon, Sep 18, 2017 at 10:58 AM, Edward K. Ream <edre...@gmail.com> wrote:

> On Sat, Sep 16, 2017 at 3:50 AM, vitalije <vita...@gmail.com> wrote:
>> Done at 41ebaba.
>>
>> Added support for `@data extract-patterns`.
>
> Many thanks for this work. Rev 449d675 revises the docstring for the extract command. As I write this, I see that the docstring should mention the new @data node.

Rev 478a7b6 adds the following to the docstring (used by the help-for-command command):

You may add additional regex patterns for definition lines using
@data extract-patterns nodes. Each line of the body text should be a
valid regex pattern. Lines starting with # are comment lines. Use \# for patterns starting with #.

This rev also reports errors in the regexes in @data extract-patterns without a traceback. This is debatable. The traceback pinpoints the error in the regex, but I think a traceback is too verbose and alarming. It's easy enough to debug regexes with a Python regex tester.
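A sketch of how the body of an `@data extract-patterns` node might be parsed under the docstring's rules: `#` lines are skipped, a leading `\#` is unescaped, and an invalid regex produces a one-line message instead of a traceback. `parse_extract_patterns` is a hypothetical helper, and skipping blank lines is an assumption. (Python's `re` would also accept `\#` as an escaped `#`, so the unescaping step is mostly cosmetic.)

```python
import re


def parse_extract_patterns(body_lines):
    """Compile the user regexes found in an @data extract-patterns body.

    Hypothetical helper: '#' lines are comments, a leading '\\#' stands for
    a literal '#', and blank lines are skipped (an assumption).
    """
    patterns = []
    for line in body_lines:
        if not line.strip() or line.startswith("#"):
            continue  # blank line or comment
        if line.startswith("\\#"):
            line = line[1:]  # drop the escaping backslash
        try:
            patterns.append(re.compile(line))
        except re.error as err:
            # One-line report instead of a traceback.
            print(f"bad regex in @data extract-patterns: {line!r}: {err}")
    return patterns


body = [
    "# python definitions",
    r"\s*def\s+(\w+)",
    r"\#\s*region\s+(\w+)",
    "(unbalanced",
]
compiled = parse_extract_patterns(body)
print(len(compiled))  # 2 (the unbalanced pattern is reported and skipped)
```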

Edward

vitalije

Oct 19, 2017, 3:02:38 AM
to leo-editor


On Friday, October 13, 2017 at 8:53:10 PM UTC+2, Terry Brown wrote:
> Hmm, some unforeseen challenges.

I have slightly changed the algorithm for detecting the headline in the extractDef method. Now it checks the first line of the selection and, only if there is no match, checks the whole selection. As you have witnessed, in some languages like javascript and coffeescript there are often nested function definitions, which makes it difficult to find the right headline. My solution is far from complete, but I believe it reduces the number of false positives.
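The revised order, first line alone and then the whole selection, might be sketched like this. `PATTERNS` is an invented pattern set and `find_headline` a hypothetical helper; only the two-stage order comes from the description above. Note that in the second stage the order of the patterns, not the position of the match, decides which match wins.

```python
import re

# Invented patterns; each captures the headline in group(1).
PATTERNS = [
    re.compile(r"(?m)^\s*(?:export\s+)?function\s+(\w+)"),
    re.compile(r"(?m)^\s*(?:[\w.]*\.)?(\w+)\s*=\s*function"),
]


def find_headline(selection):
    """Try the first line alone; only if nothing matches, try the whole selection."""
    first_line = selection.split("\n", 1)[0]
    for text in (first_line, selection):
        for pat in PATTERNS:
            m = pat.search(text)
            if m:
                return m.group(1)
    return None


code = "\n".join([
    "// Promise wrapper for simple XMLHttpRequest GET",
    "export function getData(url) {",
    "  req.onload = function () {",
    "  }",
    "}",
])
print(find_headline(code))  # getData
print(find_headline("obj.onload = function () {"))  # onload
```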

As I write this, I think the proper solution would be to find all matches and then choose the one with the lowest indentation. Presumably that would be the most common case and what most users would expect. Am I wrong in this assumption?
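The all-matches, lowest-indentation idea can be prototyped as follows. The patterns and helper names are invented for illustration (the assignment pattern is deliberately listed first, to show that indentation rather than pattern order decides), and breaking ties on indentation by position in the selection is an extra assumption.

```python
import re

# Invented patterns; the assignment pattern is deliberately listed first.
PATTERNS = [
    re.compile(r"(?m)^\s*(?:[\w.]*\.)?(\w+)\s*=\s*function"),
    re.compile(r"(?m)^\s*(?:export\s+)?function\s+(\w+)"),
]


def indentation_at(text, pos):
    """Width of the leading whitespace on the line that contains index pos."""
    start = text.rfind("\n", 0, pos) + 1
    end = text.find("\n", start)
    line = text[start:] if end == -1 else text[start:end]
    return len(line) - len(line.lstrip())


def find_headline(selection):
    """Collect every match of every pattern and prefer the least-indented one."""
    candidates = []
    for pat in PATTERNS:
        for m in pat.finditer(selection):
            pos = m.start(1)  # measure indentation where the headline sits
            candidates.append((indentation_at(selection, pos), pos, m.group(1)))
    if not candidates:
        return None
    # Lowest indentation first; ties go to the earliest match in the selection.
    return min(candidates)[2]


code = "\n".join([
    "// Promise wrapper for simple XMLHttpRequest GET",
    "export function getData(url) {",
    "  return new Promise(function(resolve, reject) {",
    "    req.onload = function () {",
    "    }",
    "  })",
    "}",
])
print(find_headline(code))  # getData
```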

Vitalije 

Edward K. Ream

Oct 19, 2017, 3:59:21 AM
to leo-editor
Hmm. I am uneasy with complex solutions to this problem. They could easily lead to confusion and unintended behavior.

The simplest thing that could possibly work would be to pick the first identifier of the first line of the selection as the headline of the new node, assuming no regex matched. This could be done by adding one last default regex to extractDef, to be matched after all extractDef_patterns patterns and all user patterns.
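Edward's last-resort regex could be appended after the specific patterns roughly as below. `SPECIFIC`, `FALLBACK` and `headline_for_first_line` are hypothetical names, and `\W*([A-Za-z_]\w*)` is just one way to express "the first identifier after any leading non-word characters".

```python
import re

# Hypothetical stand-ins for the built-in and user patterns.
SPECIFIC = [re.compile(r"\s*(?:def|class)\s+(\w+)")]

# Last resort: the first identifier after any leading non-word characters.
FALLBACK = re.compile(r"\W*([A-Za-z_]\w*)")


def headline_for_first_line(first_line):
    """Try the specific patterns first, then the catch-all fallback."""
    for pat in SPECIFIC + [FALLBACK]:
        m = pat.match(first_line)
        if m:
            return m.group(1)
    return None


print(headline_for_first_line("def foo():"))  # foo
print(headline_for_first_line("export function getData(url) {"))  # export
print(headline_for_first_line("// Promise wrapper"))  # Promise
```

The second call shows the trade-off: the headline is plausible but not ideal, and the user would then fix it by hand.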

Finally, the user can always change the newly created node by hand, using all of her natural pattern-matching abilities ;-)

What do you think?

Edward

vitalije

Oct 19, 2017, 5:34:51 AM
to leo-editor

In rev df3d88bb1, I have added one simple method that tries each of the selected lines in sequence. The first line that matches wins. I believe that will cover most cases, such as nested definitions, python decorators and the like. However, it breaks if the last pattern always matches.


Rev df3d88bb1 will successfully extract the example code Terry gave.
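A minimal sketch of the line-by-line search, with invented patterns (Leo's real defaults are not reproduced here) and a hypothetical `find_headline` helper. Each selected line is tried against every pattern, and the first line that matches supplies the headline, so a leading comment or decorator line is simply skipped.

```python
import re

# Invented patterns; each captures the headline in group(1).
PATTERNS = [
    re.compile(r"\s*(?:export\s+)?function\s+(\w+)"),
    re.compile(r"\s*(?:[\w.]*\.)?(\w+)\s*=\s*function"),
    re.compile(r"\s*(?:def|class)\s+(\w+)"),
]


def find_headline(lines):
    """Try each selected line in order; the first line that matches wins."""
    for line in lines:
        for pat in PATTERNS:
            m = pat.match(line)
            if m:
                return m.group(1)
    return None


terry = [
    "// Promise wrapper for simple XMLHttpRequest GET",
    "export function getData(url) {",
    "return new Promise(function(resolve, reject) {",
    "req.onload = function () {",
]
print(find_headline(terry))  # getData
print(find_headline(["@decorator", "def wrapped():"]))  # wrapped
```

If the last pattern matched every line (a catch-all first-identifier fallback, say), the very first selected line would always win, which is the failure mode mentioned above.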


I hope, Edward, that you won't consider this solution overly complicated :-)


What remains to be done is writing some test cases to check and verify the expected behavior of this command.

If we later bump into corner cases, they should be added to these tests.




> Finally, the user can always change the newly created node by hand, using all of her natural pattern-matching abilities ;-)
>
> What do you think?
>
> Edward

It is more trouble than that. If extractDef doesn't find a headline, it strips the first line from the selection and uses it as the headline. In effect it deletes the first line of the selected code, which can be frustrating for the user. If, however, it finds a match, even a wrong one, it keeps all selected lines intact and only the headline can possibly be wrong.


I don't know if anyone else was using this command very often. I know that I had written a private plugin whose sole purpose was to patch the commander with the new definition of extractDef. That was a long time ago, before I got commit access to the Leo repository, so it was the only way for me to change Leo code.


When importing and analyzing code written in coffeescript, javascript or clojurescript, this command was tremendously helpful (in the patched version) and totally useless in its original version. That made me wonder whether anyone else uses this command at all. Of course, it was designed to work with python definitions, but surely some users use Leo for writing code in other languages as well. My conclusion was that most probably those users haven't been using the extract command at all. How else could it be that nobody ever complained about its shortcomings before?


This is how I use the extract command when importing foreign code. Usually I open the source file with an `@edit` node, which places the whole code in one large body. Then I try to find the largest blocks of code (classes, exports objects, ...). Reading from the top of the file, whenever I find the beginning of a block (a class definition or, very often, a comment line that announces a block of related classes/functions), I search for the beginning of the next such block. When I find it (or the end of the file, if this is the last block), I select all lines from the current position up to the beginning of that next block and execute the extract command (Ctrl+Shift+D). That gives me a few nodes smaller than the original one. In each of them I look for smaller blocks like methods or functions and repeat the process with those blocks, adding '@others' where necessary in parent nodes. I repeat this process until I have chunked all the code into small enough nodes. The alternative would be to use an '@auto' import for a source file, but I was never truly satisfied with the results, at least not for coffeescript, javascript and clojurescript files. That is why I find the extract command so useful. If you have never heard of the extract command, or you tried it before and weren't satisfied with what it did, now is the time to give it a try.


Vitalije

Edward K. Ream

Oct 19, 2017, 10:37:23 AM
to leo-editor
On Thu, Oct 19, 2017 at 4:34 AM, vitalije <vita...@gmail.com> wrote:

> In rev df3d88bb1, I have added one simple method that tries each of the selected lines in sequence. The first line that matches wins. I believe that will cover most cases, such as nested definitions, python decorators and the like. However, it breaks if the last pattern always matches.
>
> Rev df3d88bb1 will successfully extract the example code Terry gave.

The code looks good to me.

>> Finally, the user can always change the newly created node by hand, using all of her natural pattern-matching abilities ;-)
> ...
> It is more trouble than that. If extractDef doesn't find a headline, it strips the first line from the selection and uses it as the headline. In effect it deletes the first line of the selected code, which can be frustrating for the user. If, however, it finds a match, even a wrong one, it keeps all selected lines intact and only the headline can possibly be wrong.

Ok. I had forgotten that.

> I don't know if anyone else was using this command very often. I know that I had written a private plugin whose sole purpose was to patch the commander with the new definition of extractDef. That was a long time ago, before I got commit access to the Leo repository, so it was the only way for me to change Leo code.
> ...
> When importing and analyzing code written in coffeescript, javascript or clojurescript, this command was tremendously helpful (in the patched version) and totally useless in its original version. That made me wonder whether anyone else uses this command at all.

I used to use this command. Now I typically use parse-body.

> Of course, it was designed to work with python definitions, but surely some users use Leo for writing code in other languages as well. My conclusion was that most probably those users haven't been using the extract command at all. How else could it be that nobody ever complained about its shortcomings before?

I can't answer that. We have to rely on complaints, including our own.

> This is how I use the extract command when importing foreign code. Usually I open the source file with an `@edit` node, which places the whole code in one large body. Then I try to find the largest blocks of code (classes, exports objects, ...). Reading from the top of the file, whenever I find the beginning of a block (a class definition or, very often, a comment line that announces a block of related classes/functions), I search for the beginning of the next such block. When I find it (or the end of the file, if this is the last block), I select all lines from the current position up to the beginning of that next block and execute the extract command (Ctrl+Shift+D).

Have you tried parse-body? It uses Leo's importer code for the @language in effect, if one exists. See the node "ic.parse_body & helper". This should save you lots of work for the languages for which importers exist, including coffeescript and javascript. I don't see an importer for clojurescript, but it probably would not be too difficult to create one.

> That gives me a few nodes smaller than the original one. In each of them I look for smaller blocks like methods or functions and repeat the process with those blocks, adding '@others' where necessary in parent nodes. I repeat this process until I have chunked all the code into small enough nodes. The alternative would be to use an '@auto' import for a source file, but I was never truly satisfied with the results, at least not for coffeescript, javascript and clojurescript files. That is why I find the extract command so useful. If you have never heard of the extract command, or you tried it before and weren't satisfied with what it did, now is the time to give it a try.

You can always rearrange nodes after doing parse-body, which you can't do when using @auto, so parse-body plus manual tweaks is likely to save you lots of work.

Edward

vitalije

Oct 19, 2017, 11:47:38 AM
to leo-editor

> Have you tried parse-body? It uses Leo's importer code for the @language in effect, if one exists. See the node "ic.parse_body & helper". This should save you lots of work for the

I wasn't aware of this command. I will definitely try it. It is just another example of how many cool things there are in Leo that one can easily miss, even after 10+ years of using it.

Having said that, one editor feature came to my mind. One of the editors that I used before (I'm not sure which one) had this feature: it showed usage tips on start-up. Every time it started, it would greet the user with a window saying something like "Did you know that you can use ... to achieve ...?", along with a link to the documentation page explaining the feature further. I remember discovering some very useful features that way. I suppose it wouldn't be too complicated to add such a feature to Leo, but those short tips would have to be written and collected first.

Vitalije

Edward K. Ream

Oct 19, 2017, 12:09:14 PM
to leo-editor
On Thu, Oct 19, 2017 at 10:47 AM, vitalije <vita...@gmail.com> wrote:

>> Have you tried parse-body? It uses Leo's importer code for the @language in effect, if one exists. See the node "ic.parse_body & helper". This should save you lots of work for the
>
> I wasn't aware of this command. I will definitely try it. It is just another example of how many cool things there are in Leo that one can easily miss, even after 10+ years of using it.

Another way to do the same thing is to use @auto the first time and then convert @auto to @file.

> Having said that, one editor feature came to my mind. One of the editors that I used before (I'm not sure which one) had this feature: it showed usage tips on start-up. Every time it started, it would greet the user with a window saying something like "Did you know that you can use ... to achieve ...?", along with a link to the documentation page explaining the feature further. I remember discovering some very useful features that way. I suppose it wouldn't be too complicated to add such a feature to Leo, but those short tips would have to be written and collected first.

Good idea. The first items on this list would be bookmarks.py and cff. Then maybe tab completion in the minibuffer.

Any other candidates?

Edward

Chris George

Oct 19, 2017, 12:22:21 PM
to leo-e...@googlegroups.com
Abbreviations, including the ability to collect data via scripts.

Chris


Kent Tenney

Oct 19, 2017, 1:11:45 PM
to leo-editor
one for each @file @auto @clean ...
one for @path
one for @button
one for @menu
one for <ctl-b>
...
...
one which says "so many features even Edward can't keep up"


Edward K. Ream

Oct 20, 2017, 2:48:00 AM
to leo-editor

On Thu, Oct 19, 2017 at 9:37 AM, Edward K. Ream <edre...@gmail.com> wrote:

> I don't see an importer for clojurescript, but it probably would not be too difficult to create it.

Presumably the clojure importer would be a slight modification of plugins/importers/elisp.py.

Edward
