Cleaning imported javascript headlines

Edward K. Ream

Feb 23, 2018, 10:42:24 AM
to leo-e...@googlegroups.com
Leo's import system has many strengths.  The most important is that code will almost always import correctly even if the resulting nodes aren't even close to optimal.  This is particularly important for javascript, where coding styles vary so widely.

After generating nodes, the import system calls the clean_headline method.  The base Importer class defines clean_headline as follows:

def clean_headline(self, s, p=None):
    '''
    Return the cleaned version headline s.
    Will typically be overridden in subclasses.
    '''
    return s.strip()

The javascript importer now defines this method this way:

clean_regex_list1 = [
    re.compile(r'\s*\(?(function\b\s*[\w]*)\s*\('),
    re.compile(r'\s*([\w]+\:\s*\(*\s*function\s*\()'),
    re.compile(r'\s*(?:const|let|var)\s*(\w+\s*(?:=\s*.*)=>)'),
]
clean_regex_list2 = [
    re.compile(r'(.*)\((\s*function)'),
    re.compile(r'(.*\=)(\s*function)'),
]

def clean_headline(self, s, p=None):
    '''Return a cleaned up headline s.'''
    s = s.strip()
    # Don't clean a headline twice.
    if s.endswith('>>') and s.startswith('<<'):
        return s
    for ch in '{(=':
        if s.endswith(ch):
            s = s[:-1].strip()
    # First regex cleanup.
    for pattern in self.clean_regex_list1:
        m = pattern.match(s)
        if m:
            s = m.group(1)
            break
    # Second regex cleanup.
    for pattern in self.clean_regex_list2:
        m = pattern.match(s)
        if m:
            s = m.group(1) + m.group(2)
            break
    s = s.replace('  ', ' ')
    return g.truncate(s, 100)
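
To see roughly what these steps do to typical lines, here is a standalone sketch that can be run outside Leo. It copies the patterns and the logic from the method above, but the names clean_js_headline and truncate are made up for this illustration, and truncate is only a plain-slice stand-in for g.truncate (an assumption, not Leo's actual helper).

```python
import re

# Patterns copied from the post above.
clean_regex_list1 = [
    re.compile(r'\s*\(?(function\b\s*[\w]*)\s*\('),
    re.compile(r'\s*([\w]+\:\s*\(*\s*function\s*\()'),
    re.compile(r'\s*(?:const|let|var)\s*(\w+\s*(?:=\s*.*)=>)'),
]
clean_regex_list2 = [
    re.compile(r'(.*)\((\s*function)'),
    re.compile(r'(.*\=)(\s*function)'),
]

def truncate(s, n):
    # Hypothetical stand-in for g.truncate: a plain slice, nothing more.
    return s if len(s) <= n else s[:n]

def clean_js_headline(s):
    '''Standalone version of the clean_headline logic shown above.'''
    s = s.strip()
    # Leave section references alone so a headline is not cleaned twice.
    if s.endswith('>>') and s.startswith('<<'):
        return s
    # Drop one trailing '{', '(' or '=' (checked in that order).
    for ch in '{(=':
        if s.endswith(ch):
            s = s[:-1].strip()
    # First cleanup: keep only the captured part of the first match.
    for pattern in clean_regex_list1:
        m = pattern.match(s)
        if m:
            s = m.group(1)
            break
    # Second cleanup: join the two captured groups of the first match.
    for pattern in clean_regex_list2:
        m = pattern.match(s)
        if m:
            s = m.group(1) + m.group(2)
            break
    s = s.replace('  ', ' ')
    return truncate(s, 100)

samples = [
    'function foo(a, b) {',
    'var helper = function (x) {',
    'const add = (a, b) => {',
    '<< imports >>',
]
for line in samples:
    print(repr(line), '->', repr(clean_js_headline(line)))
```

With these four samples the results should be 'function foo', 'var helper = function', 'add = (a, b) =>' and the unchanged '<< imports >>'.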

This isn't perfect, but is much better than previous versions.  I encourage Vitalije or anyone else to suggest improvements.

Edward

vitalije

Feb 23, 2018, 11:22:55 AM
to leo-editor
Cleaning generated headlines is just a part of the problem. A more serious problem is choosing what should go into a single node. As far as I am concerned, I would much prefer to have a toolbox with several commands to help me import files on my own. I wrote about that idea here. I don't know when (or if) I will have time to write such a plugin. Solving this serious problem (importing external source files) in a fully automatic but satisfactory way IMHO must involve writing proper lexer/parser functions, at least for .js, .jsx, .ts and .tsx files. That seems to be a lot of work, so I believe it is more likely that I will pursue this simpler idea: a plugin with a handful of universal functions for manipulating source code that the user can use to reshape code any way he/she likes. That would work for all kinds of source files.
Vitalije

Terry Brown

Feb 23, 2018, 11:31:35 AM
to leo-e...@googlegroups.com
On Fri, 23 Feb 2018 08:22:55 -0800 (PST)
vitalije <vita...@gmail.com> wrote:

> a plugin with handful of universal functions for manipulating source
> code, that user can use to reshape code any way he/she likes. That
> would work for all kinds of source files.

It would be fun, although perhaps not ultimately useful ;-) to
experiment with a function in the above toolkit that tries to generate
regexes based on selected text. So you select a unit of meaning you
want in a node in your source and run this function, and it cycles
through a few heuristics to generate regexes and reports how many more
similar blocks would be recognized by each one, with maybe a preview of
what they look like.

You could even have a preparatory command where you highlight the text
you'd want to be the node name, run the preparatory command to store
that name, then run the main function, so now at least it knows where
the name is. Not everything has a name in that way of course.
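
A very rough sketch of that idea, just to make it concrete. Nothing here
is Leo code: candidate_patterns, count_matches and the three heuristics
are invented for this example, and it only looks at single lines rather
than whole blocks.

```python
import re

TOKEN = re.compile(r'\w+|\s+|[^\w\s]')
WORD = re.compile(r'\w+')

def candidate_patterns(selection):
    '''Return {label: regex source} built from the selected line.'''
    tokens = TOKEN.findall(selection.strip())

    def build(generalize_words):
        parts, seen_word = [], False
        for tok in tokens:
            if tok.isspace():
                parts.append(r'\s+')  # any run of whitespace
            elif WORD.fullmatch(tok):
                # Keep the first word (often a keyword) literal and,
                # if asked, let every later word match any identifier.
                if generalize_words and seen_word:
                    parts.append(r'\w+')
                else:
                    parts.append(re.escape(tok))
                seen_word = True
            else:
                parts.append(re.escape(tok))  # punctuation stays literal
        return ''.join(parts)

    return {
        'literal': ''.join(re.escape(t) for t in tokens),
        'flexible whitespace': build(False),
        'generalized names': build(True),
    }

def count_matches(selection, source):
    '''Report how many source lines each candidate regex recognizes.'''
    lines = source.splitlines()
    return {
        label: sum(1 for line in lines if re.search(pattern, line))
        for label, pattern in candidate_patterns(selection).items()
    }

source = '''\
function alpha(a) {
  return a;
}
function  beta(b) {
}
function gamma(c) {
}
'''
print(count_matches('function alpha(a) {', source))
```

On this sample the literal and flexible-whitespace patterns each find
only the selected line, while the generalized one finds all three
function headers. A preview would just print the matching lines
instead of counting them.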

Cheers -Terry

Edward K. Ream

Feb 23, 2018, 11:32:59 AM
to leo-editor
On Friday, February 23, 2018 at 9:42:24 AM UTC-6, Edward K. Ream wrote:

> The javascript importer now defines [clean_headline] this way:

I am working on improvements.  Please hold suggestions for a bit.

Edward

Edward K. Ream

Feb 23, 2018, 12:27:51 PM
to leo-editor
On Friday, February 23, 2018 at 10:32:59 AM UTC-6, Edward K. Ream wrote:

> I am working on improvements.  Please hold suggestions for a bit.

Rev 0deba52 completes the present round of work on this project. There are no problems that I can see with the headlines when importing leovue/src.  There are problems assigning lines to nodes.

Edward

vitalije

Feb 23, 2018, 12:34:15 PM
to leo-editor


> There are problems assigning lines to nodes.
>
> Edward

Precisely.

Edward K. Ream

Feb 23, 2018, 12:35:42 PM
to leo-editor
On Friday, February 23, 2018 at 10:22:55 AM UTC-6, vitalije wrote:

> Cleaning generated headlines is just a part of the problem. A more serious problem is choosing what should go into a single node.

I agree completely.
 
> As far as I am concerned, I would much prefer to have a toolbox with several commands to help me import files on my own. I wrote about that idea here. I don't know when (or if) I will have time to write such a plugin. Solving this serious problem (importing external source files) in a fully automatic but satisfactory way IMHO must involve writing proper lexer/parser functions, at least for .js, .jsx, .ts and .tsx files. That seems to be a lot of work, so I believe it is more likely that I will pursue this simpler idea: a plugin with a handful of universal functions for manipulating source code that the user can use to reshape code any way he/she likes. That would work for all kinds of source files.

Remember that all importers have a post pass in which they are free to alter the already generated nodes.

Javascript is probably unique in all commonly-used languages in not having a proper, always used, syntax for generating classes, functions and methods.  This makes the problem much harder for js than it is for all other languages.

Afaik, there are few if any problems with other languages.  Once in a while the python importer has troubles with if statements at the top level interacting with decorators.  That's about it.  Tweaking the python post pass should fix this, but it's a low priority item.

So yes, have at it if you will.  But remember that many (most?) problems with imports can be fixed by hand in the generated @clean tree.

Edward

vitalije

Feb 23, 2018, 12:53:12 PM
to leo-editor


> many (most?) problems with imports can be fixed by hand in the generated @clean tree.

Yes, I am aware of this. But quite often it has turned out for me that the amount of work needed for that manual fixing exceeds the amount of work needed to split the file by hand in the first place.

I won't dispute that files from some (possibly many) other languages can be imported properly. But there is a not so small number of languages, like javascript, that can't be imported satisfactorily. Javascript itself has a number of dialects; scala is another example. Groovy also comes to mind. They have such rich syntax that I really doubt Leo in its current form can handle them very well. I haven't tested it, but every time I used import I ended up doing a lot of manual reshaping, and quite often I abandoned the whole process and started all over again, splitting the file manually.

Vitalije

Edward K. Ream

Feb 24, 2018, 12:41:37 PM
to leo-editor
On Friday, February 23, 2018 at 11:35:42 AM UTC-6, Edward K. Ream wrote:

> Afaik, there are few if any problems with other languages.

Just for the record, there is a generic problem with all importers that has no real solution, namely constructs that look like section references.  Recognizing such things isn't the problem.

Rather, the problem is how to alter the original code.  Any of Leo's importers will generate an @ignore directive for such files because the "missing" section reference will cause the perfect import checks to fail.  It will then be up to the user to change the sources.  Imo, this is a reasonable approach.
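
For illustration only, here is a small sketch of how such lookalikes might be spotted in a source file before importing. It assumes that any `<<...>>` run on a single line counts as a lookalike; the LOOKALIKE pattern and find_section_lookalikes are invented for this example and are not the check Leo itself performs.

```python
import re

# Assumption: a '<<' followed later on the same line by '>>' is enough
# to look like a section reference. Leo's own rule may differ.
LOOKALIKE = re.compile(r'<<.+?>>')

def find_section_lookalikes(source):
    '''Return (line_number, matched_text) for each suspicious line.'''
    hits = []
    for i, line in enumerate(source.splitlines(), start=1):
        m = LOOKALIKE.search(line)
        if m:
            hits.append((i, m.group(0)))
    return hits

js = '''\
var a = 1 << 4;
var b = (a << 2) >> 1;
var c = a >> 1;
'''
for number, text in find_section_lookalikes(js):
    print(f'line {number}: {text}')
```

Here only the second line is reported, because it is the only one with both shift operators. Finding it is the easy part; deciding how to rewrite that line is still up to the user.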

Edward