Major improvement : indexedDB cache for standard library imports

Pierre Quentel

May 2, 2018, 3:44:34 AM
to brython
In commit fde393b, I have introduced a mechanism that should dramatically reduce one of the main problems with Brython so far : slow module imports.

Until now, importing a module meant that the Brython engine translated its Python source code into Javascript at each page load. For standard library modules, the source code is found either by an Ajax call, or from the file brython_stdlib.js. Pre-compiling is technically possible and has been suggested many times, but the generated Javascript is roughly 10 times bigger than the Python source code, so the whole standard library would be around 30 MB.

The solution introduced in the commit works under 2 conditions :
- the browser must support the indexedDB database engine (most of them do, including on smartphones)
- the Brython page must use brython_stdlib.js, or the reduced version brython_modules.js generated by the CPython brython module

The main idea is to store the Javascript translation of stdlib modules in an indexedDB database : the translation is done only once for each new version of Brython ; the generated Javascript is stored on the client side, not sent over the network, and indexedDB can easily handle a few MB of data.

Unfortunately, indexedDB works asynchronously, while import is blocking. With this code:

import datetime
print(datetime.datetime.now())

using indexedDB at runtime to get the datetime module is not possible, because the code that follows the import statement is not in a callback function that could be called when the indexedDB asynchronous request completes.
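
To make the problem concrete, here is a minimal sketch in plain Python. It is only an analogy : the list pending plays the role of the browser's event loop, and async_get is a hypothetical stand-in for an indexedDB request, not a real API.

```python
# A toy event queue: callbacks registered by "asynchronous" requests only run
# once the current code has finished, as in the browser's event loop.
pending = []
events = []


def async_get(key, callback):
    """Hypothetical asynchronous lookup: the result is delivered later."""
    pending.append(lambda: callback("<javascript for %s>" % key))


def script():
    # Stand-in for "import datetime" served by an asynchronous request.
    async_get("datetime", lambda js: events.append("module ready"))
    # The statement after the "import" runs before the callback has fired.
    events.append("code after import runs")


script()
while pending:          # the event loop only runs after script() returns
    pending.pop(0)()

print(events)
```

The code placed after the request runs first, which is exactly what a blocking import cannot tolerate.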

The solution is to scan the script at translation time. For each import statement in the source code, the name of the module to import is stored in a list. When the translation is finished, the Brython engine enters an execution loop (defined in function loop() in py2js.js) that uses a tasks stack. The possible tasks are:
  • call function inImported() that checks if the module is already in the imported modules. If so, the control returns to loop()
  • if not, add a task to the stack : a call to function idb_get() that makes a request to the indexedDB database to see if the Javascript version of the Python module is already stored ; when the task is added, control returns to loop()
  • in the callback of this request (function idb_load() ) :
    •  if the Javascript version exists in the database, it is stored in a Brython variable (__BRYTHON__.precompiled) and the control returns to loop()
    •  otherwise, the Python source for the module (found in brython_stdlib.js) is translated and another task is added to the stack : a request to store the Javascript code in the indexedDB database. The callback of this request adds another task : a new call to idb_get(), that is sure to succeed this time
  • the last task on the stack is the execution of the original script
At run time, there is a change in py_import.js : when a module in the standard library is imported, the Javascript translation stored in __BRYTHON__.precompiled is executed : the Python to Javascript translation has been made previously.

Cache update

The indexedDB database is associated with the browser and persists between browser requests, when the browser is closed, when the PC is restarted, etc. The process described above must define a way to update the Javascript version stored in the database when the Python source code in the stdlib is changed, or when the translation engine changes.

To achieve this, cache update relies on a timestamp. Each version of Brython is marked with a timestamp, updated by the script make_dist.py. When a script in the stdlib is precompiled and stored in the indexedDB database, the record in the database has a timestamp field set to this Brython timestamp. If a new version of Brython is used in the HTML page, it has a different timestamp : idb_load() then finds that the stored record is outdated, and a new translation is performed.
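
As a minimal sketch of this timestamp check, in plain Python : a dict stands in for the indexedDB store, and the names cache, translate and get_precompiled, as well as the timestamp values, are made up for the illustration and are not part of Brython.

```python
# Hypothetical stand-ins: a dict plays the role of the indexedDB store,
# and translate() plays the role of the Python-to-Javascript translation.
cache = {}
translations = []  # records which modules were (re)translated


def translate(name, source):
    translations.append(name)
    return "/* js for %s */ %s" % (name, source)


def get_precompiled(name, source, brython_timestamp):
    """Return the cached translation, redoing it if missing or outdated."""
    record = cache.get(name)
    if record is not None and record["timestamp"] == brython_timestamp:
        return record["content"]
    content = translate(name, source)
    cache[name] = {"name": name, "content": content,
                   "timestamp": brython_timestamp}
    return content


# The timestamp values below are arbitrary.
get_precompiled("datetime", "x = 1", 1525247074)  # first load: translated
get_precompiled("datetime", "x = 1", 1525247074)  # same version: cache hit
get_precompiled("datetime", "x = 1", 1525333474)  # new version: retranslated
```

Only the first and third calls trigger a translation ; the second one is served from the cache.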

Limitations


The detection of the modules to import is made by a static code analysis, relying on "import moduleX" or "from moduleY import foo". It cannot work for imports performed with the built-in function __import__(), or for code passed to exec(). In these cases, the previous solution of on-the-fly compilation at each page load is used.
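
The limitation can be illustrated with CPython's ast module. This is only an analogy : Brython performs the detection inside its own translation engine, and the helper name find_imports is hypothetical.

```python
import ast


def find_imports(source):
    """Collect module names from import statements found by static analysis."""
    names = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            names.append(node.module)
    return sorted(set(names))


# The script is only parsed, never executed.
script = '''
import datetime
from random import choice
json = __import__("json")
exec("import re")
'''

detected = find_imports(script)
print(detected)
```

datetime and random are found, but json and re are not : they are hidden in a function call and in a string, which a static scan of import statements does not look into.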

The mechanism is only implemented for modules in the standard library. Using it for modules in site-packages or in the application directory is not implemented at the moment.

Pseudo-code

Below is a simplified version of the cache implementation, written in a Python-like pseudo code.
 

def brython():
    <get Brython scripts in the page>
    for script in scripts:
        # Translate Python script source to Javascript
        root = __BRYTHON__.py2js(script.src)
        js = root.to_js()
        if hasattr(__BRYTHON__, "VFS") and __BRYTHON__.has_indexedDB:
            # If brython_stdlib.js is included in the page, the __BRYTHON__
            # object has an attribute VFS (Virtual File System)
            for module in root.imports:
                tasks.append([inImported, module])
        tasks.append(["execute", js])
    loop()

def inImported(module_name):
    if module_name in imported:
        pass
    elif module_name in stdlib:
        tasks.insert(0, [idb_get, module_name])
    loop()

def idb_get(module_name):
    request = database.get(module_name)
    request.bind("success",
        lambda evt: idb_load(evt, module_name))

def idb_load(evt, module_name):
    result = evt.target.result
    if result and result.timestamp == __BRYTHON__.timestamp:
        __BRYTHON__.precompiled[module_name] = result.content
        for subimport in result.imports:
            tasks.insert(0, [inImported, subimport])
    else:
        # Not found or outdated : precompile source code found
        # in __BRYTHON__.VFS
        js = __BRYTHON__.py2js(__BRYTHON__.VFS[module_name]).to_js()
        tasks.insert(0, [store_precompiled, module_name, js])
    loop()

def store_precompiled(module, js):
    """Store precompiled Javascript in the database."""
    request = database.put({"content": js, "name": module,
        "timestamp": __BRYTHON__.timestamp})

    def restart(evt):
        """When the code is inserted, add a new request to idb_get (this time
        we are sure it will find the precompiled code) and call loop()."""
        tasks.insert(0, [idb_get, module])
        loop()

    request.bind("success", restart)

def loop():
    """Pops first item in tasks stack, run task with its arguments."""
    if not tasks:
        return
    func, *args = tasks.pop(0)
    if func == "execute":
        js_script = args[0]
        <execute js_script>
    else:
        func(*args)
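
The pseudo-code cannot be run as is. As a rough illustration, the same flow can be simulated in plain CPython. Everything here is an assumption made for the sketch : dicts stand in for the indexedDB store and for __BRYTHON__.VFS, the "asynchronous" request calls its callback at once, py2js is a fake translator, and the module sources in VFS are invented.

```python
# Hypothetical in-memory stand-ins for the browser-side objects.
TIMESTAMP = 2          # timestamp of the "current" Brython version
VFS = {"datetime": "import time", "time": ""}  # invented module sources
database = {}          # plays the role of the indexedDB store
imported = {}          # modules already imported
precompiled = {}       # plays the role of __BRYTHON__.precompiled
tasks = []
log = []


def py2js(source):
    """Fake translation: returns (javascript, imports found in the source)."""
    imports = [line.split()[1] for line in source.splitlines()
               if line.startswith("import ")]
    return "js(%r)" % source, imports


def in_imported(name):
    if name not in imported and name in VFS:
        tasks.insert(0, (idb_get, name))
    loop()


def idb_get(name):
    # The real request is asynchronous; here the "callback" runs at once.
    idb_load(database.get(name), name)


def idb_load(record, name):
    if record and record["timestamp"] == TIMESTAMP:
        log.append("hit " + name)
        precompiled[name] = record["content"]
        for sub in record["imports"]:
            tasks.insert(0, (in_imported, sub))
    else:
        log.append("translate " + name)
        js, imports = py2js(VFS[name])
        tasks.insert(0, (store_precompiled, name, js, imports))
    loop()


def store_precompiled(name, js, imports):
    database[name] = {"content": js, "imports": imports,
                      "timestamp": TIMESTAMP}
    tasks.insert(0, (idb_get, name))  # sure to succeed this time
    loop()


def loop():
    """Pop the first task and run it with its arguments."""
    if not tasks:
        return
    func, *args = tasks.pop(0)
    if func == "execute":
        log.append("execute script")
    else:
        func(*args)


def run(source):
    js, imports = py2js(source)
    for name in imports:
        tasks.append((in_imported, name))
    tasks.append(("execute", js))
    loop()


run("import datetime")   # first page load: translation needed
precompiled.clear()      # a new page load starts with nothing in memory
run("import datetime")   # second page load: served from the "database"
print(log)
```

On the first run each module is translated, stored, then read back ; on the second run both modules are found in the store and no translation happens before the script executes.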





André

May 2, 2018, 6:44:18 AM
to brython


On Wednesday, 2 May 2018 04:44:34 UTC-3, Pierre Quentel wrote:
In commit fde393b, I have introduced a mechanism that should dramatically reduce one of the main problems with Brython so far : slow module imports.

... This looks like a major change that would be beneficial for most Brython users. 

...

Unfortunately, indexedDB works asynchronously, while import is blocking.
This could have been problematic for my use case ...
 


Limitations


The detection of the modules to import is made by a static code analysis, relying on "import moduleX" or "from moduleY import foo". It cannot work for imports performed with the built-in function __import__(), or for code passed to exec(). In these cases, the previous solution of on-the-fly compilation at each page load is used.

I use exec() ... and require import to be blocking, as is the case for CPython. As long as this doesn't change, the use of indexedDB should have no impact on my use case.

André

Pierre Quentel

May 3, 2018, 4:38:01 AM
to brython


On Wednesday, 2 May 2018 12:44:18 UTC+2, André wrote:


I use exec() ... and require import to be blocking, as is the case for CPython. As long as this doesn't change, the use of indexedDB should have no impact on my use case.

If I understand correctly, your use case is to run a script entered online by a user ; if you run it with exec(), the indexedDB cache doesn't work and I don't think it can as long as the indexedDB API is only asynchronous.

In the latest commit I have introduced the function run_script(src[, name]) in module browser. It runs the source code in src as if it were the content of a Brython script, with an optional name. The execution process is the same as for scripts embedded in the page with <script type="text/python">, so it benefits from the indexedDB cache.

You can test it with this simple page :

<!doctype html>
<html>
<head>
<meta charset="utf-8">
<script type="text/javascript" src="/src/brython.js"></script>
<script type="text/javascript" src="/src/brython_stdlib.js"></script>
</head>

<body onLoad="brython(1)">
<script id="ascript" type="text/python">
from browser import document, run_script

def run(evt):
    run_script(document.select_one("textarea").value)

document["run"].bind("click", run)
</script>
<textarea rows="20" cols="60"></textarea>
<br><button id="run">run</button>
</body>
</html>

Does this help ?

Andre Roberge

May 3, 2018, 5:26:56 AM
to bry...@googlegroups.com
On Thu, May 3, 2018 at 5:38 AM Pierre Quentel <pierre....@gmail.com> wrote:


On Wednesday, 2 May 2018 12:44:18 UTC+2, André wrote:


I use exec() ... and require import to be blocking, as is the case for CPython. As long as this doesn't change, the use of indexedDB should have no impact on my use case.

If I understand correctly, your use case is to run a script entered online by a user ;

Yes ... and I want the user to be able to import any Python module (the most likely one being random) as part of their script; basically, it is a Python playground.
 
if you run it with exec(), the indexedDB cache doesn't work and I don't think it can as long as the indexedDB API is only asynchronous.
That's ok.   I guess I did not explain correctly: I want to make sure that I have a way to bypass the new import mechanism as I do not want asynchronous imports.  exec() does it ... and this is perfectly fine with me!!
 

In the latest commit I have introduced the function run_script(src[, name]) in module browser. It runs the source code in src as if it were the content of a Brython script, with an optional name. The execution process is the same as for scripts embedded in the page with <script type="text/python">, so it benefits from the indexedDB cache.


 


Does this help ?


I'll have to try it out to see.  I still have to update my site to the latest version (with the improved turtle module) - but have little time to do so at present.

André 


Jonathan Verner

May 3, 2018, 11:24:38 AM
to brython
Wow, this is very, very nice !!!

Carlo Oliveira

May 4, 2018, 11:12:12 AM
to brython
I will certainly run into the same problem as André with my educational platform. It relies on exec(self.code, glob) # dict(__name__="__main__")) to run the students' code. __name__ = "__main__" must be passed in the globals so that students learn to use the if __name__ == "__main__" idiom. I hope that run_script, or another solution such as async_import = False, will spare me from staying frozen on Brython 3.4 forever.
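
For readers unfamiliar with the pattern, here is a minimal sketch in plain CPython. The names student_code and results are made up for the illustration ; on the platform the source would come from self.code.

```python
# Hypothetical student submission using the "if __name__" guard.
student_code = '''
def main():
    results.append("main() was called")

if __name__ == "__main__":
    main()
'''

results = []
# Globals handed to exec(): __name__ is set so that the guard is true.
glob = dict(__name__="__main__", results=results)
exec(student_code, glob)
print(results)
```

If the globals passed to exec() did not define __name__ as "__main__", the guarded call to main() would not run.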

