That's a very good idea. I used to keep implementation notes in the documentation in earlier versions, but they were hard to keep up to date (things change with each new version).
I have started writing a general overview (see below, in raw Markdown). Is this helpful? If so, I will develop it and add examples of translation. I think Brythonista (my blog) would be a good place for that.
- Pierre
How Brython works
=================
Translation from Python to Javascript
-------------------------------------
A typical Brython-powered HTML page looks like this:
```html
<html>
<head>
<script src="/path/to/brython.js"></script>
</head>
<body onload="brython()">
<script type="text/python">
...
</script>
</body>
</html>
```
brython.js exposes two names in the global Javascript namespace: `brython`
(the function called on page load) and `__BRYTHON__`, an object that holds all
internal objects needed to run Python scripts.
The function `brython()` inspects all the scripts in the page; for those that
have the type `text/python`, it reads the Python source code, translates it to
Javascript, and runs the resulting code with a Javascript `eval()`.
If the `<script>` tag has an attribute `src`, an Ajax call is performed to
get the content of the file at the specified url, and its source is converted
and executed as above.
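
As a rough illustration of the script discovery described above, here is a sketch in Python using the standard `html.parser` module. It is not Brython's code (Brython does this in Javascript, inside the browser); the class name `PythonScriptFinder` and the `("inline", ...)` / `("src", ...)` tuples are made up for the example.

```python
from html.parser import HTMLParser


class PythonScriptFinder(HTMLParser):
    """Collect every <script type="text/python"> found in a page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.scripts = []        # ("inline", source) or ("src", url)
        self._in_python = False  # True while inside an inline Python script

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script" and attrs.get("type") == "text/python":
            if "src" in attrs:
                # Brython would fetch this URL with an Ajax call.
                self.scripts.append(("src", attrs["src"]))
            else:
                self._in_python = True
                self.scripts.append(("inline", ""))

    def handle_data(self, data):
        # The parser may deliver the script body in several chunks.
        if self._in_python:
            kind, source = self.scripts[-1]
            self.scripts[-1] = (kind, source + data)

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_python = False


page = """<html><head><script src="/path/to/brython.js"></script></head>
<body onload="brython()">
<script type="text/python">print("hello")</script>
<script type="text/python" src="app.py"></script>
</body></html>"""

finder = PythonScriptFinder()
finder.feed(page)
finder.close()
for kind, value in finder.scripts:
    print(kind, value)
```

The script tag that loads brython.js has no `text/python` type, so it is ignored; only the two Python scripts are collected.
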
The translation to Javascript takes the following steps:
- a tokenizer reads the tokens in the source code and passes them to an automaton
that builds an abstract syntax tree for the code, or raises `SyntaxError` or
`IndentationError`.
- this tree is transformed (nodes are added or modified), because some single
Python statements translate into several Javascript statements.
- if the debug level is set, additional nodes are added to update an internal
object that tracks the current script name and line number.
- the transformed tree supports a method `to_js()` that returns the Javascript
code.
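
The same pipeline shape (tokens, tree, tree transformation, code generation) can be tried out with CPython's own `tokenize` and `ast` modules. This is only an analogy: it is not Brython's implementation, the output is Python rather than Javascript, and `SplitChainedAssign`, `parse_or_report` and the `_tmp` variable are names invented for the example. It requires Python 3.9+ for `ast.unparse`.

```python
import ast
import io
import tokenize

source = "a = b = 1 + 2\n"

# Step 1: read the tokens of the source code.
tokens = [
    tok.string
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
    if tok.type in (tokenize.NAME, tokenize.NUMBER, tokenize.OP)
]


# Step 2: build the tree, or report SyntaxError / IndentationError.
def parse_or_report(text):
    try:
        return ast.parse(text)
    except SyntaxError as exc:  # IndentationError is a subclass of SyntaxError
        return type(exc).__name__


# Step 3: transform the tree, turning one statement into several.
class SplitChainedAssign(ast.NodeTransformer):
    """Rewrite `a = b = expr` as `_tmp = expr; a = _tmp; b = _tmp`."""

    def visit_Assign(self, node):
        if len(node.targets) < 2:
            return node
        statements = [
            ast.Assign(targets=[ast.Name(id="_tmp", ctx=ast.Store())], value=node.value)
        ]
        for target in node.targets:
            statements.append(
                ast.Assign(targets=[target], value=ast.Name(id="_tmp", ctx=ast.Load()))
            )
        return statements


tree = parse_or_report(source)
tree = ast.fix_missing_locations(SplitChainedAssign().visit(tree))

# Step 4: generate code from the transformed tree (the role played by to_js()).
generated = ast.unparse(tree)
print(generated)
```
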
All this is done in the script __py2js.js__:
- function `brython()` is the last one in the script
- translation is done by function `py2js()`
- this function calls the tokenizer, function `tokenize()`
- the tokenizer builds a tree made of instances of the class `$Node`
- an instance of `$Node` is created for each new statement, and a context is
created for the node
- new tokens generally change the state of the context by a call such as
`context = transition(context, token_type, token_value)`
- context is an instance of one of the classes defined in the script, whose
name starts with `$` and ends with `Ctx`: for instance, when the tokenizer
encounters the keyword `try`, the function `transition()` returns an instance
of `$TryCtx`
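
The context mechanism can be pictured with a toy state machine. The sketch below is written in Python and only mirrors the shape of the call `context = transition(context, token_type, token_value)`; the classes `NodeCtx`, `TryCtx`, `IdCtx`, the token type names and the `has_colon` flag are assumptions made for the example, not the actual contents of py2js.js.

```python
class Ctx:
    """Base class for all contexts; each context remembers its parent."""

    def __init__(self, parent=None):
        self.parent = parent


class NodeCtx(Ctx):
    """Context created when a new statement starts."""


class TryCtx(Ctx):
    """Context entered after the keyword `try`."""

    has_colon = False


class IdCtx(Ctx):
    """Context entered after an identifier."""

    def __init__(self, parent, name):
        super().__init__(parent)
        self.name = name


def transition(context, token_type, token_value):
    """Return the context that results from reading one more token."""
    if isinstance(context, NodeCtx):
        if token_type == "keyword" and token_value == "try":
            return TryCtx(context)
        if token_type == "id":
            return IdCtx(context, token_value)
    elif isinstance(context, TryCtx):
        if token_type == "op" and token_value == ":":
            context.has_colon = True
            return context
    # Any token that the current context does not expect is an error.
    raise SyntaxError(f"unexpected token {token_value!r}")


context = NodeCtx()
for token_type, token_value in [("keyword", "try"), ("op", ":")]:
    context = transition(context, token_type, token_value)
print(type(context).__name__, context.has_colon)
```
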
Builtin objects
---------------
All the built-in Python names (classes, functions, exceptions, objects) are
stored as attributes of `__BRYTHON__`, usually under the same name: for
instance, the built-in class `int` is stored as `__BRYTHON__.int`. Only names
that conflict with Javascript naming rules must be changed, e.g. `super()` is
implemented as `__BRYTHON__.$$super`.
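
A minimal sketch of this naming rule, in Python. The two examples from the text (`int` and `super`) are the only confirmed cases; the helper names `js_name` and `builtin_ref`, and the idea that a single `$$` prefix is applied to a fixed set of conflicting names, are assumptions for illustration.

```python
# Illustrative subset of names that cannot be used as-is in Javascript.
# The real list used by Brython is not given in the text above.
CONFLICTING_NAMES = frozenset({"super"})


def js_name(py_name):
    """Return the attribute name used on the Javascript side (hypothetical)."""
    if py_name in CONFLICTING_NAMES:
        return "$$" + py_name
    return py_name


def builtin_ref(py_name):
    """Return the Javascript expression that refers to a Python built-in."""
    return "__BRYTHON__." + js_name(py_name)


print(builtin_ref("int"))
print(builtin_ref("super"))
```
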
Implementation of Python objects
--------------------------------
Python strings are implemented as Javascript strings.
Python lists are implemented as Javascript arrays.
Python integers are implemented as Javascript numbers if they are in the range
of Javascript "safe integers", i.e. in the range [-(2**53-1), 2**53-1]; outside
of this range they are implemented with an internal class.
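
The range check can be written down directly. The sketch below is in Python; `is_safe_integer` is a hypothetical helper that mirrors the bounds given above, and the last line shows why the limit exists: beyond 2**53, double-precision floats (which Javascript numbers are) can no longer tell neighbouring integers apart.

```python
MAX_SAFE_INTEGER = 2**53 - 1
MIN_SAFE_INTEGER = -(2**53 - 1)


def is_safe_integer(n):
    """True if n fits in the Javascript "safe integer" range."""
    return MIN_SAFE_INTEGER <= n <= MAX_SAFE_INTEGER


print(is_safe_integer(2**53 - 1))  # inside the range
print(is_safe_integer(2**53))      # outside: needs the internal class

# Two different integers collapse to the same double past the limit.
print(float(2**53) == float(2**53 + 1))
```
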
Python floats are implemented as instances of the Javascript `Number` class.
All other Python classes (builtin or user-defined) are implemented with two
Javascript objects:
- a function to create instances
- an object that holds the class attributes and methods
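
To make the "two objects" idea concrete, here is a toy model in Python where dictionaries stand in for Javascript objects. Everything in it (`make_class`, `get_attribute`, the `__class__` key, the `Point` example) is hypothetical; it only shows how a factory function and a separate attribute holder can cooperate, not how Brython actually lays out its objects.

```python
def make_class(name, attributes):
    """Return (factory, class_dict): the two objects that represent a class."""
    class_dict = dict(attributes, __name__=name)

    def factory(*args):
        # Each instance keeps a reference to the object holding its class.
        instance = {"__class__": class_dict}
        init = class_dict.get("__init__")
        if init is not None:
            init(instance, *args)
        return instance

    return factory, class_dict


def get_attribute(instance, name):
    """Look up `name` on the instance first, then on its class object."""
    if name in instance:
        return instance[name]
    klass = instance["__class__"]
    if name in klass:
        value = klass[name]
        if callable(value):
            # Bind the method to the instance.
            return lambda *args: value(instance, *args)
        return value
    raise AttributeError(name)


def point_init(self, x, y):
    self["x"] = x
    self["y"] = y


def point_norm1(self):
    return abs(self["x"]) + abs(self["y"])


Point, Point_dict = make_class("Point", {"__init__": point_init, "norm1": point_norm1})
p = Point(3, -4)
print(get_attribute(p, "x"), get_attribute(p, "norm1")())
```
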
Name resolution
---------------
A Python program is divided into blocks: modules, functions, classes. For each
block, Brython defines a Javascript variable that will hold all the names
bound in the block (we call it the "block names object").
Based on lexical analysis, including the `global` and `nonlocal` keywords, it
is generally possible to know in which block a name is bound. The name is then
translated as the attribute of the same name of the block names object.
The only case when the block can't be determined is when the program imports
names by `from some_module import *`. In this case:
- it is impossible to know if a name like `range` referenced in the script is
the built-in class `range` or if it was among the names imported from
`some_module`
- if a name which is not explicitly bound in the script is referenced,
lexical analysis can't determine if it should raise a `NameError`
In this situation, the name is translated to a call to a function that will
select at run time the value based on the names actually imported by the
module, or raise a `NameError`.
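
The run-time fallback can be sketched as follows, in Python. The function name `resolve_at_runtime` and its signature are invented for the example, and the lookup order (names brought in by the star import first, then built-ins) is an assumption that follows ordinary Python scoping, where module-level names shadow built-ins.

```python
import builtins


def resolve_at_runtime(name, star_imported):
    """Pick the value of `name` once the names actually imported are known.

    `star_imported` stands for the names bound by `from some_module import *`.
    """
    if name in star_imported:
        return star_imported[name]
    if hasattr(builtins, name):
        return getattr(builtins, name)
    raise NameError(f"name {name!r} is not defined")


# some_module did not define `range`: the built-in class is used.
print(resolve_at_runtime("range", {}) is range)

# some_module defined its own `range`: that one wins.
custom_range = object()
print(resolve_at_runtime("range", {"range": custom_range}) is custom_range)
```
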
Execution frames
----------------
Brython handles the execution frames in a stack. Each time the program enters
a new module or a new function (including lambdas and comprehensions),
information about the global and local environment is placed on top of the
stack; when the function or module exits, the element on top of the stack
is removed.
This is done by inserting calls to the internal functions `enter_frame()` and
`leave_frame()` in the generated Javascript code.
The stack is used, for instance, by the built-in functions `globals()` and
`locals()`, and to build the traceback information in case an exception is
raised.
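
A small Python model of the frame stack, to show where the calls are inserted. Only the names `enter_frame()` and `leave_frame()` come from the text above; their signatures, the dictionary layout of a frame and the helpers `current_locals` and `call_chain_names` are assumptions made for the example.

```python
frames_stack = []


def enter_frame(frame):
    """Push information about the block being entered."""
    frames_stack.append(frame)


def leave_frame():
    """Pop the frame of the block being left."""
    frames_stack.pop()


def current_locals():
    """What a built-in like locals() could read: the top of the stack."""
    return frames_stack[-1]["locals"]


def call_chain_names():
    """What a traceback could be built from: the names of all active frames."""
    return [frame["name"] for frame in frames_stack]


module_names = {"y": 10}


def translated_f():
    """Stand-in for the generated code of a Python function `f`."""
    local_names = {"x": 1}
    enter_frame({"name": "f", "locals": local_names, "globals": module_names})
    try:
        return dict(current_locals()), call_chain_names()
    finally:
        # The frame is removed even if the body raises.
        leave_frame()


enter_frame({"name": "__main__", "locals": module_names, "globals": module_names})
snapshot, call_chain = translated_f()
depth_after_call = len(frames_stack)
leave_frame()
print(snapshot, call_chain, depth_after_call)
```
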