The file tbo-in-rust.leo (in my ekr-tbo-in-rust repo) now contains the sources for the Rust and Python tokenizers, imported with Leo's import commands. This concludes (for now) my work with Rust.
As you may recall, Python's tokenizer is much faster than Rust's. The sources show why. Python's tokenizer is highly optimized C code, complete with critical sections!
Code overview
- Most of Lib/tokenize.py pertains only to the (non-critical) untokenize function.
- tokenize.py delegates the tokenize function to _generate_tokens_from_c_tokenizer, backed by the C-level _tokenize module. (A usage sketch follows this list.)
- It took several hours to find the _tokenize module! It lives in the file cpython/Python/Python-tokenize.c (!!)
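Here is a minimal sketch of that delegation from the Python side. The public generate_tokens API shown is standard; the comment about _generate_tokens_from_c_tokenizer describes CPython 3.12+ internals and is an assumption about a private helper, not a stable API.

```python
import io
import tokenize

src = "x = 1 + 2\n"

# tokenize.generate_tokens() is the public entry point. In CPython 3.12+
# it is routed through the private helper _generate_tokens_from_c_tokenizer
# (an internal name, not a stable API), which hands the real work to the
# C-level _tokenize module.
for tok in tokenize.generate_tokens(io.StringIO(src).readline):
    print(tokenize.tok_name[tok.type], repr(tok.string))
```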
The magic happens in the function:
PyMODINIT_FUNC PyInit__tokenize(void);
which is part of Python's C extension interface: CPython calls this entry point the first time the module is imported.
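To watch that entry point run, one can import the module directly. A minimal sketch, assuming a CPython 3.12+ build where _tokenize is compiled in; its contents are internal and unstable, so the printed names are only illustrative.

```python
# Importing _tokenize causes CPython to call the C entry point
# PyInit__tokenize, which builds and returns the module object.
import _tokenize

print(_tokenize.__name__)  # '_tokenize'
# Internal, unstable contents; printed only to confirm the module loaded.
print(sorted(name for name in dir(_tokenize) if not name.startswith("__")))
```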
Summary
tbo-in-rust.leo now contains the sources for the Python and Rust tokenizers. The Python tokenizer is highly optimized C code.
Today's work completes my study of Rust for now.
Edward
That's why it is said:
Python is just a friendly DSL on top of the C language.
And Rust's goal is just to be a safe C++,
but C++'s performance has never been as good as C's;
so Leo's use of Rust may not bring a qualitative improvement,
but learning and using Rust is a good experience.
Learn Zig, and you will know which one is the future of C ;-)
goplus/llgo: a Go compiler based on LLVM, designed to better integrate Go with the C ecosystem, including Python.
https://github.com/goplus/llgo
It makes it easy to use Python from Go,
and to use Go from Python;
that is another magic world.