ENB: comparing the Rust and Python tokenizers.

Edward K. Ream

Oct 11, 2024, 7:01:52 AM
to leo-editor

The file tbo-in-rust.leo (in my ekr-tbo-in-rust repo) now contains the sources for the Rust and Python tokenizers, as shown by Leo's import commands. This concludes (for now) my work with Rust.


As you may recall, Python's tokenizer is much faster than Rust's. The sources show why. Python's tokenizer is highly optimized C code, complete with critical sections!


Code overview


- Most of Lib/tokenize.py pertains only to the (non-critical) untokenize function.

- tokenize.py delegates the actual tokenizing to the private helper _generate_tokens_from_c_tokenizer, which wraps the C-language _tokenize module.

- It took several hours to find the _tokenize module! It's in the file cpython/Python/Python-tokenize.c (!!)


The magic happens in the function:

    PyMODINIT_FUNC PyInit__tokenize(void);

which is part of Python's C-language interface.
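
A minimal sketch of what this looks like from the Python side, using only the public tokenize API. The helper name token_summary is hypothetical, and the hasattr check only reports whether this interpreter exposes the private helper named above; private names can differ between Python versions.

```python
import io
import tokenize


def token_summary(source: str) -> list[tuple[str, str]]:
    """Return (token type name, token text) pairs for source."""
    # generate_tokens wants a readline callable that yields str lines.
    readline = io.StringIO(source).readline
    return [
        (tokenize.tok_name[tok.type], tok.string)
        for tok in tokenize.generate_tokens(readline)
    ]


# Private helper mentioned in the post; it may be absent in other versions.
has_c_helper = hasattr(tokenize, "_generate_tokens_from_c_tokenizer")

if __name__ == "__main__":
    print("C tokenizer helper present:", has_c_helper)
    for name, text in token_summary("x = 1\n"):
        print(f"{name:10} {text!r}")
```

The public call is the same whether or not the C tokenizer sits underneath, which is why the delegation is easy to miss when reading Lib/tokenize.py.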


Summary


tbo-in-rust.leo now contains the sources for the Python and Rust tokenizers. The Python tokenizer is highly optimized C code.


Today's work completes my study of Rust for now.


Edward

Zoom.Quiet

Oct 11, 2024, 8:43:25 AM
to leo-e...@googlegroups.com
Edward K. Ream <edre...@gmail.com> wrote on Fri, Oct 11, 2024 at 19:01:

That's why it is said:
Python is just a friendly DSL for the C language

And Rust's goal is just a safe C++,
but C++'s performance has never been as good as C;

So, Leo's use of Rust may not bring about a qualitative improvement,
but learning and using Rust is a good experience.

Edward K. Ream

Oct 11, 2024, 9:37:33 AM
to leo-e...@googlegroups.com
On Fri, Oct 11, 2024 at 7:43 AM Zoom.Quiet <zoom....@gmail.com> wrote:

Thanks for your comments :-)

> That's why it is said:
> Python is just a friendly DSL for the C language

Python is way more than "just" a DSL.

Python is a work of genius because of its simplicity. 5-year-olds can (and have) successfully programmed in Python.

> And Rust's goal is just a safe C++,
> but C++'s performance has never been as good as C;

Imo, the performance difference arises because Python's tokenizer is highly optimized while Rust's is not.

> So, Leo's use of Rust may not bring about a qualitative improvement,
> but learning and using Rust is a good experience.

Yes, that's why I studied Rust.

Edward

Zoom.Quiet

Oct 11, 2024, 9:45:38 AM
to leo-e...@googlegroups.com
Edward K. Ream <edre...@gmail.com> wrote on Fri, Oct 11, 2024 at 21:37:
...
>
>> So, Leo's use of Rust may not bring about a qualitative improvement,
>> but learning and using Rust is a good experience.
>
>
> Yes, that's why I studied Rust.
>

After learning Rust,
you should learn Zig; then you will know which one is the future C ;-)

PS:
goplus/llgo: A Go compiler based on LLVM in order to better integrate
Go with the C ecosystem including Python
https://github.com/goplus/llgo
It makes it easy to use Python in Go,
and Go in Python;
that is another magic world.

> Edward

Thomas Passin

Oct 11, 2024, 10:00:08 AM
to leo-editor
On Friday, October 11, 2024 at 8:43:25 AM UTC-4 Zoom.Quiet wrote:
> That's why it is said:
> Python is just a friendly DSL for the C language

This is so far off the mark that I can't believe anyone wrote it. If you wrote that Forth is just a DSL of assembler, you would be closer to the mark. Python could be, and has been, implemented in other languages. Among them:

 

Edward K. Ream

Oct 11, 2024, 1:50:59 PM
to leo-e...@googlegroups.com
On Fri, Oct 11, 2024 at 8:45 AM Zoom.Quiet <zoom....@gmail.com> wrote:


> you should learn Zig; then you will know which one is the future C ;-)
> ...
> goplus/llgo: A Go compiler based on LLVM in order to better integrate
> Go with the C ecosystem including Python
> https://github.com/goplus/llgo
> It makes it easy to use Python in Go,
> and Go in Python;
> that is another magic world.

Sounds interesting.  Thanks for the links!

Edward