About separate processes and threads

Edward K. Ream

Oct 16, 2018, 7:36:31 AM
to leo-editor
The original Dreaming Big Dreams thread suggested reorganizing Leo into more separable pieces, perhaps using a client/server architecture or other forms of interprocess communication (IPC).

Here I'd like to present what little I know about IPC, servers and threads.  The purpose is to continue the conversation, and to expose my own misconceptions.  Please comment.

Separate processes share no data

Some kind of IPC is required. There seems to be no end of such mechanisms:

- pyzo uses yoton <https://yoton.readthedocs.io/en/latest/>.
- neovim uses msgpack <https://msgpack.org/index.html> plus vim-related wrappers.
- LeoVue uses a node.js client-server architecture.
- Jupyter <http://jupyter.org/> uses another client-server architecture.
- All other client/server architectures have, or are, their own forms of IPC.
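To make "separate processes share no data" concrete, here is a minimal sketch of one of the simplest IPC styles: a child process that exchanges newline-delimited JSON messages with its parent over stdin/stdout pipes. It is not how pyzo, neovim, LeoVue or Jupyter actually work; the "upper" operation, the message fields and the function name `ask_child` are hypothetical. The child cannot see any of the parent's variables, so everything it needs must be encoded into the messages.

```python
import json
import subprocess
import sys

# Source for a hypothetical tiny "server" that runs in its own process.
# Protocol (an assumption for this sketch): one JSON object per line.
CHILD_SRC = r"""
import json, sys
while True:
    line = sys.stdin.readline()
    if not line:          # parent closed the pipe: shut down
        break
    request = json.loads(line)
    if request.get("op") == "upper":
        reply = {"id": request["id"], "result": request["text"].upper()}
    else:
        reply = {"id": request["id"], "error": "unknown op"}
    sys.stdout.write(json.dumps(reply) + "\n")
    sys.stdout.flush()    # pipes are block-buffered; push the reply out now
"""


def ask_child(requests):
    """Send each request to a child process and collect the replies in order."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_SRC],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    replies = []
    try:
        for request in requests:
            proc.stdin.write(json.dumps(request) + "\n")
            proc.stdin.flush()
            replies.append(json.loads(proc.stdout.readline()))
    finally:
        proc.stdin.close()   # signals EOF, so the child's loop ends
        proc.wait()
        proc.stdout.close()
    return replies
```

Swapping JSON for msgpack, or pipes for sockets, changes the encoding or the transport but not the basic shape: encode, send, receive, decode.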

I have no idea what would be best for a more "distributed" version of Leo.  Does anyone have any organizing principles or ideas they would like to share?

Separate threads share (almost?) all data

Debuggers need access to the program under test, so they must run in a separate thread rather than a separate process.  Otoh, both the debugger and the program under test could run in a separate process from the IDE.  In that case, the processes would have to communicate via IPC.

The shared data between threads causes well-known problems: race conditions can result.  Python's queue.Queue is one (low-performance) way of avoiding some, but not all, of these problems.
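As a minimal sketch of the queue.Queue approach (the function name and the squaring "work" are hypothetical), worker threads below never touch each other's data: they only take items from one queue and put results on another, and queue.Queue does its own internal locking. It avoids some problems, not all: any mutable object passed through a queue is still shared between the threads that hold it.

```python
import queue
import threading


def square_in_threads(items, n_workers=4):
    """Square items in worker threads that share nothing but two queues."""
    tasks = queue.Queue()
    results = queue.Queue()
    stop = object()  # sentinel telling a worker to exit

    def worker():
        while True:
            item = tasks.get()
            if item is stop:
                break
            results.put((item, item * item))

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for item in items:
        tasks.put(item)
    for _ in threads:
        tasks.put(stop)          # one sentinel per worker
    for t in threads:
        t.join()

    out = {}
    while not results.empty():   # safe here: all producers have finished
        item, square = results.get()
        out[item] = square
    return out
```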

Leo's new debugger contains a listener, in the main thread, that receives requests from the debugger thread.  The debugger thread is driven by commands from the main thread.  None of the code could be called elegant.  It has a chance of working because only the g.app.xdb ivar is set by both threads, so it must be set and cleared very carefully. As I write this, I suspect that the present code, while not necessarily wrong, would likely benefit from a lock <https://docs.python.org/2/library/threading.html#lock-objects> on this ivar.
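A lock on a single shared attribute might look something like the sketch below. `AppState`, `start_xdb` and `stop_xdb` are hypothetical stand-ins, not Leo's actual g.app code. The danger a lock removes is the check-then-set race: without it, two threads could both see "no debugger yet" and both install one.

```python
import threading


class AppState:
    """Hypothetical stand-in for g.app: only the xdb attribute is shared."""

    def __init__(self):
        self._xdb = None
        self._xdb_lock = threading.Lock()

    def start_xdb(self, debugger):
        """Install a debugger unless one is already running; True on success."""
        with self._xdb_lock:          # check and set happen atomically
            if self._xdb is not None:
                return False
            self._xdb = debugger
            return True

    def stop_xdb(self):
        """Clear the debugger and return whatever was installed (or None)."""
        with self._xdb_lock:
            old, self._xdb = self._xdb, None
            return old

    @property
    def xdb(self):
        with self._xdb_lock:
            return self._xdb
```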

Summary

There seem to be too many choices for IPC.  Your comments, please.  Please correct me if I have said something dubious.

Edward

Terry Brown

Oct 16, 2018, 9:55:35 AM
to leo-e...@googlegroups.com
On Tue, 16 Oct 2018 04:36:31 -0700 (PDT)
"Edward K. Ream" <edre...@gmail.com> wrote:

> The original Dreaming Big Dreams thread suggested reorganizing Leo
> into more separable pieces, perhaps using a client/server
> architecture or other forms of interprocess communication (IPC).

Here's a presentation I just gave:
https://tbnorth.github.io/multiproc/

So processes *can* share memory, although the sharing is managed with
IPC calls. My presentation is about distributed, CPU-intensive
processing of data in NumPy, so it's not necessarily relevant to
whatever it is you're considering for Leo.
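Terry's point can be illustrated with Python's standard multiprocessing.shared_memory module (Python 3.8+). This is a minimal sketch, not taken from his presentation: the function name is hypothetical, and for brevity both handles live in one process. In real use only the block's name would travel to the other process over some IPC channel, and that process would attach with SharedMemory(name=...).

```python
from multiprocessing import shared_memory


def demo_shared_buffer(payload: bytes) -> bytes:
    """Write through one handle to a named block, read through another."""
    owner = shared_memory.SharedMemory(create=True, size=len(payload))
    try:
        # A second handle attaches using only the block's name; in practice
        # this would happen in another process that received the name via IPC.
        peer = shared_memory.SharedMemory(name=owner.name)
        try:
            peer.buf[:len(payload)] = payload          # the "other side" writes
            return bytes(owner.buf[:len(payload)])     # the creator sees it
        finally:
            peer.close()
    finally:
        owner.close()
        owner.unlink()   # only the creator frees the block
```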

Cheers -Terry

Edward K. Ream

Oct 16, 2018, 11:02:14 AM
to leo-editor
On Tue, Oct 16, 2018 at 8:55 AM Terry Brown <terry...@gmail.com> wrote:
> On Tue, 16 Oct 2018 04:36:31 -0700 (PDT)
> "Edward K. Ream" <edre...@gmail.com> wrote:
>
> > The original Dreaming Big Dreams thread suggested reorganizing Leo
> > into more separable pieces, perhaps using a client/server
> > architecture or other forms of interprocess communication (IPC).
>
> Here's a presentation I just gave:
> https://tbnorth.github.io/multiproc/

I am enjoying this.  The real reason I wrote the original post was to get some help in understanding processes, threads and servers.

I'll interrupt my study to ask this rhetorical question. Why do I think this hilarious cartoon applies to Leo's documentation?

Edward

tfer

Oct 16, 2018, 12:18:57 PM
to leo-editor
There was a book on Ada I bought a few decades ago that dealt with locks and other interprocess stuff.  The author explained things with diagrams that had buttons (push and pull), hatches and other things that illustrated how the various concepts worked in a graphical, mechanistic fashion (apparently Ada has a lot of this stuff).

It wasn't really what I was supposed to be learning at the time, but it was interesting, so I thought I'd get back to it someday, and it matched the way I think about those things.  I lost the book in a move; does anybody here know the title/author?

Tom

vitalije

Oct 16, 2018, 5:07:11 PM
to leo-editor
> I am enjoying this.  The real reason I wrote the original post was to get some help in understanding processes, threads and servers.
>
> I'll interrupt my study to ask this rhetorical question. Why do I think this hilarious cartoon applies to Leo's documentation?

Ha, ha, ha, that cartoon is really great. 

I don't have much time to write right now, but here is a list of (IMHO) essential facts one should keep in mind when dealing with concurrency:
  • In terms of required computer resources, processes are the most expensive, then threads, and then micro-threads (by micro-threads I mean both doing several tasks simultaneously on different CPU cores and some clever concepts implemented in frameworks, like Actors in Akka, or something similar in Twisted). It is not so cheap to spawn 10 processes, or 100 threads, but you can easily run 10000 Actors on modest hardware.
  • One should not communicate by sharing state between concurrent threads/processes. Instead, it is much better to share state by communicating (like the queue you used in Leo's debugger).
  • Pure functions (functions that have no side effects) are your best friends when dealing with concurrency. They return the same result for the same arguments no matter when, or in which order, they are called. That leaves the programmer free to call them in any order, from any thread, or even more than once if necessary. For example, Clojure and ClojureScript have a kind of variable called an atom. An atom contains data that can be safely read at any time from any thread. To change that data, you can't write new data into the atom directly. Instead, you provide a pure function, and perhaps some additional arguments. The function is called with the atom's current data plus the remaining arguments, and its result is written to the atom. If two or more threads try to change the atom at the same time, then right before writing the result, the atom checks whether it still holds the same data that was passed to the pure function. If that data was changed by another thread sometime between the call and the return, the result is dropped and the function is called again with the fresh data. The system does this automatically, so the programmer doesn't have to think about it. The only rule the programmer must obey is that the function must be pure, because it may be called more than once.
  • A server usually tries to serve more than one client simultaneously, and it can't know when a client will send a request. Clients often don't know about each other. So servers are usually implemented using some kind of concurrency: sometimes several processes, sometimes several threads, and sometimes both. The most common approach is to dedicate one thread to each incoming request. However, that limits how many simultaneous requests the server can accept. Allowing more requests is usually achieved by using a small pool of threads together with some kind of micro-threads, like Promises, Actors, ...
  • Server and client may be on different machines, so they usually can't share state. To have the same data on the server and in the client, they need to encode data in a format acceptable to both parties, and to agree on a protocol that specifies how, and in which order, they will exchange encoded messages. Upon receiving an encoded message, both client and server decode it and use the data to adjust their own internal state to match the state of the other side. Even if server and client are on the same machine, they may be in different processes.
  • It's always easier if you have the same language on both sides, server and client, but this is not required, and very often it is not the case. OTOH, this separation allows you to completely replace one side with a totally different implementation; as long as it keeps the same protocol and the same data encoding, the other side won't notice the difference. For example, if we have a Leo server implemented in Python and a client implemented in JavaScript, it would be possible (not necessarily easy) to re-implement the Leo server in some other language (say Clojure), and the JavaScript client won't know. Or it would be possible to implement the client using Python and some other GUI library, and the server would work without change.
  • The Go language has a very nice way to spawn new lightweight tasks (goroutines) across several CPU cores and to re-synchronize them cheaply and easily. Rust prevents unsynchronized sharing of mutable state: code won't compile if you try to share a mutable variable between threads without protection. Clojure and ClojureScript (via the core.async library) let you work with concurrency the same way Go does; the macro for starting an async operation is named 'go'.
  • Actors in Akka (and I believe something similar exists in Twisted) each have their own mailbox for accepting messages. You can't have a reference to an Actor object; the only thing you can have is its "postal" address. So you can send a message to an actor, but you can't be sure whether the actor will ever receive it. The actor may be dead by the time the message arrives, or it may have moved to some other address. The only way to be sure is if the actor replies with a message confirming that your previous message was accepted. But the reply may be lost as well, so you can't be really sure. You can, however, rely on this: if you send one actor three messages (A, B, C) in that order, they will eventually be received in exactly the same order. The actor may receive other messages from other actors in between these three, but messages from one actor to another always arrive in the order they were sent (if they arrive at all).
  • Promises, Actors, micro-threads, and compilers ensuring that mutable state is never shared were all invented to give programmers something to rely on, which simplifies writing, reading, understanding and reasoning about concurrent programs. If you can't rely on anything, then you can't possibly understand what the program will do.
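The atom mechanism described above can be sketched in Python roughly as follows. This is a hypothetical `Atom` class, not Clojure's implementation: the JVM version uses a lock-free compare-and-set, while Python has no such primitive, so a tiny lock guards only the compare-and-set step and the user's function runs outside it. Stored values are assumed to be immutable (ints, tuples, frozensets).

```python
import threading


class Atom:
    """Minimal sketch of a Clojure-style atom with swap!-like semantics."""

    def __init__(self, value):
        self._value = value                # should be immutable
        self._cas_lock = threading.Lock()  # guards only compare-and-set

    def deref(self):
        """Read the current value; safe from any thread at any time."""
        return self._value

    def _compare_and_set(self, expected, new):
        with self._cas_lock:
            if self._value is expected:    # unchanged since we read it?
                self._value = new
                return True
            return False

    def swap(self, fn, *args):
        """Apply pure fn(current, *args), retrying if another thread won."""
        while True:
            current = self._value
            new = fn(current, *args)       # may run more than once: must be pure
            if self._compare_and_set(current, new):
                return new
```

For example, `counter.swap(lambda value, n: value + n, 1)` increments safely from many threads at once; a losing thread simply recomputes from the fresh value.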
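The mailbox idea behind Actors can be sketched with one thread and one queue.Queue per actor. This is a hypothetical, in-process toy, not Akka's API: callers get an object exposing only `tell()`, which plays the role of the "postal address". Because everything stays in one process, messages are never lost here, and since the mailbox is FIFO, messages from any one sender are handled in the order they were sent.

```python
import queue
import threading

_STOP = object()  # private sentinel that ends the message loop


class Actor:
    """Minimal thread-backed actor: a private mailbox plus a message loop."""

    def __init__(self, handler):
        self._mailbox = queue.Queue()   # FIFO, so arrival order is preserved
        self._handler = handler         # only the actor's thread calls this
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def tell(self, message):
        """Fire-and-forget send: no reply, no access to the actor's state."""
        self._mailbox.put(message)

    def stop(self):
        """Let the actor finish pending messages, then wait for it to exit."""
        self._mailbox.put(_STOP)
        self._thread.join()

    def _run(self):
        while True:
            message = self._mailbox.get()
            if message is _STOP:
                break
            self._handler(message)
```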
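The "encode data and agree on a protocol" step can be as small as the sketch below: each message is JSON, preceded by a 4-byte big-endian length so the receiver knows where one message ends and the next begins. The framing scheme and the `cmd`/`path` fields are assumptions for illustration, not an existing Leo protocol. Since the format is language-neutral, a client in JavaScript (or any other language) could implement the same two functions.

```python
import json
import socket
import struct

HEADER = struct.Struct("!I")  # 4-byte big-endian unsigned length prefix


def send_message(sock: socket.socket, obj) -> None:
    """Encode obj as UTF-8 JSON and send it with a length prefix."""
    data = json.dumps(obj).encode("utf-8")
    sock.sendall(HEADER.pack(len(data)) + data)


def _recv_exactly(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes; recv() may return fewer than requested."""
    chunks = []
    while n:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError("peer closed the connection")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def recv_message(sock: socket.socket):
    """Read one length-prefixed JSON message and decode it."""
    (length,) = HEADER.unpack(_recv_exactly(sock, HEADER.size))
    return json.loads(_recv_exactly(sock, length).decode("utf-8"))
```

The same functions work whether the two sockets are in one process (as with `socket.socketpair()`), in two processes, or on two machines.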
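The claim that micro-threads are far cheaper than threads can be tried out with Python's asyncio, where each task is a coroutine scheduled on a single OS thread. These are asyncio tasks, not Akka Actors, and the handler below is hypothetical: it only simulates waiting on I/O. Starting 10,000 of them concurrently is routine, whereas 10,000 OS threads would be much heavier.

```python
import asyncio


async def handle_request(request_id):
    """Hypothetical request handler: wait on simulated I/O, then reply."""
    await asyncio.sleep(0.01)  # stands in for a network or disk wait
    return request_id * 2


async def serve_all(n):
    """Run n handlers concurrently as tasks on a single OS thread."""
    return await asyncio.gather(*(handle_request(i) for i in range(n)))
```

Calling `asyncio.run(serve_all(10_000))` finishes in well under a second on typical hardware, because the tasks wait concurrently rather than one after another; `gather` returns the results in request order.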
I've got to go now. 
HTH
Vitalije

Edward K. Ream

Oct 17, 2018, 8:23:08 AM
to leo-editor
On Tue, Oct 16, 2018 at 4:07 PM vitalije <vita...@gmail.com> wrote:

> I don't have much time to write right now, but here is a list of (IMHO) essential facts one should keep in mind when dealing with concurrency:

A great summary.  Many thanks for this.  I've bookmarked this reply and will study the features you mention.

Imo, a client-server architecture is required when dealing with multiple languages.  For example, Jupyter supports "kernels" in various languages.  Iirc all the infrastructure is in python and javascript, but kernels can/must be written in other languages.

Edward