Weird handle_info arguments when monitored process exits with custom reason

Aleksey Gureiev

Oct 15, 2014, 4:45:14 PM
to elixir-l...@googlegroups.com
Hey gents,

I have a master process that starts sub-processes of two kinds and then monitors their completion. The master process is a GenServer, and I figured that in order to differentiate between subprocess1 and subprocess2 I need to call "exit(:subprocess1)" and "exit(:subprocess2)" in them, and then catch that with the handle_info callback on the master process side, like this:

    defmodule Master do
      use GenServer

      ...

      def handle_info({ :DOWN, _ref, :process, _pid, :subprocess1 }, state), do: { ... }

      def handle_info({ :DOWN, _ref, :process, _pid, :subprocess2 }, state), do: { ... }

    end


The problem I'm facing is that when I do "exit(:normal)" in the subprocess, the callback arrives as:

    handle_info({ :DOWN, <ref>, :process, <pid>, :normal }, <state>)

.. but if I send "exit(:subprocess1)", it calls:

    handle_info({ :DOWN, <ref>, :process, <pid>, :subprocess1 }, :normal)

NOTICE: <state> is now not sent, but there's ":normal" instead.



Can anyone explain what goes on?
Can anyone suggest a better approach? Just to reiterate, I need to follow the completion of two kinds of processes (and update counters in the state.)

Thanks,

- Aleks

Robert Virding

Oct 15, 2014, 5:02:08 PM
to elixir-l...@googlegroups.com
Note that <state> is not really "sent", it is the second argument to the handle_info/2 callback which is called by the behaviour. The behaviour manages your state threading it through the calls. A bit didactic, sorry.

Getting <state> of :normal as the 2nd argument has nothing to do with whether the process was terminated with :normal or :subprocess1. The most likely reason is that a return value from another callback is wrong so the state has now become the atom :normal. The behaviours are very sensitive that you the return values right. They do exactly what you tell them to. :-)
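
A minimal sketch of what "threading the state" means in practice, assuming a hypothetical CounterSketch module that is not part of the original code. Whatever sits in the second position of the returned tuple is what the behaviour hands to the next callback as its state.

```elixir
defmodule CounterSketch do
  use GenServer

  # The value returned from init/1 becomes the first state.
  @impl true
  def init(count), do: {:ok, count}

  # The second tuple element is the state passed to the next callback.
  # A tuple such as {:noreply, :normal, count} would instead make :normal
  # the next state, with count read as the optional third element.
  @impl true
  def handle_info(:bump, count), do: {:noreply, count + 1}
end

{:ok, pid} = GenServer.start_link(CounterSketch, 0)
send(pid, :bump)
send(pid, :bump)

# :sys.get_state/1 returns the state currently held by the behaviour.
IO.inspect(:sys.get_state(pid))
```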

Why not just keep track of the pids and use that to determine which process has died?
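
One way this could look, as a sketch only: the TrackingMaster module, its run/3 and finished/1 functions, and the shape of the state are all assumptions made for illustration. The master remembers the kind of each worker under its monitor reference, so a :DOWN message is resolved with a single keyed Map.pop/2.

```elixir
defmodule TrackingMaster do
  use GenServer

  # State (hypothetical shape):
  #   %{kinds: %{monitor_ref => kind}, finished: %{kind => count}}
  def start_link, do: GenServer.start_link(__MODULE__, :ok)
  def run(master, kind, fun), do: GenServer.call(master, {:run, kind, fun})
  def finished(master), do: GenServer.call(master, :finished)

  @impl true
  def init(:ok), do: {:ok, %{kinds: %{}, finished: %{}}}

  @impl true
  def handle_call({:run, kind, fun}, _from, state) do
    # spawn_monitor/1 returns the pid and the monitor reference.
    {_pid, ref} = spawn_monitor(fun)
    {:reply, :ok, %{state | kinds: Map.put(state.kinds, ref, kind)}}
  end

  def handle_call(:finished, _from, state), do: {:reply, state.finished, state}

  @impl true
  def handle_info({:DOWN, ref, :process, _pid, _reason}, state) do
    # Look up (and forget) the kind recorded for this monitor reference.
    {kind, kinds} = Map.pop(state.kinds, ref)
    finished = Map.update(state.finished, kind, 1, &(&1 + 1))
    {:noreply, %{state | kinds: kinds, finished: finished}}
  end
end

{:ok, master} = TrackingMaster.start_link()
:ok = TrackingMaster.run(master, :subprocess1, fn -> :ok end)
:ok = TrackingMaster.run(master, :subprocess2, fn -> :ok end)
:ok = TrackingMaster.run(master, :subprocess1, fn -> :ok end)

# Give the :DOWN messages a moment to arrive before reading the counters.
Process.sleep(100)
IO.inspect(TrackingMaster.finished(master))
```

The exit reason is ignored here on purpose; it stays available in the :DOWN tuple for telling normal completion from a crash.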

Robert

Josh Adams

Oct 15, 2014, 5:07:04 PM
to elixir-l...@googlegroups.com
> The behaviours are very sensitive that you the return values right.

For some reason, given the context, the omission here made me cackle endlessly.

-Josh 

Aleksey Gureiev

Oct 16, 2014, 12:41:12 AM
to elixir-l...@googlegroups.com
Not sure why, when you make a call to a callback, you can't "send" an argument, but thanks. The idea that something is changing the state to :normal sounds plausible.

Tracking pids doesn't sound all that good. Instead of being told who's down, I'll have to look through a million-record set to find out. Not efficient at all, if you ask me. Unless I'm missing something.

- Aleks

Saša Jurić

Oct 16, 2014, 2:47:20 AM
to elixir-l...@googlegroups.com
I might be misunderstanding, but it looks to me as if you're trying to abuse a process exit reason to somehow identify the process "kind" (type of a job). That seems like an antipattern to me.

If a master process is spawning a set of workers and wants to receive completion notifications, I would prefer an explicit message. Why not make each worker send a message back to the master, relying on monitors to detect situations where some process terminates unexpectedly?

Something like:
1. Have master spawn_monitor workers.
2. Send completion message from each worker to the master.
3. Handle completion messages in the master.
4. Handle :DOWN messages when the reason is not :normal to detect a crash of a worker, and do something about it.
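
The four steps above could be sketched roughly as follows. The ReportingMaster module, the {:done, kind, result} message shape, and the counters in the state are assumptions made for illustration, and crashes are only counted in aggregate to keep the sketch short.

```elixir
defmodule ReportingMaster do
  use GenServer

  def start_link, do: GenServer.start_link(__MODULE__, :ok)
  def run(master, kind, fun), do: GenServer.call(master, {:run, kind, fun})
  def counts(master), do: GenServer.call(master, :counts)

  @impl true
  def init(:ok), do: {:ok, %{done: %{}, crashed: 0}}

  @impl true
  def handle_call({:run, kind, fun}, _from, state) do
    master = self()
    # Step 1: spawn and monitor the worker.
    # Step 2: the worker sends an explicit completion message when done.
    spawn_monitor(fn -> send(master, {:done, kind, fun.()}) end)
    {:reply, :ok, state}
  end

  def handle_call(:counts, _from, state), do: {:reply, state, state}

  # Step 3: handle completion messages.
  @impl true
  def handle_info({:done, kind, _result}, state) do
    {:noreply, %{state | done: Map.update(state.done, kind, 1, &(&1 + 1))}}
  end

  # Step 4: a :normal exit needs no action; any other reason is a crash.
  def handle_info({:DOWN, _ref, :process, _pid, :normal}, state), do: {:noreply, state}

  def handle_info({:DOWN, _ref, :process, _pid, _reason}, state) do
    {:noreply, %{state | crashed: state.crashed + 1}}
  end
end

{:ok, master} = ReportingMaster.start_link()
:ok = ReportingMaster.run(master, :subprocess1, fn -> :finished end)
:ok = ReportingMaster.run(master, :subprocess2, fn -> exit(:boom) end)

# Give the workers a moment to finish before reading the counters.
Process.sleep(100)
IO.inspect(ReportingMaster.counts(master))
```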

Alternatively, you could spawn_link workers which means that a crash of a single worker will crash the master and all other workers.



Aleksey Gureiev

Oct 16, 2014, 10:11:21 AM
to elixir-l...@googlegroups.com
The reason for doing it this way is to avoid cyclic knowledge (master knows about subprocesses, and subprocesses know about "master"). Subprocesses finish their work and terminate, and I follow their lifecycle by watching from outside and taking notes. You are right about the abnormal termination cases -- I'm planning to monitor those too, but... abnormal terminations also need some kind of identification, so that I can learn which kind of process failed and increment the failure count of the correct process category.

Two days ago, I had an intermediary for each subprocess kind -- a process that was responsible for spawning and tracking a specific type of subprocess. Upon completion it would report to Master. It felt like overkill, and I gave up on them in favor of a simpler and more direct model. Apparently, it's not that simple at all.

At this point, I have figured out the original bug with receiving ":normal" as the state (it was exactly as Robert hinted -- one of the {:noreply...} results was formatted as {:noreply, :normal, new_state}), so there's no rush, but I'm still thinking about the best design for this and am on the fence between:

1. direct finish callbacks as Saša suggests
2. getting back subprocess group runners

If anyone wishes to go on with the discussion, you are welcome. Otherwise, thanks to everyone! I'll be coming back with more questions as I go. After two decades of programming stuff, when you think you saw it all, here comes another exciting tech to pick up and enjoy!

Sasa Juric

Oct 16, 2014, 10:30:29 AM
to elixir-l...@googlegroups.com
I think it’s perfectly fine to pass a reporting pid to the worker processes, and have each worker process send the response message back to that pid. I’ve used this approach countless times, and my colleagues are using this technique as well. After all, this is how a synchronous call-and-response works in Elixir/Erlang.
Notice that this is not “cyclic knowledge”, since a worker receives the reporting pid via an argument, so it doesn’t have to possess hardcoded knowledge about the master.

My advice is to make each worker send the result and a category for example in form {:category_foo, result}, {:category_bar, result}, …
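
As a rough sketch of that message shape, assuming a hypothetical WorkerSketch module and a placeholder job function: the worker only knows the pid it was handed, and tags its result with a category atom.

```elixir
defmodule WorkerSketch do
  # `report_to` is whatever pid was passed in; the worker carries no
  # hardcoded knowledge of a Master module.
  def start(report_to, category, fun) do
    spawn(fn -> send(report_to, {category, fun.()}) end)
  end
end

# The calling process acts as the reporting target here.
WorkerSketch.start(self(), :category_foo, fn -> 1 + 1 end)

receive do
  {:category_foo, result} -> IO.puts("foo finished with #{inspect(result)}")
after
  1_000 -> IO.puts("no report received")
end
```

In a GenServer master the same tuples would arrive in handle_info/2, one clause per category.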

Relying on abnormal exit in the name of some “weak coupling” might get you into all sorts of problems, since some parts of OTP rely on the exit reason, and you’re abusing this to send some additional response information back.

Exit reason should be used, as the name implies, to describe the reason why a process has terminated. Messages should be used to transfer information between processes.


Aleksey Gureiev

Oct 16, 2014, 11:12:04 AM
to elixir-l...@googlegroups.com
> I think it’s perfectly fine to pass a reporting pid to the worker processes, and have each worker process send the response message back to that pid. I’ve used this approach countless times, and my colleagues are using this technique as well. After all, this is how a synchronous call-and-response works in Elixir/Erlang.
> Notice that this is not “cyclic knowledge”, since a worker receives the reporting pid via an argument, so it doesn’t have to possess hardcoded knowledge about the master.

That's true until you want to do something like:

    Master.subprocess_finished(master_pid)

But generally, I start to lean towards this approach.
 
> My advice is to make each worker send the result and a category for example in form {:category_foo, result}, {:category_bar, result}, …
>
> Relying on abnormal exit in the name of some “weak coupling” might get you into all sorts of problems, since some parts of OTP rely on the exit reason, and you’re abusing this to send some additional response information back.
>
> Exit reason should be used, as the name implies, to describe the reason why a process has terminated. Messages should be used to transfer information between processes.

Makes sense. Glad I asked.