Debugging an infinite loop

508 views
Skip to first unread message

Dave Thomas

unread,
May 30, 2013, 8:50:59 PM5/30/13
to elixir-l...@googlegroups.com
Sometimes I get pattern problems that results in my functions recursing indefinitely. TCO mens I don't run out of stack.

Is there a trick to working out just what is looping?



José Valim

unread,
May 30, 2013, 9:29:23 PM5/30/13
to elixir-l...@googlegroups.com
I know you can use Process.info(pid) to get the current function a process is at:

iex(node_one@macbook)1> Process.info self
[current_function: {Process,:info,1}, initial_call: {:erlang,:apply,2}, status: :running, message_queue_len: 0, messages: [], links: [#PID<0.31.0>], dictionary: [iex_history_start_counter: 1, iex_history_counter: 1], trap_exit: true, error_handler: :error_handler, priority: :normal, group_leader: #PID<0.31.0>, total_heap_size: 6771, heap_size: 2586, stack_size: 19, reductions: 5094, garbage_collection: [min_bin_vheap_size: 46422, min_heap_size: 233, fullsweep_after: 65535, minor_gcs: 1], suspending: []]

The next challenge seems to be finding out in which process the loop is (and how to spawn a new process after the current one goes into a loop).


José Valim
Skype: jv.ptec
Founder and Lead Developer


On Fri, May 31, 2013 at 9:50 AM, Dave Thomas <da...@pragprog.com> wrote:
Sometimes I get pattern problems that results in my functions recursing indefinitely. TCO mens I don't run out of stack.

Is there a trick to working out just what is looping?



--
You received this message because you are subscribed to the Google Groups "elixir-lang-talk" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elixir-lang-ta...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

Dave Thomas

unread,
May 30, 2013, 11:52:14 PM5/30/13
to elixir-l...@googlegroups.com

One of the nice things about Ruby is that if a process goes haywire, it isn't too hard to get a stacktrace. (Of course, that's largely because it doesn't have TCO).

One of the things that I've always found problematic about Erlang is that the reporting assumes that every app is a massive app, and so doesn't make the simple case simple.

Would it be possible to wrap Elixir execution somehow, so that if it gets a ^C it goes into something a little more user friendly and meaningful than

BREAK: (a)bort (c)ontinue (p)roc info (i)nfo (l)oaded
       (v)ersion (k)ill (D)b-tables (d)istribution

And, if it is possible, could someone give me some pointers? I'm a firm believer in "he who complains, fixes".

Dave

José Valim

unread,
May 30, 2013, 11:57:12 PM5/30/13
to elixir-l...@googlegroups.com
It is possible by providing our own implementation of user_drv (or patching the existing one or sending a patch to Erlang to provide the proper hooks). There is a very good article about how the erlang shell works as well.


José Valim
Skype: jv.ptec
Founder and Lead Developer


Dave Thomas

unread,
May 31, 2013, 12:17:14 AM5/31/13
to elixir-l...@googlegroups.com
Cool—weekend reading.

In the meantime, I found my problem. Try:

~~~
Enum.reverse {1}
~~~

and

~~~
Enum.reverse {nil}
~~~

:)




José Valim

unread,
May 31, 2013, 1:33:36 AM5/31/13
to elixir-l...@googlegroups.com
Oops, this is a bug. :) Please report in the issues tracker?



José Valim
Skype: jv.ptec
Founder and Lead Developer


Jonathan Hunt

unread,
May 31, 2013, 1:51:50 AM5/31/13
to elixir-l...@googlegroups.com
I just created an issue https://github.com/elixir-lang/elixir/issues/1159

On Thu, May 30, 2013 at 10:33 PM, José Valim
Jonathan J Hunt <j...@me.net.nz> http://www.me.net.nz
"Science is the belief in the ignorance of experts" Richard Feynman

Alexei Sholik

unread,
May 31, 2013, 4:43:54 AM5/31/13
to elixir-l...@googlegroups.com
> One of the nice things about Ruby is that if a process goes haywire, it isn't too hard to get a stacktrace. (Of course, that's largely because it doesn't have TCO).

I'm not buying it. You don't use recursion for loops in Ruby. Is it as easy to get a stacktrace of a Ruby program stuck in an infinite loop without bringing the program down? It's doable in Elixir.

The ^C menu actually has some use.

iex(1)> Enum.reverse {nil}


BREAK: (a)bort (c)ontinue (p)roc info (i)nfo (l)oaded
       (v)ersion (k)ill (D)b-tables (d)istribution

If you press `p`, you will see a dump of all running processes with some info about them. If you can search in your terminal screen, look for the text "Scheduled". For me, only one process has this state and it shows a partial stacktrace containing "Elixir.Enumerable':reduce/3".

So now we now the PID of the hanging process. Let's examine it closer.

Type ^G and type in

    s 'Elixir.IEx'
    c 2

This will start another shell process and switch the shell to it. We can then examine our hanging process:

    iex(1)> :erlang.process_info list_to_pid('<0.26.0>')
[current_function: {Module,:concat,2}, initial_call: {:erlang,:apply,2}, status: :running, message_queue_len: 0, messages: [], links: [#PID<0.25.0>], dictionary: [iex_history_start_counter: 1, iex_history_counter: 1], trap_exit: true, error_handler: :error_handler, priority: :normal, group_leader: #PID<0.25.0>, total_heap_size: 2962, heap_size: 376, stack_size: 22, reductions: 6399896518, garbage_collection: [min_bin_vheap_size: 46422, min_heap_size: 233, fullsweep_after: 65535, minor_gcs: 1801], suspending: []]

This effectively traces the latest call of the specified process, calling this multiple times will output different value for the :current_function key.

You can go nuts with the :erlang.process_info/2 function. Here, I'll use the :backtrace option:

    iex(3)> {:backtrace, trace} = :erlang.process_info list_to_pid('<0.26.0>'), :backtrace
    ...
    iex(4)> IO.puts trace
    Program counter: 0x000000001f77ecb0 (elixir_aliases:raw_concat/1 + 8)
    CP: 0x000000001f77ec00 (elixir_aliases:concat/1 + 40)
   
    0x000000001de168f8 Return addr 0x000000001f9037a8 ('Elixir.Enumerable':reduce/3 + 312)
   
    0x000000001de16900 Return addr 0x000000001f6e1558 (erl_eval:do_apply/6 + 528)
    y(0)     #Fun<Elixir.Enum.21.17311244>
    y(1)     []
    y(2)     {nil}

    ... and so on ...

This gives you an idea of what is going on.

Now to tracing:

# configure the tracer to stop after 100 calls, taken from here http://stackoverflow.com/questions/1954894/using-trace-and-dbg-in-erlang
iex(5)> :dbg.tracer(:process, {fn (_, 100) -> :dbg.stop_clear(); (msg, n) -> :io.format('~p~n', [msg]); n+1 end, 0 })
{:ok,#PID<0.41.0>}

# trace all function calls
iex(6)> :dbg.tp(:_, [])
{:ok,[{:matched,:"nonode@nohost",3097}]}

# trace our hanging process
iex(7)> :dbg.p(list_to_pid('<0.26.0>'), :c)

{trace,<0.26.0>,call,{erlang,'++',[".Enumerable",[]]}}
{trace,<0.26.0>,call,{erlang,'++',["Elixir",".Enumerable"]}}
{trace,<0.26.0>,call,{erlang,list_to_atom,["Elixir.Enumerable"]}}
{trace,<0.26.0>,call,
       {'Elixir.Enumerable',reduce,[{nil},[],#Fun<Elixir.Enum.21.17311244>]}}
{trace,<0.26.0>,call,{'Elixir.Module',concat,['Elixir.Enumerable',nil]}}
{trace,<0.26.0>,call,{elixir_aliases,concat,[['Elixir.Enumerable',nil]]}}
{trace,<0.26.0>,call,{erlang,atom_to_list,['Elixir.Enumerable']}}
{trace,<0.26.0>,call,{lists,concat,[[".Enumerable"]]}}
{trace,<0.26.0>,call,{erlang,'++',[".Enumerable",[]]}}
{trace,<0.26.0>,call,{erlang,'++',["Elixir",".Enumerable"]}}
{trace,<0.26.0>,call,{erlang,list_to_atom,["Elixir.Enumerable"]}}
{trace,<0.26.0>,call,
       {'Elixir.Enumerable',reduce,[{nil},[],#Fun<Elixir.Enum.21.17311244>]}}
...

Looking at the partial output, you can see the process is in a loop. Finally, we can kill the offending process

:erlang.exit list_to_pid('<0.26.0>'), :kill

So what has just happened is our shell hung, we were able to start a new one, examine the running processes and trace a particular process's execution. Then we were able to kill the offending shell process while the rest of the system continued working as if nothing happened.

And the usual disclaimer: this is probably less than 1% of the functionality exposed by :dbg and other Erlang tools and it's just what I've dug out in the last hour by searching the Internet. I'm sure every veteran Erlangers is a Matrix architect in disguise :8
Best regards
Alexei Sholik

Jeremy Huffman

unread,
May 31, 2013, 7:55:20 AM5/31/13
to elixir-l...@googlegroups.com

Thank you for writing all this up Alexei!

Dave Thomas

unread,
May 31, 2013, 7:33:12 PM5/31/13
to elixir-l...@googlegroups.com

Alexei:

Thanks for summarizing this. It's pretty close to what I had to go through (although I found my process by PID) to find that it was looping in reduce.

And, don't get me wrong. This isn't an Elixir issue. The problem is that Erlang is written for really large scales, and therefore has to give you everything when you ask for information. For complex system, this is just what you need. But for the simple case, the steps that both you and I went through were pretty indirect.

In Ruby, I'd just add trap("INT") { raise } at the start of my code.

Here's an example

def fred
  bert
end

def bert
  wilma
end

def wilma
  loop { sleep 0.01 }
end

trap("INT") {  raise }

fred

When run:

dave[~/tmp] ruby loop.rb
^Cloop.rb:10:in `sleep': Interrupt
    from loop.rb:10:in `block in wilma'
    from loop.rb:10:in `loop'
    from loop.rb:10:in `wilma'
    from loop.rb:6:in `bert'
    from loop.rb:2:in `fred'
    from loop.rb:13:in `<main>'

To do it on a running process, I get into it with gdb, then just rb_eval("raise")

I may take a stab at seeing what I can do in this area once I dig out from the backlog caused by the book ;)

Dave

Alexei Sholik

unread,
May 31, 2013, 7:58:29 PM5/31/13
to elixir-l...@googlegroups.com
I see your point Dave. I agree that it's simple with Ruby, because in Ruby your program either works or crashes :)

In Erlang your program is just one process, among many others. This adds some overhead because you need to identify which process is misbehaving first. But it doesn't have to be hard all the time. For instance, you could come up with a module with some convenience functions, like this one:

  def find(module) do
    mod = atom_to_binary module
    Enum.filter :erlang.processes, fn pid ->
      { :backtrace, bt } = :erlang.process_info(pid, :backtrace)
      :binary.match(bt, mod) != :nomatch
    end
  end

You give it a ModuleAtom and it'll search all running processes for that module in the backtrace. Once you get the pid, you can kill it like you did in Ruby with :erlang.exit(pid, :reason). You can't gdb it, but we have a student working on debugger for Elixir during the summer. So you can nudge him a bit to implement friendly debugging experience in Elixir :)

Joe Armstrong

unread,
Jun 1, 2013, 12:09:56 PM6/1/13
to elixir-l...@googlegroups.com, da...@pragprog.com
Just an idea 

> iex :observer.start()

Should start the OTP observer application. This is a gui based on wxWidgets - click on the processes tab
and look for the one with loads of reductions. You can inspect the system in various ways.

Note: wxWigets support must be enabled in R16B

See


(If you have a mac)

It's probably a good idea to interface the Erlang GUI stuff is described in


An Elixir binding to this would be great

/Joe


/Joe

Jeremy Huffman

unread,
Jun 1, 2013, 9:08:01 PM6/1/13
to elixir-l...@googlegroups.com
On Sat, Jun 1, 2013 at 12:09 PM, Joe Armstrong <joe...@gmail.com> wrote:
Just an idea 

> iex :observer.start()

This thing is amazing, I don't know how widely known it is but it was sure a surprise to me. Though it shouldn't be, this is the sort of thing that gets built when people run critical distributed systems for years and years.
Reply all
Reply to author
Forward
0 new messages