I'm writing a scanner for a query language and I'm encountering
intermittent segmentation faults and other odd errors. The
code I'm working on appears to work fine on 11.b.2-4
(linux/amd64), but gives problems on r12b-0 (linux/i386) and
r12b-1 (linux/amd64). I didn't add any fancy options when I
compiled r12b, just a --prefix.
I'm an erlang newbie so highly likely I've written something
stupid. Just hope it's obvious whatever it is!
The scanner is quite large so I've reduced it down to two
smaller programs which show similar symptoms. The first one
just throws exceptions from time to time. The second program
ends up dying as a result of a segmentation fault sooner
or later.
The following code won't make real sense. The full scanner
makes sense but this is only a mutated 10% of that. Sorry
the code is so unintelligible - but on the bright side
it fails more frequently and predictably than the full
scanner does.
%% START OF CODE: weird.erl %%
-module(weird).
-compile(export_all).

%% For testing - runs scanner N number of times with same input
run(N) ->
    lists:foreach(fun(_) ->
                          scan(<<"region:whatever">>, [])
                  end, lists:seq(1, N)).

scan(<<>>, TokAcc) ->
    lists:reverse(['$thats_all_folks$' | TokAcc]);

scan(<<D, $\s, Rest/binary>>, TokAcc) when
      (D =:= $D) or (D =:= $d) ->
    scan(Rest, ['AND' | TokAcc]);

scan(<<D>>, TokAcc) when
      (D =:= $D) or (D =:= $d) ->
    scan(<<>>, ['AND' | TokAcc]);

scan(<<N, Z, Rest/binary>>, TokAcc) when
      (N =:= $N) or (N =:= $n),
      (Z =:= $\s) ->
    scan(<<Z, Rest/binary>>, ['NOT' | TokAcc]);

scan(<<C, Rest/binary>>, TokAcc) when
      (C >= $A) and (C =< $Z);
      (C >= $a) and (C =< $z);
      (C >= $0) and (C =< $9) ->
    case Rest of
        <<$:, R/binary>> ->
            scan(R, [{'FIELD', C} | TokAcc]);
        _ ->
            scan(Rest, [{'KEYWORD', C} | TokAcc])
    end.
%% END OF CODE %%
Here's what I see from the shell on an i386 machine:
1> c(weird).
{ok,weird}
2> weird:run(1000).
ok
3> weird:run(1000).
ok
4> weird:run(1000).
ok
5> weird:run(1000).
** exception error: no function clause
matching weird:scan(<<"whatever">>,
[{'FIELD',110},
{'KEYWORD',111},
{'KEYWORD',105},
{'KEYWORD',103},
{'KEYWORD',101},
{'KEYWORD',114}])
in function lists:foreach/2
6> weird:run(1000).
** exception error: no function clause
matching weird:scan(<<"whatever">>,
[{'FIELD',110},
{'KEYWORD',111},
{'KEYWORD',105},
{'KEYWORD',103},
{'KEYWORD',101},
{'KEYWORD',114}])
in function lists:foreach/2
7>
It will then keep throwing exceptions from this point on. On an
amd64 machine I'm getting similar output, but it usually has
the sequence ok, error, ok, error... And if I bump it from
1,000 up to 10,000 iterations the errors usually stop (on amd64).
The second block of code is:
%% START OF CODE: scanner.erl %%
-module(scanner).
-compile(export_all).

%% For testing - runs scanner N number of times with same input
run(N) ->
    lists:foreach(fun(_) ->
                          scan(<<"region:whatever">>, [])
                  end, lists:seq(1, N)).

scan(<<>>, TokAcc) ->
    lists:reverse(['$thats_all_folks$' | TokAcc]);

scan(<<D, Z, Rest/binary>>, TokAcc) when
      (D =:= $D orelse D =:= $d) and
      ((Z =:= $\s) or (Z =:= $() or (Z =:= $))) ->
    scan(<<Z, Rest/binary>>, ['AND' | TokAcc]);

scan(<<D>>, TokAcc) when
      (D =:= $D) or (D =:= $d) ->
    scan(<<>>, ['AND' | TokAcc]);

scan(<<N, Z, Rest/binary>>, TokAcc) when
      (N =:= $N orelse N =:= $n) and
      ((Z =:= $\s) or (Z =:= $() or (Z =:= $))) ->
    scan(<<Z, Rest/binary>>, ['NOT' | TokAcc]);

scan(<<C, Rest/binary>>, TokAcc) when
      (C >= $A) and (C =< $Z);
      (C >= $a) and (C =< $z);
      (C >= $0) and (C =< $9) ->
    case Rest of
        <<$:, R/binary>> ->
            scan(R, [{'FIELD', C} | TokAcc]);
        _ ->
            scan(Rest, [{'KEYWORD', C} | TokAcc])
    end.
%% END OF CODE %%
When I use this code in the shell (on i386) it usually works okay
for a smaller number of iterations, but when you get into the
hundreds it dies fast:
1> c(scanner).
{ok,scanner}
2> scanner:run(10). % Start with 10
ok
3> scanner:run(10).
ok
4> scanner:run(100). % Bumped up to 100
** exception error: no function clause
matching weird:scan(<<"whatever">>,
[{'FIELD',110},
{'KEYWORD',111},
{'KEYWORD',105},
{'KEYWORD',103},
{'KEYWORD',101},
{'KEYWORD',114}])
in function lists:foreach/2
5> scanner:run(100).
Segmentation fault
Anyone got any ideas?
Cheers,
Rory
You may want to try specifying a size for the variables D, N, and C.
For example: scan(<<C:8/integer, Rest/binary>>, TokAcc)
According to the manual: "In matching, this default value is only valid
for the very last element. All other bit string or binary elements in the
matching must have a size specification."
(otp_doc_html_R12B-1/doc/reference_manual/expressions.html#6.16)
It's possible that the lack of a size is confusing things.
Hope that helps,
Rusty
--
Rusty Klophaus (http://rklophaus.com)
I confirm your experiences. Mine are slightly different than yours, but
the end results are the same; see below. This is with the most recent
development version. I suspect a GC-related bug.
Kostis
PS. Surprisingly, I cannot manage to get a seg-fault if I compile
to native code. [using hipe:c() instead of c()]
========================================================================
Erlang (BEAM) emulator version 5.6.2 [source] [async-threads:0] [hipe]
[kernel-poll:false]
Eshell V5.6.2 (abort with ^G)
1> c(weird).
{ok,weird}
2> weird:run(10000).
ok
3> weird:run(10000).
ok
4> weird:run(10000).
ok
5> weird:run(10000).
ok
6> weird:run(10000).
ok
7> weird:run(10000).
ok
8> weird:run(10000).
ok
9> weird:run(10000).
ok
10> weird:run(10000).
ok
11> weird:run(10000).
ok
12> weird:run(10000).
ok
13> weird:run(10000).
ok
14> weird:run(10000).
ok
15> weird:run(10000).
ok
16> weird:run(10000).
ok
17> weird:run(10000).
ok
18> weird:run(10000).
ok
19> weird:run(10000).
ok
20> weird:run(10000).
ok
21> weird:run(10000).
ok
22> weird:run(10000).
ok
23> weird:run(10000).
ok
24> weird:run(100).
** exception error: no function clause matching weird:scan(<<"whatever">>,
[{'FIELD',110},
{'KEYWORD',111},
{'KEYWORD',105},
{'KEYWORD',103},
{'KEYWORD',101},
{'KEYWORD',114}])
in function lists:foreach/2
25> weird:run(100).
ok
26> weird:run(100).
ok
27> halt().
@statler [~/HiPE/otp] hipe
Erlang (BEAM) emulator version 5.6.2 [source] [async-threads:0] [hipe]
[kernel-poll:false]
Eshell V5.6.2 (abort with ^G)
1> c(scanner).
{ok,scanner}
2> scanner:run(100).
ok
3> scanner:run(100).
Segmentation fault
Thanks Kostis. Bit of a relief really - I wasn't exactly making a
whole lot of progress fixing my code!
>
> PS. Surprisingly, I cannot manage to get a seg-fault if I compile
> to native code. [using hipe:c() instead of c()]
>
Excellent. I just tried hipe on the amd64 machine and no seg-fault.
Oddly hipe doesn't seem to be enabled on my i386 even though
I compiled it the same way as the amd64 version.
As an added bonus, I was steering clear of hipe because I had
read somewhere that there were problems with running it on
a xen instance (due to the threading library used, if I recall).
However, the amd64 machine I just tried it on is a xen
instance, so that looks promising. So, thanks again!
Rory
Cheers Rusty, I took a shot at that, but no dice I'm afraid.
Actually, I ran into a problem on another project that led me to
this passage last week. I was trying to write something like
<<Data/binary, Pad:8>> = Payload.
but the compiler was complaining (as compilers do). What it was
trying to tell me was that a binary type must have a length
field unless it appears at the end of a <<binary>> pattern.
Sorry, when speaking about this stuff the term binary inevitably
gets overloaded. In essence, it was telling me I had to
do something like:
Length = size(Payload) - 1,
<<Data:(Length)/binary, Pad:8>> = Payload.
Something like that anyway. That's what the passage you quoted is
about - it's talking about using the binary type within a pattern.
You must specify a length with it unless it's at the end of the
pattern.
Also, the defaults for items in a pattern are size 8 and type
integer - so I think my code is safe. Truth be told, if I
had to write that stuff for each term I'd probably just convert
the thing to a list and do matching that way. Yeah, I'm that
lazy :-)
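
A minimal sketch of the two points above, for anyone who wants to check
them in a shell. The module and function names are made up for
illustration and are not part of the scanner. It assumes only the
documented bit syntax defaults: a segment with no size or type is an
8-bit unsigned integer, and a binary segment needs an explicit size
unless it is the last one in the pattern.

```erlang
-module(segment_defaults).
-export([implicit/1, explicit/1, split_last/1]).

%% Segment with no size or type: defaults to an 8-bit unsigned integer.
implicit(<<C, Rest/binary>>) -> {C, Rest}.

%% The same match with the size and type spelled out.
explicit(<<C:8/integer, Rest/binary>>) -> {C, Rest}.

%% A binary segment that is not last needs an explicit size,
%% so compute it from the payload before matching.
split_last(Payload) when byte_size(Payload) >= 1 ->
    Length = byte_size(Payload) - 1,
    <<Data:Length/binary, Pad:8>> = Payload,
    {Data, Pad}.
```

Both implicit/1 and explicit/1 should return the same tuple for any
non-empty binary, which is why adding :8/integer to the scanner clauses
would not be expected to change anything.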
Thanks again Rusty,
Rory
> The following code won't make real sense. The full scanner
> makes sense but this is only a mutated 10% of that. Sorry
> the code is so unintelligible - but on the bright side
> it fails more frequently and predictably than the full
> scanner does.
Thanks for the bug report. I was able to reproduce the crash
by running the scanner module. I'll start investigating it.
/Bjorn
--
Björn Gustavsson, Erlang/OTP, Ericsson AB
> The following code won't make real sense. The full scanner
> makes sense but this is only a mutated 10% of that. Sorry
> the code is so unintelligible - but on the bright side
> it fails more frequently and predictably than the full
> scanner does.
Again thanks for your bug report.
I have extended our test suites and corrected the bug. The correction will
be included in R12B-2.
Here is the correction:
*** erts/emulator/beam/beam_emu.c@@/OTP_R12B-1 Tue Feb 5 14:37:01 2008
--- erts/emulator/beam/beam_emu.c Mon Mar 3 16:21:22 2008
***************
*** 3471,3476 ****
--- 3471,3477 ----
  ms = (ErlBinMatchState *) boxed_val(tmp_arg1);
  dst = (ErlBinMatchState *) HTOP;
  *dst = *ms;
+ *HTOP = HEADER_BIN_MATCHSTATE(slots);
  HTOP += wordsneeded;
  StoreResult(make_matchstate(dst), Arg(3));
Just applied the patch and everything works great now. Many thanks Bjorn!
Rory
Hello,
I'm seeing some problems with fprof on i386 (but not on amd64).
I'm not certain that the problem is related to this thread but
I think it might be, since it's the same code that is affected.
Basically, fprof dies when it tries to load certain modules. It's
not just my own modules that cause this - here's what happens
when running fprof on http:request/1 on my machine:
%% -- START CODE -- %%
Erlang (BEAM) emulator version 5.6 [source] [async-threads:0]
[kernel-poll:false]
Eshell V5.6 (abort with ^G)
1> inets:start().
ok
2> fprof:apply(http, request, ["http://www.erlang.com"]).
Aborted
%% -- END -- %%
From what I have seen, the modules that are affected can
only be profiled successfully if you load the module and its
dependencies before running fprof. The code that I posted
at the start of this thread (weird.erl and scanner.erl) is
affected by this problem so I'll use weird.erl in the
following examples:
%% -- START CODE -- %%
$ erl
Erlang (BEAM) emulator version 5.6 [source] [async-threads:0]
[kernel-poll:false]
Eshell V5.6 (abort with ^G)
1> l(weird).
{module,weird}
2> fprof:apply(weird, run, [1]).
ok
3> fprof:apply(weird, run, [1]).
ok
4> q().
ok
$ erl
Erlang (BEAM) emulator version 5.6 [source] [async-threads:0]
[kernel-poll:false]
Eshell V5.6 (abort with ^G)
1> fprof:apply(weird, run, [1]).
Aborted
%% -- END -- %%
Also, there is no weird:run/3, but I get the same result if I ask
fprof to call it:
%% -- START CODE -- %%
$ erl
Erlang (BEAM) emulator version 5.6 [source] [async-threads:0]
[kernel-poll:false]
Eshell V5.6 (abort with ^G)
1> fprof:apply(weird, run, [what, the, f]).
Aborted
%% -- END -- %%
Hipe is not supported on my i386 so I can't test with it.
This isn't really a problem for me as I can make warm-up
calls to my modules before profiling - probably a smart
thing to do anyway. Just thought it might be of interest.
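
A rough sketch of that warm-up workaround, in case it is useful. The
module name profile_warm and its two functions are hypothetical helpers,
not part of fprof. It assumes that loading the modules with
code:ensure_loaded/1 before the call is enough of a warm-up, and the
caller has to list any dependencies by hand since nothing here discovers
them.

```erlang
-module(profile_warm).
-export([warm/1, profile/4]).

%% Load every module in the list up front; crash if one cannot be loaded.
warm(Mods) ->
    lists:foreach(fun(M) -> {module, M} = code:ensure_loaded(M) end, Mods).

%% Warm the listed modules, then hand the call over to fprof.
profile(Mods, Mod, Fun, Args) ->
    ok = warm(Mods),
    fprof:apply(Mod, Fun, Args).
```

With weird.erl compiled in the current directory, the call would look
like profile_warm:profile([weird], weird, run, [1]).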
Cheers.
Rory
> Erlang (BEAM) emulator version 5.6 [source] [async-threads:0]
> [kernel-poll:false]
>
> Eshell V5.6 (abort with ^G)
You should update to R12B-1. If I remember correctly, this was one of the
bugs we fixed for the R12B-1 release.
Oops! I really thought I was using R12B-1 on this machine. I'm
all fixed now. Sorry about that.
Rory