Mnesia(node@localhost): ** ERROR ** (could not write core file:
system_limit)
** FATAL ** Cannot open log file "/var/mnesia/invoice_item.DCL":
{file_error, "/var/mnesia/invoice_item.DCL", system_limit}
I naively tried mnesia:start() only to have the same error repeat but
for a different table.
After restarting the whole node it now seems OK (the error hasn't popped
up again), but...
The problem is that data now seems to be missing from at least one of
those tables. I made a backup of the whole Mnesia directory after the
second failure, but I don't suppose the data is in there somewhere.
1. What has caused this? There is plenty of space left on the device,
process was running as root, ulimit reports unlimited.
2. Is the data really gone? I have enough information elsewhere to
reconstruct the data, but would like to know where it has gone.
3. How to prevent this from happening again? Periodic backups would not
help with this, as I cannot afford to lose data since the last backup
even if I were to do 1hr intervals. Would running another node on another
system be assurance enough?
I find it very strange that some data would get dropped (about 10k
records out of 85k record table), is it a sign of a mnesia bug or is
this something I should anticipate and work around?
Some potentially useful system info:
- Erlang R12B4
- about 200 tables, 50% sets 50% bags, both ram and disc copies for each
table
- node takes about 1.2GB RAM when data is loaded (I understand there is
a 2GB per table limit, or am I misguided?)
- du -sh /var/mnesia/
343M /var/mnesia/
- Filesystem Size Used Avail Use% Mounted on
/dev/md/1 233G 49G 184G 22% /
- Slackware Linux 2.6.24.5-smp #2 SMP Wed Apr 30 13:41:38 CDT 2008 i686
Intel(R) Pentium(R) Dual CPU E2200 @ 2.20GHz GenuineIntel GNU/Linux
- node did not generate erl_crash.dump as I brought it down via shell
q(). The only trace I have is the error above and subsequent error
reports of calls to mnesia failing {aborted, {node_not_running...
Any help and pointers are highly appreciated.
Thanks!
Slobo
________________________________________________________________
erlang-questions mailing list. See http://www.erlang.org/faq.html
erlang-questions (at) erlang.org
How would I check the FD usage of a running system? Is there a way to get notified if the node is approaching the limit, in which case I could take some corrective action (such as restarting the node)?
Do open FDs get closed if a process dies? Perhaps that's where the leak is coming from.
The biggest question that remains is why Mnesia lost data. I would understand losing whatever was going to be written when system_limit was reached, but I lost a lot of "old" data as well, data that had been there for days, even weeks.
Thanks
Slobo
A link that could be useful for investigating which system limits may have been hit:
http://www.erlang.org/doc/efficiency_guide/advanced.html#id2265856
Then, you might want to use (but you probably did already):
* erlang:system_info/1
* erlang:memory/0
* length(processes())
* erlang:process_info/2
At least, this is what I would do...
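As a rough illustration of the calls listed above, here is a minimal sketch that gathers a few of those figures in one place. The module name node_snapshot and the choice of which keys to report are assumptions made for this example, not anything prescribed in the thread.

```erlang
-module(node_snapshot).
-export([snapshot/0]).

%% Hypothetical helper: collect a few VM resource figures as a proplist.
%% Comparing process_count with process_limit shows how close the node
%% is to the process limit; port_count covers open files and sockets
%% that the VM holds as ports.
snapshot() ->
    [{process_count, erlang:system_info(process_count)},
     {process_limit, erlang:system_info(process_limit)},
     {port_count,    length(erlang:ports())},
     {total_memory,  erlang:memory(total)},
     {ets_memory,    erlang:memory(ets)}].
```

Calling node_snapshot:snapshot() at intervals and comparing the results is one way to see which figure is growing.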
Regards,
--
Roberto Aloi
robert...@erlang-consulting.com
http://www.erlang-consulting.com
> Is there a way to get notified if node is approaching the limit, in which case I could take some corrective action (such as restarting the node).
>
I'll leave that as an exercise for the reader ;) (Although an Erlang
process which periodically checks the count seems like it would be
simple enough to implement).
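One possible shape for such a monitoring process is sketched below. It assumes Linux (it counts the entries in /proc/self/fd), and the module name fd_watch, the interval and the threshold are all hypothetical values chosen for illustration. It only logs a warning; what corrective action to take is left open.

```erlang
-module(fd_watch).
-export([start/2, count_fds/0]).

%% Linux-only assumption: every open descriptor of the VM shows up as
%% one entry in /proc/self/fd. Listing the directory briefly opens a
%% descriptor itself, so treat the count as approximate.
count_fds() ->
    case file:list_dir("/proc/self/fd") of
        {ok, Names}     -> {ok, length(Names)};
        {error, Reason} -> {error, Reason}
    end.

%% Spawn a process that checks the count every IntervalMs milliseconds
%% and logs a warning once it reaches Threshold. Send it 'stop' to end it.
start(IntervalMs, Threshold) ->
    spawn(fun() -> loop(IntervalMs, Threshold) end).

loop(IntervalMs, Threshold) ->
    case count_fds() of
        {ok, N} when N >= Threshold ->
            error_logger:warning_msg("open FDs: ~p (threshold ~p)~n",
                                     [N, Threshold]);
        _ ->
            ok
    end,
    receive
        stop -> ok
    after IntervalMs ->
        loop(IntervalMs, Threshold)
    end.
```

For example, fd_watch:start(60000, 900) would check once a minute and warn at 900 descriptors, which only makes sense if the real limit is somewhat above that.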
> Do open FDs get closed if a process dies? Perhaps that's where the leak is coming from.
>
In general, yes they do. However be aware that many operations in
Erlang spawn a process, and they're not always cleaned up in the way you
might expect (we found a particular case with long-lasting HTTP client
connections where you have to explicitly shut them down, even if the
process that initiated it had crashed). Basically, what you need to be
concerned about is process leaks, not FD leaks per se.
Cheers,
Bernard
Heh, thinking outside of the (VM) box - I guess there is no queryable
interface in Erlang for this? On Linux it is then indeed simple enough:
    {ok, FDList} = file:list_dir("/proc/self/fd"),
    length(FDList).
What about other platforms, e.g. Windows?
> The command "lsof -p
> <PID>" will also show you (although it lists a bunch of other things too
> and I can't remember off the top of my head how to filter just to file
> descriptors that contribute to the processes limit).
>
Would lsof (or a similar mechanism) be preferable, as I would get a list
of open sockets and network connections, which as I understand it all
contribute to the max open ports limit? Would I get all of those in
the /proc/.../fd list as well?
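On Linux the /proc/<pid>/fd directory holds one symlink per open descriptor, and sockets and pipes appear there too, with targets such as "socket:[12345]" or "pipe:[6789]". The sketch below groups the entries by kind; the module name fd_kinds and the four categories are assumptions made for this example.

```erlang
-module(fd_kinds).
-export([summary/0, kind/1]).

%% Classify the target of a /proc/<pid>/fd symlink. Anything that is
%% not a socket, pipe or anonymous inode is counted as 'file', which
%% also covers devices and terminals.
kind("socket:" ++ _)     -> socket;
kind("pipe:" ++ _)       -> pipe;
kind("anon_inode:" ++ _) -> anon_inode;
kind(_)                  -> file.

%% Returns {ok, [{Kind, Count}]} on Linux, {error, Reason} elsewhere.
%% Entries that vanish between listing and reading are skipped.
summary() ->
    Dir = "/proc/self/fd",
    case file:list_dir(Dir) of
        {ok, Names} ->
            Kinds = [kind(Target)
                     || Name <- Names,
                        {ok, Target} <- [file:read_link(filename:join(Dir, Name))]],
            {ok, count(Kinds)};
        {error, Reason} ->
            {error, Reason}
    end.

count(Kinds) ->
    lists:foldl(fun(K, Acc) -> orddict:update_counter(K, 1, Acc) end,
                orddict:new(), Kinds).
```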
> > Do open FDs get closed if a process dies? Perhaps that's where the leak is coming from.
> >
> In general, yes they do. However be aware that many operations in
> Erlang spawn a process, and they're not always cleaned up in the way you
> might expect (we found a particular case with long-lasting HTTP client
> connections where you have to explicitly shut them down, even if the
> process that initiated it had crashed). Basically, what you need to be
> concerned about is process leaks, not FD leaks per se.
Hm, I have embedded Yaws running, and occasionally processes terminate
when invalid requests come in. I would have thought those would have
been really dead. I'll have to keep an eye on the running system to see
if the number of processes is rising.
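One way to keep that eye on the process count is to group the live processes by their initial call and see which group grows between samples. This is only a sketch; the module name proc_census is hypothetical. Note that processes started through proc_lib (gen_server and friends) all report proc_lib:init_p/5 as their initial call, so this view is coarse for OTP behaviours.

```erlang
-module(proc_census).
-export([by_initial_call/0]).

%% Returns [{{M, F, Arity}, Count}], largest group first.
%% Processes that die while we iterate return 'undefined' from
%% process_info/2 and are silently skipped by the generator pattern.
by_initial_call() ->
    Calls = [MFA || Pid <- erlang:processes(),
                    {initial_call, MFA} <- [erlang:process_info(Pid, initial_call)]],
    Counts = lists:foldl(fun(MFA, Acc) -> orddict:update_counter(MFA, 1, Acc) end,
                         orddict:new(), Calls),
    lists:reverse(lists:keysort(2, Counts)).
```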
Thanks,
Slobo
Cheers,
B
I keep the time the process started and the time of the last event in my State record, and then have logic in the handle_info(timeout, State) function to cause the process to "self destruct" if it thinks it's been around too long.
There is a small performance overhead from starting a timer in your process, but it does trap this sort of problem.
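A minimal sketch of that pattern follows. The module name expiring_worker, the record fields, the 500 ms check interval and the rule "stop after MaxIdleMs without an event" are all assumptions; the actual expiry rule described above may differ. It uses erlang:monotonic_time/1, which needs a much newer OTP release than R12B.

```erlang
-module(expiring_worker).
-behaviour(gen_server).
-export([start_link/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

-record(state, {started, last_event, max_idle_ms}).
-define(CHECK_MS, 500).  % gen_server timeout used as the periodic check

start_link(MaxIdleMs) ->
    gen_server:start_link(?MODULE, MaxIdleMs, []).

now_ms() -> erlang:monotonic_time(millisecond).

init(MaxIdleMs) ->
    Now = now_ms(),
    {ok, #state{started = Now, last_event = Now, max_idle_ms = MaxIdleMs},
     ?CHECK_MS}.

%% Every real event refreshes last_event and re-arms the timeout.
handle_call(_Req, _From, S) ->
    {reply, ok, S#state{last_event = now_ms()}, ?CHECK_MS}.

handle_cast(_Msg, S) ->
    {noreply, S#state{last_event = now_ms()}, ?CHECK_MS}.

%% No message arrived within ?CHECK_MS: stop if idle for too long.
handle_info(timeout, S = #state{last_event = Last, max_idle_ms = Max}) ->
    case now_ms() - Last >= Max of
        true  -> {stop, normal, S};
        false -> {noreply, S, ?CHECK_MS}
    end;
handle_info(_Other, S) ->
    {noreply, S, ?CHECK_MS}.

terminate(_Reason, _State) -> ok.
code_change(_OldVsn, State, _Extra) -> {ok, State}.
```

Because the process stops with reason normal, a linked parent is not taken down with it.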
Regards
Matt
________________________________________
From: erlang-q...@erlang.org [erlang-q...@erlang.org] On Behalf Of Bernard Duggan [ber...@m5net.com]
Sent: Wednesday, December 09, 2009 6:00 PM
To: Slobodan Miskovic
Cc: Erlang-Questions Questions
Subject: Re: [erlang-questions] Mnesia could not write core file: system_limit