shalabalubaosigbus
Nov 2, 2009, 4:04:56 PM
to mogile
I noticed there were some entries showing up in the file_on table
that had no entry in the file table.
I started digging and noticed that every once in a while (under high
load) the tracker fails when trying to convert a temp file into a
real one.
It performs the DELETE FROM tempfile.
It then executes the INSERT IGNORE INTO file_on.
After this (or during this) it crashes and does not clean up after
itself.
This is happening in lib/MogileFS/Store.pm line 286:
my $rv = eval { $self->dbh->do($sql, @do_params) };
I think the point of this eval is to gracefully handle errors, but
this is not happening.
I put some debugging code in there to see where it was crashing and I
get the message before this line, but not after.
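For reference, here is a minimal sketch of the two failure modes an eval around ->do can and cannot surface. It assumes $dbh is a DBI database handle; do_logged is a hypothetical helper name, not something in Store.pm. eval only traps a die (which DBI raises when RaiseError is set). With RaiseError off, ->do returns undef and the eval sees nothing, so the return value has to be checked as well. Neither check helps if the whole process is killed inside ->do, which would match seeing the "before" message but never the "after" one.

```perl
use strict;
use warnings;

# Hypothetical wrapper (not the actual Store.pm code): run one statement and
# make both failure modes visible. $dbh is assumed to be a DBI database handle.
sub do_logged {
    my ($dbh, $sql, @params) = @_;

    my $rv = eval { $dbh->do($sql, undef, @params) };

    if (my $err = $@) {
        # Reached only when do() died, e.g. because RaiseError is enabled.
        return (undef, "exception: $err");
    }
    if (!defined $rv) {
        # With RaiseError off, do() returns undef and eval sees no error.
        my $msg = $dbh->errstr;
        $msg = 'unknown error' unless defined $msg;
        return (undef, "failed: $msg");
    }
    return ($rv, undef);    # number of affected rows, no error
}
```

The caller gets back either a row count or an error string, so it can log the problem and decide whether to clean up.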
Here is the debug output from one instance of this happening:
[queryworker(31634)] about to call add_to_db on key: /new/
1257191780.11-6650-t1.bofh.lan fidfid: 34922
[queryworker(31634)] calling add_fidid_fo_devid from add_to_db fidid:
34922
[queryworker(31634)] About to insert into file_on in
add_fidid_to_dev_id: fidid: 34922 devid: 2
[queryworker(31634)] dowell: start INSERT IGNORE INTO file_on (fid,
devid) VALUES (?,?) 34922 2
[queryworker(31634)] dowell: inside eval before ->do: INSERT IGNORE
INTO file_on (fid, devid) VALUES (?,?) 34922 2
091102 12:56:23 mogilefs 0 654 1 20678536 0 INSERT INTO tempfile
(dmid,dkey,classid,devids,createtime) VALUES ('5','/new/
1257191780.11-6650-t1.bofh.lan','1','2,1',UNIX_TIMESTAMP());
091102 12:56:28 mogilefs 0 659 1 20713229 0 DELETE FROM tempfile WHERE
fid='34922';
091102 12:56:30 mogilefs 4 659 1 20721067 0 INSERT IGNORE INTO
file_on (fid, devid) VALUES ('34922','2');
There is nothing else logged from this pid after this.
I think this needs to:
1. be able to handle errors happening on the database, i.e. realize
there was an error, report it, and clean up after itself. eval is
not cutting it in this case.
and
2. run with AutoCommit => 0. This would force the tracker to send a
commit, and if it didn't, the transaction would be rolled back.
Is there any particular reason AutoCommit is on?
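To make the second suggestion concrete, here is a rough sketch of what I have in mind, not a patch. It uses only the two statements from the log above. promote_tempfile is a hypothetical helper name, and it assumes $dbh is a DBI handle with RaiseError => 1 and that the tables use a transactional engine (InnoDB rather than MyISAM). begin_work turns AutoCommit off until the next commit or rollback, so the rest of the tracker could keep AutoCommit on.

```perl
use strict;
use warnings;

# Hypothetical helper, not MogileFS code. $dbh is assumed to be a DBI handle
# with RaiseError => 1, talking to tables on a transactional engine.
sub promote_tempfile {
    my ($dbh, $fid, $devid) = @_;

    my $ok = eval {
        $dbh->begin_work;    # AutoCommit is off until commit/rollback
        $dbh->do('DELETE FROM tempfile WHERE fid=?', undef, $fid);
        $dbh->do('INSERT IGNORE INTO file_on (fid, devid) VALUES (?,?)',
                 undef, $fid, $devid);
        # The insert into the file table would go here, before the commit.
        $dbh->commit;
        1;
    };
    return 1 if $ok;

    my $err = $@ || 'unknown error';
    eval { $dbh->rollback };    # best effort; the connection may be gone
    warn "transaction for fid $fid rolled back: $err";
    return 0;
}
```

If the worker dies partway through, the commit is never sent and the server should discard the uncommitted work when the connection drops, so file_on would not end up with rows that have no matching file entry.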