File lost after restarting mysql & mogilefsd


afool41

Jul 18, 2008, 5:09:54 AM
to mogile
Hi :

We are new to MogileFS, and recently we ran into the same problem
described in this mail:
http://lists.danga.com/pipermail/mogilefs/2008-June/001728.html

After carefully tracing the code, we have come to the conclusion that
sub register_tempfile of Store.pm *might be* the cause of this problem.
I am not a very good programmer, so anyone who has more experience with
this, please correct me if I am wrong...

I think the problem is in the while loop of register_tempfile,
so I modified the loop to do two things:
1. When entering the while loop, delete the row (whose fid is in use in
the file table) from the tempfile table.
2. After successfully inserting another row (fid = max(fid) of the file
table) into tempfile, delete it.

This way, the only remaining row in tempfile (key=_server_startup_test)
will NOT conflict with any fid in the file table, and thus the worker
won't remove the actual file from a device when it removes this row.

The following is the patch. Any correction/advice is welcome! :

afool

#########################################################################
--- Store.pm.orig	2008-06-30 11:13:28.000000000 +0800
+++ Store.pm.new	2008-07-18 16:53:08.000000000 +0800
@@ -845,6 +845,9 @@
     while ($fid_in_use->($fid)) {
         throw("dup") if $explicit_fid_used;
 
+        ## delete the row of tempfile that has the fid in use by file table
+        $dbh->do("DELETE from tempfile WHERE fid = '$fid'");
+
         # be careful of databases which reset their
         # auto-increment/sequences when the table is empty (InnoDB
         # did/does this, for instance). So check if it's in use, and
@@ -853,7 +856,11 @@
 
         # get the highest fid from the filetable and insert a dummy row
         $fid = $dbh->selectrow_array("SELECT MAX(fid) FROM file");
-        $ins_tempfile->(); # don't care about its result
+        if (defined $ins_tempfile->()) { ## could be undef?
+            ## dummy row needs to be deleted, or there will be a problem...
+            $dbh->do("DELETE from tempfile WHERE fid = '$fid'");
+        }
 
         # then do a normal auto-increment
         $fid = undef;

Brad Fitzpatrick

Jul 18, 2008, 2:13:48 PM
to mog...@googlegroups.com
Just replying to one part:

On Fri, Jul 18, 2008 at 2:09 AM, afool41 <afo...@gmail.com> wrote:

+        ## delete the row of tempfile that has the fid in use by file table
+        $dbh->do("DELETE from tempfile WHERE fid = '$fid'");
+
+

Never interpolate a variable into an SQL query like that!

Use placeholders: " ..... WHERE fid=?", undef, $fid


+           $dbh->do("DELETE from tempfile WHERE fid = '$fid'");
+        }

Same.
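Brad's advice applies to any DBI-style interface. As a minimal illustration (Python's sqlite3 standing in for Perl DBI, with a made-up table), here is the same DELETE written with a bound placeholder instead of string interpolation:

```python
import sqlite3

# Tiny in-memory stand-in for the metadata store (hypothetical contents).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE tempfile (fid INTEGER PRIMARY KEY, dkey TEXT)")
db.execute("INSERT INTO tempfile VALUES (2, 'c.jpg')")

fid = 2
# Bad: interpolating the variable into the SQL string, as in the patch:
#   $dbh->do("DELETE from tempfile WHERE fid = '$fid'");
# Good: a bound placeholder, equivalent to Perl DBI's
#   $dbh->do("DELETE FROM tempfile WHERE fid=?", undef, $fid);
db.execute("DELETE FROM tempfile WHERE fid = ?", (fid,))

remaining = db.execute("SELECT COUNT(*) FROM tempfile").fetchone()[0]
print(remaining)  # 0
```

The placeholder lets the driver handle quoting, so the query works even when the value contains quotes, and it avoids SQL injection entirely.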

 

afool41

Jul 18, 2008, 11:12:32 PM
to mogile
Thanks a lot for your correction...

Besides the programming style, is there any logical problem?

afool

On Jul 19, 2:13 AM, "Brad Fitzpatrick" <b...@danga.com> wrote:
> Just replying to one part:
>

Brad Fitzpatrick

Jul 22, 2008, 1:41:50 PM
to mog...@googlegroups.com
I haven't had any chance to look into it.  Alan?

2008/7/18 afool41 <afo...@gmail.com>:

dormando

Jul 22, 2008, 8:28:26 PM
to mog...@googlegroups.com
haven't had a chance to look yet; sounds interesting.

On Tue, 22 Jul 2008, Brad Fitzpatrick wrote:

> I haven't had any chance to look into it. Alan?
>
> 2008/7/18 afool41 <afo...@gmail.com>:
>
>>
>> Thanks a lot for your correction...
>>
>> Beside the programming skill, is there any logical problem ?
>>
>> afool
>>

Chaos Wang

Jul 22, 2008, 10:33:54 PM
to mog...@googlegroups.com
I agree with the problem analysis: the current temp-file registration algorithm can cause even a not-so-busy MogileFS deployment to lose files. The main reason is that the handling of duplicate fids leaves some already-used fids in the tempfile table, causing the files with the same fids to be removed by the Delete worker.

The original duplicate-fid handling process is (register_tempfile() in Store.pm):
...
while ($fid_in_use->($fid)) {
    throw("dup") if $explicit_fid_used;
    $fid = $dbh->selectrow_array("SELECT MAX(fid) FROM file");
    $ins_tempfile->();
    $fid = undef;
    $ins_tempfile->() or die "register_tempfile failed after seeding";
}
...
For a fresh MogileFS deployment using MySQL as the metadata store, assume there is only one file, "a.jpg", uploading; then the contents of the tempfile table could be:
<<tempfile>>
| fid | ... | dmid | dkey | devids |
| 1 | ... | 99 | _server_startup_test | |
| 2 | ... | 1 | a.jpg | 1,2 |
( the fid auto-inc counter = 2 )
and when file "a.jpg" is saved completely, the corresponding tempfile row will be removed:
<<tempfile>>
| fid | ... | dmid | dkey | devids |
| 1 | ... | 99 | _server_startup_test | |
( the fid auto-inc counter = 2 )

<<file>>
| fid | dmid | dkey | ... |
| 2 | 1 | a.jpg | ... |

<<file_on>>
| fid | devid |
| 2 | 1 |
| 2 | 2 |
Assume the system is idle for a long time (longer than the Delete worker's purging interval, which defaults to 3600s); the Delete worker then purges anything old enough in the tempfile table, leaving it empty:
<<tempfile>>
| fid | ... | dmid | dkey | devids |
( the fid auto-inc counter = 2 )

<<file>>
| fid | dmid | dkey | ... |
| 2 | 1 | a.jpg | ... |

<<file_on>>
| fid | devid |
| 2 | 1 |
| 2 | 2 |
After that, assume MySQL is restarted for some reason. Now the fid auto-inc counter is reset to 1 (that's how InnoDB handles auto-inc fields), and another file "b.jpg" upload happens:
<<tempfile>>
| fid | ... | dmid | dkey | devids |
| 1 | ... | 1 | b.jpg | 1,2 |
( the fid auto-inc counter = 2 )

<<file>>
| fid | dmid | dkey | ... |
| 2 | 1 | a.jpg | ... |

<<file_on>>
| fid | devid |
| 2 | 1 |
| 2 | 2 |
When b.jpg is uploaded completely, the table contents will be:
<<tempfile>>
| fid | ... | dmid | dkey | devids |
( the fid auto-inc counter = 2 )

<<file>>
| fid | dmid | dkey | ... |
| 2 | 1 | a.jpg | ... |
| 1 | 1 | b.jpg | ... |

<<file_on>>
| fid | devid |
| 2 | 1 |
| 2 | 2 |
| 1 | 1 |
| 1 | 2 |
Now assume that a "c.jpg" upload begins. A duplicate fid will be generated in tempfile, which activates the code snippet above. As a result, the table contents will be:
<<tempfile>>
| fid | ... | dmid | dkey | devids |
| 2 | ... | 1 | c.jpg | 1,2 | ( the original tempfile inserted this)
| 3 | ... | 1 | c.jpg | 1,2 | ( the first $ins_tempfile call in the above code failed, the second $ins_tempfile inserted this)
( the fid auto-inc counter = 4 )

<<file>>
| fid | dmid | dkey | ... |
| 2 | 1 | a.jpg | ... |
| 1 | 1 | b.jpg | ... |

<<file_on>>
| fid | devid |
| 2 | 1 |
| 2 | 2 |
| 1 | 1 |
| 1 | 2 |
When c.jpg is uploaded completely, the table contents will be:
<<tempfile>>
| fid | ... | dmid | dkey | devids |
| 2 | ... | 1 | c.jpg | 1,2 | <-- dangling record here!
( the fid auto-inc counter = 4 )

<<file>>
| fid | dmid | dkey | ... |
| 2 | 1 | a.jpg | ... |
| 1 | 1 | b.jpg | ... |
| 3 | 1 | c.jpg | ... |

<<file_on>>
| fid | devid |
| 2 | 1 |
| 2 | 2 |
| 1 | 1 |
| 1 | 2 |
| 3 | 1 |
| 3 | 2 |
Then, when the next purge time comes, the Delete worker will remove the dangling c.jpg record in tempfile, along with a.jpg's rows in the file_on table!
<<tempfile>>
| fid | ... | dmid | dkey | devids |
( the fid auto-inc counter = 4 )

<<file>>
| fid | dmid | dkey | ... |
| 2 | 1 | a.jpg | ... |
| 1 | 1 | b.jpg | ... |
| 3 | 1 | c.jpg | ... |

<<file_on>>
| fid | devid |
| 1 | 1 |
| 1 | 2 |
| 3 | 1 |
| 3 | 2 |
Then the contents of file a.jpg are lost...
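To make the failure sequence easy to replay, here is a minimal sketch that models the three tables as Python dicts and the InnoDB behavior as a counter rebuilt from the (empty) tempfile table on restart. The helper names are illustrative, not the real MogileFS API:

```python
# Miniature model of the fid-reuse bug; table names mirror the real schema,
# but the counter/restart behavior is a simplified stand-in for InnoDB.
tempfile, file_t, file_on = {}, {}, {}
counter = 0

def next_fid():
    global counter
    counter += 1
    return counter

def restart_db():
    # InnoDB (at the time) re-derived the tempfile auto-inc counter from
    # MAX(fid) on restart; with tempfile empty, the counter resets.
    global counter
    counter = max(tempfile, default=0)

def create_open(dkey):
    # The buggy register_tempfile loop: on a duplicate fid, seed a dummy
    # row from MAX(fid) of the file table, then auto-increment normally.
    global counter
    fid = next_fid()
    while fid in file_t:
        seed = max(file_t)
        if seed not in tempfile:
            tempfile[seed] = dkey        # dummy row, never cleaned up!
        counter = max(counter, seed)     # explicit insert bumps the counter
        fid = next_fid()
    tempfile[fid] = dkey
    return fid

def create_close(fid, dkey):
    tempfile.pop(fid, None)
    file_t[fid] = dkey
    file_on[fid] = {1, 2}

def purge_tempfile():
    # The Delete worker nukes old tempfile rows AND their file_on rows,
    # without checking whether a real file now owns that fid.
    for fid in list(tempfile):
        del tempfile[fid]
        file_on.pop(fid, None)

tempfile[next_fid()] = "_server_startup_test"   # fid 1
create_close(create_open("a.jpg"), "a.jpg")     # fid 2
purge_tempfile()                                # idle period empties tempfile
restart_db()                                    # MySQL restart: counter resets
create_close(create_open("b.jpg"), "b.jpg")     # reuses fid 1
create_close(create_open("c.jpg"), "c.jpg")     # collides, leaves dummy fid 2
purge_tempfile()                                # dangling row kills a.jpg data

print(sorted(file_t))    # [1, 2, 3]  (the 'file' rows all survive)
print(sorted(file_on))   # [1, 3]     (a.jpg's fid 2 is gone from file_on)
```

Running it shows exactly the end state in the tables above: the file table still lists a.jpg, but its file_on replicas are gone.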

I think the solution given by afool41 is suitable, after replacing the variable interpolation with bind parameters. Here is the slightly modified patch:
diff -up ./Store.pm.old ./Store.pm
--- ./Store.pm.old    2008-07-23 10:27:19.000000000 +0800
+++ ./Store.pm    2008-07-23 10:29:06.000000000 +0800
@@ -851,9 +851,13 @@ sub register_tempfile {
         # re-seed the table with the highest known fid from the file
         # table.
 
+        $dbh->do("DELETE FROM tempfile WHERE fid=?", undef, $fid);
+
         # get the highest fid from the filetable and insert a dummy row
         $fid = $dbh->selectrow_array("SELECT MAX(fid) FROM file");
-        $ins_tempfile->();  # don't care about its result
+        if (defined $ins_tempfile->()) {
+            $dbh->do("DELETE FROM tempfile WHERE fid=?", undef, $fid);
+        }
 
         # then do a normal auto-increment
         $fid = undef;

afool41

Jul 23, 2008, 5:36:45 AM
to mogile
Wang:

Thanks for your detailed analysis. It certainly explains it better...

afool41

dormando

Jul 27, 2008, 7:08:11 PM
to mog...@googlegroups.com
Yo,

Reviewing this... I think the patch is probably good, but it doesn't feel
like a strong enough fix to me. It might be possible for multiple
tempfile rows to end up in this gummed state, and this patch will only
fix the latest one.

The *real* problem is Worker/Delete.pm:process_tempfiles() doesn't check
for 'file' rows before nuking entries.

sub process_tempfiles does:

- Store.pm:old_tempfiles
- Mass-insert the devids into file_on (which already exist in this case!)
- Mass-insert the rows into file_to_delete so they are deleted as they would be through any other method.

... then the deleter goes by and deletes the files + file_on rows.

Note:

- Delete worker process doesn't actually remove the 'file' rows!
- Store.pm:delete_fidid (called from FID.pm:delete) removes the 'file'
row, any tempfile row, and inserts into file_to_delete for the removal.

So the actual fix is looking like, within Worker/Delete.pm:
- Call Store.pm:old_tempfiles
- Try to instantiate a FID object for each tempfile row.
- If FID is instantiated, delete tempfile row and loop.
- If FID object is undefined, go ahead and delete.
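The four steps could be sketched roughly like this (a Python sketch with a hypothetical in-memory store; `old_tempfiles` is the only name taken from the thread, the rest are illustrative stand-ins):

```python
from dataclasses import dataclass, field

@dataclass
class TempRow:
    fid: int
    devids: list

@dataclass
class FakeStore:
    # Hypothetical in-memory stand-in for the MogileFS metadata store.
    tempfile: dict = field(default_factory=dict)
    file: dict = field(default_factory=dict)
    file_on: dict = field(default_factory=dict)
    file_to_delete: list = field(default_factory=list)

    def old_tempfiles(self):
        return [TempRow(fid, devids) for fid, devids in list(self.tempfile.items())]

def process_tempfiles(s: FakeStore):
    for row in s.old_tempfiles():          # step 1: Store.pm:old_tempfiles
        if row.fid in s.file:              # step 2: try to instantiate a FID
            # step 3: a real 'file' row owns this fid, so the tempfile
            # entry is stale; drop only it and keep the file's data.
            del s.tempfile[row.fid]
            continue
        # step 4: no 'file' row, so this is a genuinely abandoned upload.
        s.file_on[row.fid] = row.devids
        s.file_to_delete.append(row.fid)
        del s.tempfile[row.fid]

store = FakeStore(
    tempfile={2: [1, 2], 7: [1]},   # fid 2 is stale (a real file owns it)
    file={2: "a.jpg"},
    file_on={2: [1, 2]},
)
process_tempfiles(store)
print(store.file_to_delete)  # [7] - only the truly abandoned fid is purged
print(2 in store.file_on)    # True - a.jpg's replicas survive
```

In this model the stale row for fid 2 is dropped without queueing any deletion, which is the behavior the proposed fix is after.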

I've verified that the 'file' entry isn't created until cmd_create_close
is executed successfully, so this should be fine?

Any thoughts? I'll whip up a patch when I get a chance.
-Dormando

afool41

Jul 31, 2008, 2:18:25 AM
to mogile
dormando :

thanks for your reply...

Although I don't think I fully understand what you mean by
"It might be possible for multiple tempfile rows to end up in this
gummed state, and this patch will only fix the latest one.",
I do agree that the real problem is in Worker/Delete.pm. The reason I
only tried to fix the tempfile problem is that it's easier for me ;-)

Looking forward to your patch...

afool41

dormando

Aug 11, 2008, 12:58:34 AM
to mog...@googlegroups.com
Replying to my own thread. See:
[PATCH] Don't delete files accidentally during tempfile table cleanup.

... and try it out? It should independently fix the issue. I also have no
complaints about the submitted patch and will commit that too once I've
given people a day to read this :) Tested this patch. WFM.

-Dormando

Andy Lo-A-Foe

Sep 5, 2008, 9:53:04 AM
to mogile
Hi,

We just experienced data loss with MogileFS 2.20. From analyzing the
MySQL bin logs, it seems to be related to what is discussed in this
thread. The file dkey is inserted into tempfile, but at some point the
data simply disappears from the cluster, i.e. there is no entry in the
file table. The MySQL bin data did not include fid numbers, so it's a
bit hard to trace the statements exactly. I've applied the patch to
Store.pm. I really hope this helps, as we trust MogileFS not to lose
data. BTW, our MySQL server hasn't been restarted for weeks..

Regards,
Andy

dormando

Sep 5, 2008, 2:55:48 PM
to mogile
Hey,

Are you sure you successfully upgraded to MogileFS 2.20? That release
includes the patch you said you applied :)

And yes, I made that second fix since I was sure there were other
conditions where tempfile can have danglers, but I didn't want to trust
my ability to find all of them. It's still rare, if it happens at all.

-Dormando

Andy Lo A Foe

Sep 7, 2008, 1:13:20 PM
to mog...@googlegroups.com
Hi,

I upgraded to 2.20 specifically for the following patch:

http://code.sixapart.com/trac/mogilefs/changeset/1203

The patch I installed additionally is this one:

--- ./Store.pm.old 2008-07-23 10:27:19.000000000 +0800
+++ ./Store.pm 2008-07-23 10:29:06.000000000 +0800
@@ -851,9 +851,13 @@ sub register_tempfile {
         # re-seed the table with the highest known fid from the file
         # table.
 
+        $dbh->do("DELETE FROM tempfile WHERE fid=?", undef, $fid);
+
         # get the highest fid from the filetable and insert a dummy row
         $fid = $dbh->selectrow_array("SELECT MAX(fid) FROM file");
-        $ins_tempfile->(); # don't care about its result
+        if (defined $ins_tempfile->()) {
+            $dbh->do("DELETE FROM tempfile WHERE fid=?", undef, $fid);
+        }
 
         # then do a normal auto-increment
         $fid = undef;

..which is not in the 2.20 tag. File loss happens when we're doing
lots of concurrent inserts into our cluster. I hope the above patch
helps. Are there any more details on how/why this is triggered?

Thanks,
Andy

dormando

Sep 7, 2008, 2:21:23 PM
to mog...@googlegroups.com
Hey,

That patch isn't necessary given what my change was.

Can you confirm a few things, please:

- That my patch to the delete worker exists on your server correctly.
- That the fids that are disappearing have a *file* row but no *file_on*
rows?

If your `file` and all file_on rows disappear, it's either your
application removing the file or a different bug. All of the bugs with
tempfile should leave an orphaned `file` row in the database, since none
of the delete worker code actually touches the file table.

Thanks,
-Dormando
