Fortunately, I've now managed to get set up with two-machine debugging, and I know where the bug occurs (in this piece of code): 'xvp' is null, which causes a null dereference and thus the panic-inducing headache.
    /* Lookup or create the named attribute. */
    if ( zfs_obtain_xattr(VTOZ(xdvp), ZFS_MTIME_XATTR,
                          S_IRUSR | S_IWUSR, cr, &xvp,
                          flag) ) {
        zfsvfs->z_last_unmount_time = 0;
        zfsvfs->z_last_mtime_synced = 0;
        vnode_put(xdvp);
        goto out;
    }
    gethrestime(&now);
    ZFS_TIME_ENCODE(&now, VTOZ(xvp)->z_phys->zp_mtime);
I don't really want to commit (and therefore push) with the known bad code in place; unfortunately, I ran out of time this weekend to dig any further into what's going on. As far as I can tell, this bit hasn't changed since the original codebase (though I may have changed the xattr code), so I'm not entirely sure what's happening here.
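For illustration only, a guard of roughly the following shape would turn the null dereference into an ordinary error return while the root cause is tracked down. This is a userspace sketch, not the actual fix: `fake_vnode`, `fake_obtain_xattr` and `record_mtime` are hypothetical stand-ins, and the idea that the lookup can report success while leaving 'xvp' unset is an assumption, not something confirmed in the debugger yet.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a vnode; not the real kernel structure. */
struct fake_vnode {
    long mtime;
};

/*
 * Hypothetical stand-in for the xattr lookup: returns 0 on success and
 * stores the vnode in *vpp. 'leave_null' simulates the ASSUMED failure
 * mode, where the call reports success but never fills in *vpp.
 */
static int
fake_obtain_xattr(struct fake_vnode *backing, int leave_null,
    struct fake_vnode **vpp)
{
    *vpp = leave_null ? NULL : backing;
    return 0;
}

/*
 * Record a timestamp in the attribute vnode, treating "success with a
 * NULL vnode" as an error instead of dereferencing it.
 */
static int
record_mtime(struct fake_vnode *backing, int leave_null, long now)
{
    struct fake_vnode *xvp = NULL;
    int error;

    error = fake_obtain_xattr(backing, leave_null, &xvp);
    if (error != 0)
        return error;
    if (xvp == NULL)        /* the check the quoted code does not make */
        return ENOENT;

    xvp->mtime = now;
    return 0;
}
```

Calling `record_mtime(&v, 1, now)` on some `struct fake_vnode v` returns `ENOENT` and leaves `v` untouched, whereas `record_mtime(&v, 0, now)` stores the timestamp. A check like this only hides the symptom; it does not explain why 'xvp' ends up null in the first place.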
Anyway, I want to get this tested and free from these kinds of kernel issues before committing the successful merge, which is why I've not done it yet.
Alex
I generally recommend committing all of the time. You can easily
make lots of small commits into one big commit before publishing.
Doing this in a branch makes it easier to tag out and get some help
(and lots of small changes are useful there as well).
I'm still a bit slammed with work, but hopefully I can squeeze a
little bit of time in before too long. Your work is too important to
not be committing and pushing up a branch, though. :)
> little bit of time in before too long. Your work is too important to
> not be committing and pushing up a branch, though. :)
Indeed. Thank you Alex, for all your hard work and perseverance!
--sambo
Agreed, generally I do that. (It's made much easier now that I can rebase on a ZFS partition using the 'ignorectime' mentioned earlier!) However, in this case, the error is (obviously) an artefact of the merge itself, so I may have mis-#ifdef'd something or equivalent. I have done the merge but not yet committed a merge node; if I had, and had subsequently tweaked bits, I don't know whether it's possible to squash those tweaks down into the merge commit.
Anyway, I'm keeping backups on three machines as I go, and I hope to find the culprit in the near future. Fortunately, now that I've figured out how to do the kernel debugging and know where the issue occurs, I can step through to find out why.
Alex