The log message indicates that no OSDs could be assigned to the new
replica, which was supposed to be created when the file was closed after
having been written. Since your OSDs seem to be alive and have enough
free disk space, I suspect that the problem has to do with the selection
of OSDs.
Did you change the default OSD selection policy in a way that could restrict
the assignment of OSDs to replicas? If not, did you change the default
striping parameters, such as the number of OSDs used per file? (The
latter could cause problems: if each file is striped across both of your
OSDs, both may already have been assigned to the first, original replica
of the file, and different replicas of a file must not share any OSDs.)
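To make the constraint concrete, here is a small bash sketch that checks whether enough unused OSDs remain for one more replica. The function name and its parameters are hypothetical and not part of XtreemFS; it assumes that every replica uses the same number of OSDs (the stripe width) and that the selection policy does not exclude any live OSD.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not an XtreemFS tool: can a new replica be placed,
# given that different replicas of one file must not share any OSDs?
# Assumes all replicas use the same stripe width and all OSDs are eligible.
can_place_replica() {
  local total_osds=$1    # OSDs that are alive and have free space
  local stripe_width=$2  # number of OSDs used per replica
  local replicas=$3      # number of replicas that already exist
  local used=$(( stripe_width * replicas ))
  local free=$(( total_osds - used ))
  # A new replica needs stripe_width OSDs that no other replica uses.
  if (( free >= stripe_width )); then
    echo yes
  else
    echo no
  fi
}

can_place_replica 2 1 1   # yes: one OSD is still unused
can_place_replica 2 2 1   # no: both OSDs belong to the first replica
can_place_replica 2 1 2   # no: each OSD already holds a replica
```

With two OSDs and the default of one OSD per file, a second replica fits; once the stripe width is raised to two, the first replica already occupies every OSD.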
I'm not sure where this comes from; we will check this asap.
Best regards,
Jan
> OK, thank you.
>
> But I did not find that any OSD crashed or was shut down when this
> error occurred.
> According to the log, can I assume that this error does not happen to
> every file?
> I have tried to find the file "68" in the object_dir on the OSD
> "ecf8c488-3891-48b3-8c19-a1530dd8f7e0" machine, but I did not find
> it.
>
> How can I find this file?
There is no XtreemFS tool that returns path names for a file ID. You
could do something like

  find <mountpoint> -print -exec xtfs_stat {} \; | grep -B 3 ecf8c488-3891-48b3-8c19-a1530dd8f7e0:68

to search a mounted volume for the file, or you could create an XML dump
of the entire MRC database with 'xtfs_mrcdbtool' and search the dump
afterwards.
Hope this helps and best regards,
Jan
It looks like the MRC is trying to trigger the on-close replication
again, even though the file has been replicated before and hasn't been
opened for writing. However, this attempt fails, because there are no
more OSDs available that haven't been already assigned to one of the
other replicas.
Are you currently using XtreemFS 1.1? As far as I remember, there was a
similar bug in that release. It has already been fixed in trunk,
though. As long as you only have two OSDs in the system, you can
probably ignore the problem; otherwise, I suggest building and using the
trunk version until the next release appears.
Best regards,
Jan