osd error


fanbreeze

Oct 28, 2009, 11:48:01 PM
to XtreemFS
Hi,

I created four volumes in one XtreemFS system.
One volume was set with direct_io, ocr_factor_set 2 and ocr_full_set true.
One volume was set with direct_io only.
The other two volumes had none of these options set.

I found the following error in osd.log. What caused this error?

Thank you

-----------------------------------------
[ E | StageRequest | OSD Replication Stage | 17 | 0:04:41] internal server error in internal event: java.lang.RuntimeException: padding object must not be last object!
[ E | StageRequest | OSD Replication Stage | 17 | 0:04:41] java.lang.RuntimeException: padding object must not be last object!
... org.xtreemfs.osd.storage.ObjectInformation.getObjectData(ObjectInformation.java:98)
... org.xtreemfs.osd.operations.ReadOperation.readFinish(ReadOperation.java:206)
... org.xtreemfs.osd.operations.ReadOperation.postReadReplica(ReadOperation.java:285)
... org.xtreemfs.osd.operations.ReadOperation$3.fetchComplete(ReadOperation.java:240)
... org.xtreemfs.osd.replication.ReplicatingFile$ReplicatingObject.sendResponses(ReplicatingFile.java:259)
... org.xtreemfs.osd.replication.ReplicatingFile$ReplicatingObject.objectNotFetched(ReplicatingFile.java:214)
... org.xtreemfs.osd.replication.ReplicatingFile.objectNotFetched(ReplicatingFile.java:539)
... org.xtreemfs.osd.replication.ObjectDissemination.objectNotFetched(ObjectDissemination.java:152)
... org.xtreemfs.osd.stages.ReplicationStage.processInternalObjectFetched(ReplicationStage.java:181)
... org.xtreemfs.osd.stages.ReplicationStage.processMethod(ReplicationStage.java:116)
... org.xtreemfs.osd.stages.Stage.run(Stage.java:101)

fanbreeze

Oct 29, 2009, 12:53:15 AM
to XtreemFS
I have two OSD nodes.

mrc.log
------------------
[ W | OSDStatusManager | ProcSt | 11 | 0:51:14] all
OSDs: de06eb14-21cd-450b-beb1-2672767233c3, {total=204178042880,
free=71430942720, status_page_url=http://127.0.0.1:30640,
vivaldi_coordinates=000000000000000000000000000000000000000000000000,
totalRAM=1034027008, usedRAM=7777336, load=3, geoCoordinates=,
seconds_since_last_update=4, proto_version=2009082918}
edf4d388-0a4f-4f4a-b8db-841bae903f62, {total=10158686208,
free=8437846016, status_page_url=http://127.0.0.1:30640,
vivaldi_coordinates=000000000000000000000000000000000000000000000000,
totalRAM=512950272, usedRAM=3398472, load=11, geoCoordinates=,
seconds_since_last_update=2, proto_version=2009082918}
[ W | - | ProcSt | 11 | 0:51:14] no suitable OSDs available for file ecf8c488-3891-48b3-8c19-a1530dd8f7e0:68
[ W | CloseOperation | ProcSt | 11 | 0:51:14] could not replicate file '68' on close
[ W | CloseOperation | ProcSt | 11 | 0:51:14] org.xtreemfs.mrc.UserException: could not assign OSDs to file ecf8c488-3891-48b3-8c19-a1530dd8f7e0:68: no feasible OSDs available (errno=5)
at org.xtreemfs.mrc.utils.MRCHelper.createReplica(MRCHelper.java:196)
at org.xtreemfs.mrc.operations.CloseOperation.startRequest(CloseOperation.java:126)
at org.xtreemfs.mrc.stages.ProcessingStage.parseAndExecute(ProcessingStage.java:276)
at org.xtreemfs.mrc.stages.ProcessingStage.processMethod(ProcessingStage.java:203)
at org.xtreemfs.mrc.stages.MRCStage.run(MRCStage.java:138)

[ W | OnCloseReplicationThread | OnCloseReplThr | 16 | 0:51:23] java.io.IOException: request timed out
at org.xtreemfs.foundation.oncrpc.client.RPCResponse.get(RPCResponse.java:77)
at org.xtreemfs.mrc.stages.OnCloseReplicationThread.run(OnCloseReplicationThread.java:115)


---------------------------------------------------------

Jan Stender

Oct 29, 2009, 1:25:19 PM
to xtre...@googlegroups.com
Hi,

The log message indicates that no OSDs could be assigned to the new
replica, which was supposed to be created when the file was closed after
having been written. Since your OSDs seem to be alive and have enough
free disk space, I suspect that the problem has to do with the selection
of OSDs.
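
The "alive and enough free space" reading can be checked mechanically from the "all OSDs:" line in mrc.log. The following is only a rough sketch: it assumes the wrapped log line has first been joined back into a single string, it uses an abbreviated copy of the line from this thread as sample input, and the helper names (parse_osd_status, free_fraction) are made up for illustration, not part of XtreemFS.

```python
import re

# One "<uuid>, {key=value, ...}" entry as it appears in the "all OSDs:" line.
OSD_ENTRY = re.compile(
    r"([0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}),\s*\{([^}]*)\}"
)


def parse_osd_status(line):
    """Return {uuid: {key: value}} for every OSD entry found in the line."""
    osds = {}
    for uuid, body in OSD_ENTRY.findall(line):
        fields = {}
        for pair in body.split(","):
            key, _, value = pair.strip().partition("=")
            if key:
                fields[key] = value
        osds[uuid] = fields
    return osds


def free_fraction(fields):
    """Fraction of the OSD's disk space that is still free."""
    return int(fields["free"]) / int(fields["total"])


# Abbreviated sample taken from the mrc.log excerpt in this thread.
SAMPLE = (
    "all OSDs: de06eb14-21cd-450b-beb1-2672767233c3, {total=204178042880, "
    "free=71430942720, load=3, geoCoordinates=, seconds_since_last_update=4} "
    "edf4d388-0a4f-4f4a-b8db-841bae903f62, {total=10158686208, "
    "free=8437846016, load=11, geoCoordinates=, seconds_since_last_update=2}"
)

for uuid, fields in parse_osd_status(SAMPLE).items():
    print(uuid, "free:", round(free_fraction(fields), 2),
          "last update (s):", fields["seconds_since_last_update"])
```

Both entries report a recent update and a substantial share of free space, which is consistent with the conclusion that disk space is not the limiting factor here.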

Did you change the default OSD selection policy, so that it may restrict
the assignment of OSDs to replicas? If not, did you change the default
striping parameters, such as the number of OSDs used per file? (The
latter case may cause problems, because both OSDs might have already
been assigned to the first, original replica of the file, and different
replicas of a file mustn't share any OSDs.)

I'm not sure where this comes from; we will check this asap.

Best regards,
Jan

fanbreeze

Oct 30, 2009, 3:12:29 AM
to XtreemFS
I did not change the default OSD selection policy or the default
striping parameters.

To my surprise, when I ran "xtfs_stat testfile", the result was:
-------------
[okooo@192 1029]$ xtfs_stat testfile
filename testfile
XtreemFS URI oncrpc://dir57:32638/data2_Buy/P3/2009/1029/testfile
XtreemFS fileID ecf8c488-3891-48b3-8c19-a1530dd8f7e0:1171
object type regular file
owner okooo
group okooo
read-only true

XtreemFS replica list
list version 1
replica update policy ronly
-----------------------------
replica 1 SP STRIPING_POLICY_RAID0, 128kb, 1
replica 1 OSDs [{address=dir57:32640,
uuid=de06eb14-21cd-450b-beb1-2672767233c3}]
replica 1 repl. flags 0x1
-----------------------------
replica 2 SP STRIPING_POLICY_RAID0, 128kb, 1
replica 2 OSDs [{address=192.168.8.169:32640,
uuid=edf4d388-0a4f-4f4a-b8db-841bae903f62}]
replica 2 repl. flags 0xA
-----------------------------

testfile had been replicated.

When I stopped one of the OSD servers, I could still read it correctly.

Jan Stender

Oct 30, 2009, 5:26:08 AM
to xtre...@googlegroups.com
Ok... so it works in the general case. The failed replication request in
the log was for a different file:
'ecf8c488-3891-48b3-8c19-a1530dd8f7e0:68'. Please let me know if you see
this problem again.

Best regards,
Jan

Jan Stender

Oct 30, 2009, 8:48:50 AM
to xtre...@googlegroups.com
I finally found the reason for the problem and filed a bug report (#78).
In general, the error seems to occur when trying to access a replica
that hasn't yet been filled with content, and no content can be copied
from the original (complete) replica (e.g. because the OSD crashed or
was shut down). IMO, the best possible behavior in this case is to
return an I/O error. We will fix this asap.

Best regards,
Jan


fanbreeze

Nov 1, 2009, 8:39:23 PM
to XtreemFS
OK, thank you.

But I did not find that any OSD had crashed or was shut down when this
error occurred.
According to the log, can I assume that this error does not happen for
every file?
I tried to find the file "68" in the object_dir on the OSD
"ecf8c488-3891-48b3-8c19-a1530dd8f7e0" machine, but I did not find
it.

How can I find this file?


Thank you

Jan Stender

Nov 2, 2009, 9:13:43 AM
to xtre...@googlegroups.com
Hi,

> OK, thank you.
>
> But I did not find that any OSD had crashed or was shut down when this
> error occurred.
> According to the log, can I assume that this error does not happen for
> every file?
> I tried to find the file "68" in the object_dir on the OSD
> "ecf8c488-3891-48b3-8c19-a1530dd8f7e0" machine, but I did not find
> it.
>
> How can I find this file?

There is no XtreemFS tool that returns path names for a file ID. You
could do something like

find <mountpoint> -print -exec xtfs_stat {} \; | grep -B 3 ecf8c488-3891-48b3-8c19-a1530dd8f7e0:68

to search a mounted volume for the file, or you could create an XML dump
of the entire MRC database with 'xtfs_mrcdbtool' and search the dump
afterwards.
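
The same search over a mounted volume can also be scripted. The sketch below is only an illustration: it assumes xtfs_stat is on the PATH and prints an "XtreemFS fileID" line in the format shown elsewhere in this thread, and the function names are hypothetical helpers, not XtreemFS tools.

```python
import os
import subprocess


def parse_file_id(stat_output):
    """Return the value of the 'XtreemFS fileID' line, or None if absent."""
    for line in stat_output.splitlines():
        parts = line.split()
        if len(parts) >= 3 and parts[0] == "XtreemFS" and parts[1] == "fileID":
            return parts[2]
    return None


def run_xtfs_stat(path):
    """Run 'xtfs_stat <path>' (as used in this thread) and return its stdout."""
    result = subprocess.run(["xtfs_stat", path], capture_output=True, text=True)
    return result.stdout


def find_paths_by_file_id(mountpoint, file_id, run_stat=run_xtfs_stat):
    """Walk the mounted volume and collect paths whose file ID matches exactly."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(mountpoint):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if parse_file_id(run_stat(path)) == file_id:
                matches.append(path)
    return matches


# Example call (replace the mountpoint with your own):
# find_paths_by_file_id("<mountpoint>",
#                       "ecf8c488-3891-48b3-8c19-a1530dd8f7e0:68")
```

Unlike the grep pipeline, the comparison here is exact, so a search for file ID ...:68 does not also report files such as ...:680. Like the find command, it runs xtfs_stat once per file and can be slow on large volumes.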

Hope this helps and best regards,
Jan

fanbreeze

Nov 3, 2009, 4:41:49 AM
to XtreemFS
Thank you very much for your help!
I found the file:
-----------------------------
[okooo@192 data2]$ find Buy/ -print -exec xtfs_stat {} \; | grep -B 3 ecf8c488-3891-48b3-8c19-a1530dd8f7e0:68
Buy/NineToTo/200910/a
filename a
XtreemFS URI oncrpc://dir57:32638/data2_Buy/NineToTo/200910/a
XtreemFS fileID ecf8c488-3891-48b3-8c19-a1530dd8f7e0:68

[okooo@192 data2]$ xtfs_stat Buy/NineToTo/200910/a
filename a
XtreemFS URI oncrpc://dir57:32638/data2_Buy/NineToTo/200910/a
XtreemFS fileID ecf8c488-3891-48b3-8c19-a1530dd8f7e0:68
object type regular file
owner root
group root
read-only true

XtreemFS replica list
list version 1
replica update policy ronly
-----------------------------
replica 1 SP STRIPING_POLICY_RAID0, 128kb, 1
replica 1 OSDs [{address=192.168.8.169:32640,
uuid=edf4d388-0a4f-4f4a-b8db-841bae903f62}]
replica 1 repl. flags 0x1
-----------------------------
replica 2 SP STRIPING_POLICY_RAID0, 128kb, 1
replica 2 OSDs [{address=dir57:32640,
uuid=de06eb14-21cd-450b-beb1-2672767233c3}]
replica 2 repl. flags 0xA
-----------------------------

[okooo@192 data2]$ more Buy/NineToTo/200910/a
130*1*0*31*1*0
13*310*33*01**
13*310*31*01**
**03*0*331*130
13*310****0130
130*1*0*31*1*0
13*310*33*01**
130*100****130
**03*0*331*130
13*310****0130
130*1*0*31*1*0
13*310*33*01**
13*3*0*33*0*30
**03*0*331*130
13*310****0130
130*1*0*31*1*0
13*310*33*01**
130*310*33**1*
**03*0*331*130
13*310****0130
130*1*0*31*1*0
13*310*33*01**
*30*10*33*01*0
**03*0*331*130
13*310****0130
130*1*0*31*1*0
13*310*33*01**
1303100*3***0*
-----------------------------
When I stopped either OSD, I could still read it correctly.

When I ran "more Buy/NineToTo/200910/a", the MRC log showed:
-------------------------
[ W | OSDStatusManager | ProcSt | 11 | 128:41:58] all
OSDs: de06eb14-21cd-450b-beb1-2672767233c3, {total=204178042880,
free=66033905664, status_page_url=http://127.0.0.1:30640,
vivaldi_coordinates=000000000000000000000000000000000000000000000000,
totalRAM=1034027008, usedRAM=3389232, load=1, geoCoordinates=,
seconds_since_last_update=19, proto_version=2009082918}
edf4d388-0a4f-4f4a-b8db-841bae903f62, {total=10158686208,
free=8292704256, status_page_url=http://127.0.0.1:30640,
vivaldi_coordinates=000000000000000000000000000000000000000000000000,
totalRAM=512950272, usedRAM=1669176, load=1, geoCoordinates=,
seconds_since_last_update=1, proto_version=2009082918}
[ W | - | ProcSt | 11 | 128:41:58] no suitable OSDs available for file ecf8c488-3891-48b3-8c19-a1530dd8f7e0:68
[ W | CloseOperation | ProcSt | 11 | 128:41:58] could not replicate file '68' on close
[ W | CloseOperation | ProcSt | 11 | 128:41:58] org.xtreemfs.mrc.UserException: could not assign OSDs to file ecf8c488-3891-48b3-8c19-a1530dd8f7e0:68: no feasible OSDs available (errno=5)
at org.xtreemfs.mrc.utils.MRCHelper.createReplica(MRCHelper.java:196)
at org.xtreemfs.mrc.operations.CloseOperation.startRequest(CloseOperation.java:126)
at org.xtreemfs.mrc.stages.ProcessingStage.parseAndExecute(ProcessingStage.java:276)
at org.xtreemfs.mrc.stages.ProcessingStage.processMethod(ProcessingStage.java:203)
at org.xtreemfs.mrc.stages.MRCStage.run(MRCStage.java:138)
--------------------------

Can I ignore this error? Will it cause other errors?


Thank you!

Jan Stender

Nov 3, 2009, 8:20:51 AM
to xtre...@googlegroups.com

It looks like the MRC is trying to trigger the on-close replication
again, even though the file has been replicated before and hasn't been
opened for writing. However, this attempt fails, because there are no
more OSDs available that haven't been already assigned to one of the
other replicas.
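
Why the attempt has no candidates can be illustrated with the data posted in this thread. This is a deliberately simplified sketch, not the MRC's actual selection code: it only models the rule that replicas of a file must not share OSDs, and ignores selection policies, free space and load. The function name feasible_osds is hypothetical.

```python
def feasible_osds(known_osds, existing_replicas):
    """Return the OSDs not yet used by any replica of the file, in input order."""
    used = {uuid for replica in existing_replicas for uuid in replica}
    return [uuid for uuid in known_osds if uuid not in used]


# The two OSDs listed in the "all OSDs:" line of the mrc.log excerpt.
KNOWN_OSDS = [
    "de06eb14-21cd-450b-beb1-2672767233c3",
    "edf4d388-0a4f-4f4a-b8db-841bae903f62",
]

# Replica list of file ...:68 as reported by xtfs_stat (one OSD per replica).
REPLICAS_FILE_68 = [
    ["edf4d388-0a4f-4f4a-b8db-841bae903f62"],
    ["de06eb14-21cd-450b-beb1-2672767233c3"],
]

print(feasible_osds(KNOWN_OSDS, REPLICAS_FILE_68))  # prints []
```

With both listed OSDs already holding a replica, the candidate list is empty, which matches the "no feasible OSDs available" message. An additional OSD that holds no replica of the file would appear in the result.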

Are you currently using XtreemFS 1.1? As far as I remember, there was a
similar bug in that release. It has already been fixed in the trunk,
though. As long as you only have two OSDs in the system, you can
probably ignore the problem; otherwise, I suggest building and using the
trunk version until the next release appears.

Best regards,
Jan

fanbreeze

Nov 4, 2009, 4:18:36 AM
to XtreemFS
I am using XtreemFS 1.1, and I have three OSDs.