We had one storage server that has suffered a hardware failure and is
no longer available to the system. We need to replace it. What happens
to files that were on that box, and when we put in a new box does it
automatically honor the minimum replication count and start
re-replicating those files from the box that died?
--
==============================
Dennis McEntire - gmail account
dmce...@gmail.com
==============================
Currently, we don't have a tool that moves files away from an OSD if you
want to remove one. We plan to include such a tool in the next release.
We'll also enhance the scrubber so that it can replace lost replicas.
>
> We had one storage server that has suffered a hardware failure and is
> no longer available to the system. We need to replace it. What happens
> to files that were on that box, and when we put in a new box does it
> automatically honor the minimum replication count and start
> re-replicating those files from the box that died?
>
As long as you have replicas on another OSD the client will
automatically use the remaining OSDs when reading the file.
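A minimal sketch of that read-side behaviour, in Python for illustration only. It is not the XtreemFS client code; `read_with_failover`, `OSDUnavailable` and `fake_fetch` are hypothetical names, and the replica list is modelled as a plain list of OSD UUIDs that may still contain the dead OSD.

```python
class OSDUnavailable(Exception):
    """Raised by a (hypothetical) transport when an OSD cannot be reached."""


def read_with_failover(replica_uuids, fetch):
    """Try each replica in order and return (data, skipped_uuids).

    replica_uuids: OSD UUIDs from the file's replica list (may include dead OSDs).
    fetch: callable taking a UUID and returning bytes, or raising OSDUnavailable.
    """
    skipped = []
    for uuid in replica_uuids:
        try:
            return fetch(uuid), skipped
        except OSDUnavailable:
            # The dead replica stays in the replica list; the reader just skips it.
            skipped.append(uuid)
    raise OSDUnavailable(f"no reachable replica among {replica_uuids}")


# Usage: osd-1 is the failed box, osd-2 still holds a replica.
def fake_fetch(uuid):
    if uuid == "osd-1":
        raise OSDUnavailable(uuid)
    return b"file contents from " + uuid.encode()


data, skipped = read_with_failover(["osd-1", "osd-2"], fake_fetch)
```

Note that nothing here repairs the replica list: reads keep working only while at least one listed OSD is still reachable.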
Björn
So at this point we are completely dependent on the storage hardware
NOT failing?
We were hoping that there would be some recovery procedure when a
storage node fails and must be replaced. Also, what would happen if
the replacement storage node has the same name (uuid) and IP address?
Is that not a good idea?
The scrubber tool you are talking about to replace lost replicas, is
that something that is in the works or just being talked about?
Thank you for the response,
Dennis
2009/12/21 Björn Kolbeck <kol...@zib.de>:
> Björn
Yes, in the sense that you'll have dead replicas in the replica list for
the files until the scrubber is ready.
>
> We were hoping that there would be some recovery procedure when a
> storage node fails and must be replaced. Also, what would happen if
> the replacement storage node has the same name (uuid) and IP address?
> Is that not a good idea?
The uuid must be unique for each OSD. When you replace an OSD you
basically remove the old one and add a new one. Both must have different
uuids but you can safely re-use the IP-address for the new OSD.
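The rule can be pictured with a toy registry keyed by UUID. This is only a sketch under assumptions: `OSDRegistry` and its methods are hypothetical and are not the real directory service API. The one point it shows is that the UUID is the identity, while the IP address is just an attribute that a replacement OSD may take over.

```python
class OSDRegistry:
    """Toy stand-in for a directory of OSDs, keyed by UUID (hypothetical API)."""

    def __init__(self):
        self._address_by_uuid = {}
        self._retired = set()

    def register(self, uuid, address):
        # A UUID identifies exactly one OSD, ever: refuse active or retired UUIDs.
        if uuid in self._address_by_uuid or uuid in self._retired:
            raise ValueError(f"UUID {uuid!r} was already used by another OSD")
        self._address_by_uuid[uuid] = address

    def retire(self, uuid):
        # Remove the failed OSD but remember its UUID so it is never reused.
        del self._address_by_uuid[uuid]
        self._retired.add(uuid)

    def address_of(self, uuid):
        return self._address_by_uuid[uuid]


# Usage: replace a failed OSD, reusing its IP address but not its UUID.
registry = OSDRegistry()
registry.register("uuid-old", "10.0.0.5")
registry.retire("uuid-old")
registry.register("uuid-new", "10.0.0.5")
```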
>
> The scrubber tool you are talking about to replace lost replicas, is
> that something that is in the works or just being talked about?
The scrubber needs to be made replica-aware. I am working on that at the
moment. I hope to have that finished after the Christmas break in early
January.
> 2009/12/21 Björn Kolbeck <kol...@zib.de>:
>> On 18.12.2009 02:36, Dennis McEntire wrote:
>>>
>>> What is the procedure for removing an OSD from the system?
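We fixed the scrubber to check and replace replicas. There is a new tool
called xtfs_chstatus which can be used to mark an OSD as dead. The scrubber
will then remove replicas on dead OSDs and create new replicas.
In the next couple of days we will package release 1.2.1, which includes
the fixes. There is no change in the protocol, so 1.2.0 clients should work
with the new 1.2.1 servers.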
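For readers who want to picture the repair step announced for 1.2.1 (an OSD is marked dead with xtfs_chstatus, then the scrubber removes its replicas and creates new ones), here is a rough Python sketch. It is not the scrubber's actual code: `scrub_file`, the `osd_status` dict and the `min_replicas` parameter are assumptions made for illustration, and the real tool may choose target OSDs differently.

```python
def scrub_file(replicas, osd_status, min_replicas):
    """Return a repaired replica list for one file.

    replicas: OSD UUIDs currently in the file's replica list.
    osd_status: dict mapping OSD UUID -> "alive" or "dead" (as set by an admin).
    min_replicas: desired number of replicas (hypothetical policy parameter).
    """
    # Drop replicas on OSDs that are marked dead or are unknown to osd_status.
    live = [u for u in replicas if osd_status.get(u) == "alive"]
    if not live:
        raise RuntimeError("no live replica left to copy from")

    # Pick live OSDs that do not yet hold a replica, in a deterministic order.
    candidates = sorted(
        u for u, status in osd_status.items()
        if status == "alive" and u not in live
    )
    needed = max(0, min_replicas - len(live))
    return live + candidates[:needed]


# Usage: osd-1 has been marked dead; osd-3 is the replacement box.
status = {"osd-1": "dead", "osd-2": "alive", "osd-3": "alive"}
repaired = scrub_file(["osd-1", "osd-2"], status, min_replicas=2)
```

The sketch only computes the new replica list; copying the data to the newly chosen OSD is left out.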
Björn
On 13.01.2010 19:33, Dennis McEntire wrote:
> I am also wondering about the database replication, I know you added
> the database dumps in the background option in the current release,
> but there was some communication about having that as a live service
> in a future release.
>
> Is that still in the works, or do you anticipate that will be live
> sometime soon?
Felix has already implemented replication for BabuDB, which is the
storage backend for the DIR and MRC. This is included in the 1.2 release
for those who want to experiment. Our plan is to thoroughly test the DIR
replication and to add DIR failover to the client (and other components)
for the next release (1.3). The MRC would be the next service to be
replicated. In parallel, we are working on full read/write replication
for files.
We will discuss our roadmap in the next couple of days and then update
our website.
>
> Thanks,
>
> Dennis
>
>
> 2010/1/13 Björn Kolbeck <kol...@zib.de>:
>> We fixed the scrubber to check and replace replicas. There is a new tool
>> called xtfs_chstatus which can be used to mark an OSD as dead. The scrubber
>> will then remove replicas on dead OSDs and create new replicas.
>> On 22.12.2009 16:57, Björn Kolbeck wrote:
>>>
>>> On 22.12.2009 02:48, Dennis McEntire wrote:
>>>>
>>>> Thanks for writing back.