I've been analyzing the available code and I think there are a couple of
points that must be discussed before moving on ;). I'll try to explain what
I've been able to understand from the current code before adding my
questions... please correct me where I'm wrong.
At this point we have in osd_proxies a list of all the OSDs the client has
ever contacted. Each struct in this list keeps, among other things, the
coordinates of the OSD and the last measured RTT. This data is requested
and updated periodically for all the OSDs in our "cache", at the interval
returned by get_osd_ping_interval_s, right?
When the select_file_replica function is called, our "vivaldi module" (i.e.
vivaldi_flog) uses the coordinates stored in that list and the client
coordinates (which it manages itself) to select the most appropriate replica,
and takes the opportunity to recalculate the client's own coordinates (?)
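
To check that I've understood the selection step, here is a rough sketch of
how I picture it, assuming 2D coordinates and plain Euclidean distance as the
latency estimate. The names vivaldi_coord, osd_entry and closest_replica are
hypothetical and only for illustration; they are not the real contents of
osd_proxies nor the real select_file_replica signature.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 2D Vivaldi coordinate (not the project's real struct). */
typedef struct {
    double x;
    double y;
    double local_error;
} vivaldi_coord;

/* Hypothetical cache entry: what each element of osd_proxies is assumed
 * to hold for the purposes of this sketch. */
typedef struct {
    const char   *osd_uuid;
    vivaldi_coord coord;
    double        last_rtt_s;
} osd_entry;

/* Squared Euclidean distance. Comparing squared distances gives the same
 * ordering as comparing distances, so no square root is needed here. */
static double vivaldi_distance_sq(const vivaldi_coord *a,
                                  const vivaldi_coord *b)
{
    double dx = a->x - b->x;
    double dy = a->y - b->y;
    return dx * dx + dy * dy;
}

/* Returns the index of the entry whose coordinates are closest to the
 * client's, or -1 if the list is empty. */
static int closest_replica(const vivaldi_coord *client,
                           const osd_entry *osds, size_t n)
{
    assert(client != NULL);
    int best = -1;
    double best_d = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = vivaldi_distance_sq(client, &osds[i].coord);
        if (best < 0 || d < best_d) {
            best_d = d;
            best = (int)i;
        }
    }
    return best;
}
```

With the client at the origin and OSDs at (3, 4), (1, 1) and (-6, 8), this
picks the second entry, since (1, 1) is the nearest point.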
And these are my doubts/questions:
1. Would it be possible to add a new function that forces the recalculation of
the client's coordinates every time a pingResponse is received? Something
like:
    typedef int (*recalculate_vivaldi)(const char *osd_uuid, double x, double y,
                                       double local_error, double rtt_s);
This way, the client's coordinates will stay up to date even if
select_file_replica is never called again.
2. According to the Vivaldi algorithm, it's not necessary to ping all the
known OSDs periodically, but only one of them each time. I guess this would
be the most difficult part to change given the current model, but it would
drastically reduce the traffic produced. Otherwise we could also use longer
periods, but with large lists of OSDs that solution won't scale.
3. Is it possible to detect when a request times out? In our last tests,
unstable servers showed inconsistent behaviour (mainly RTT peaks and packet
losses) that Vivaldi must take into account. At this point we could use the
recalculate_vivaldi function to tell our module when a response is never
received.
4. The way we fill our osd_proxies list could be a problem for the algorithm.
According to the documentation, always using the same few OSDs to recalculate
our position could result in coordinate spaces that don't represent the real
distances among nodes. Instead, it would be advisable to get a certain number
of randomly chosen UUIDs directly from the DS and use them to recalculate the
client's coordinates.
I think Vivaldi doesn't need much more to work on the client side :) Let me
know if there's something that must be clarified or discussed.
Kind regards,
Juan.
--
-------------------------------------
Juan González de Benito
Storage Systems - Computer Sciences
Barcelona Supercomputing Center (BSC)
juan.gdb AT bsc DOT es
*************************************
We plan to have a new release with experimental DIR replication soon. Is
vivaldi ready to be included in the release or can you estimate when it
will be ready?
Best Regards
Björn
Juan González de Benito wrote:
Over the last few days we've been testing Vivaldi on PlanetLab, trying to
tune the algorithm parameters. We will probably still change some small
details before the final version, but the current one is ready to be
included :)
The server side is done. On the client side, we would only need to remove
the code we're using to evaluate our results. At this point, nodes are able
to determine their own positions, and together they create a consistent
coordinate space.
The policy to choose the best replica from the available ones is almost
finished and only depends on issue#77. As for replica creation, I haven't
started on that yet, but with the new policies architecture, I guess it
shouldn't take more than a couple of days.
After all this, we still plan to work on automatic duplication (which is
something we already developed for the old version of the servers), but I
think that would fit better in a later release.
By the way, when do we plan to have this new release?
Kind regards,
Juan.
Juan Gonzalez wrote:
>
> Hi Björn,
>
> Over the last few days we've been testing Vivaldi on PlanetLab, trying to
> tune the algorithm parameters. We will probably still change some small
> details before the final version, but the current one is ready to be
> included :)
sounds good :)
>
> The server side is done. On the client side, we would only need to remove
> the code we're using to evaluate our results. At this point, nodes are able
> to determine their own positions, and together they create a consistent
> coordinate space.
>
> The policy to choose the best replica from the available ones is almost
> finished and only depends on issue#77. As for replica creation, I haven't
> started on that yet, but with the new policies architecture, I guess it
> shouldn't take more than a couple of days.
>
> After all this, we still plan to work on automatic duplication (which is
> something we already developed for the old version of the servers), but I
> think that would fit better in a later release.
>
> By the way, when do we plan to have this new release?
I think within the next two or three weeks.