Improvement ideas


amaury

Jul 30, 2009, 9:23:46 AM
to finemedia-oss
I am opening this thread to discuss your suggestions for new features
in FineFS.
Feel free to share your ideas.

I have three improvements in mind. What do you think of them?

1) Multi-cluster node
It could be useful to set up a node that is part of two or more
clusters. It would act as a "bridge" between these clusters, forwarding
data across all of them.

Pros: If you have two groups of machines in two different locations
(e.g. two datacenters), it is not very efficient to set up one big
cluster. It would be better to create two clusters and join them
using "gateway nodes".

Cons: To avoid the risk of data loss, more than one node must be
connected to the different clusters. This may make the clusters' node
loops more difficult to manage.


2) Client node with data cache
At this time, there are two types of nodes in FineFS. Active nodes
store data locally and replicate it over the network; passive nodes are
just a client code library which connects to the active nodes.
It could be interesting to have some client nodes that store a part of
the cluster data on their local disks.

Pros: A client node with local data would be more responsive than a
totally passive one, which must fetch data over the network every
time.

Cons: To manage a data cache, the client code would have to maintain a
record of file accesses, so that it can remove the least-used file when
a new one is needed and the dedicated disk space is full. This process
is more complex than the current one.
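
As a rough illustration of the bookkeeping this would require, here is
a minimal sketch in Python. It is not FineFS code: the FileCache class
and its methods are hypothetical, file contents are kept in memory
instead of on disk, and "least used" is interpreted as "least recently
used", which is only one possible eviction policy.

```python
from collections import OrderedDict


class FileCache:
    """Hypothetical size-bounded cache (not part of FineFS).

    Evicts the least recently used file when space runs out.
    """

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes    # dedicated cache space
        self.used_bytes = 0
        self._files = OrderedDict()   # name -> content, oldest access first

    def get(self, name):
        """Return cached content, or None on a miss.

        On a miss the caller would fetch the file from an active node.
        """
        if name not in self._files:
            return None
        self._files.move_to_end(name)  # record the access
        return self._files[name]

    def put(self, name, content):
        """Store a file, evicting least recently used files until it fits."""
        if len(content) > self.max_bytes:
            return  # larger than the whole cache: do not cache it
        if name in self._files:
            # Replacing an existing entry: release its space first.
            self.used_bytes -= len(self._files.pop(name))
        while self.used_bytes + len(content) > self.max_bytes:
            _, evicted = self._files.popitem(last=False)
            self.used_bytes -= len(evicted)
        self._files[name] = content
        self.used_bytes += len(content)
```

Every read has to update the access order and every write may trigger
one or more evictions, which is the extra complexity mentioned above.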


3) Active node with data cache
Like client nodes, active nodes could benefit from working with a data
cache. Currently, all active nodes of a cluster replicate their data
to every other node, so they potentially need very large hard disks.
It could be better to configure the amount of usable disk space, and
let the node manage this space in collaboration with the other nodes.

Pros: This feature would make it possible to create clusters with
heterogeneous machines. Even a machine with small disks could then
become an active node (an active node should be more responsive than
a client node with cache). Ideally, most of the active nodes of a
cluster would be small servers (with cache), and some "full" active
nodes would be there to hold a complete snapshot of the cluster's data.

Cons: This would be even more complex than managing a client cache,
and the performance of the whole cluster may decline.

amaury

Aug 5, 2009, 4:05:11 AM
to finemedia-oss
Another idea: At this time, a FineFS node always tries to send data to
its next node. If there is no answer, it tries the node after that, and
so on. But sometimes a node may be unavailable for a "long" time (say,
from a few minutes to a few days). During that time, every piece of
data is still sent to this node, and every connection attempt fails.
It's a big waste of time.

We don't want to remove this node from the cluster, because it may
come back online soon. But it would be great to be able to mark this
node as temporarily offline. FineFS would then add the "fail" messages
directly to the error log, without trying to connect to this node (but
would still connect to the next one).
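
To make the intended behaviour concrete, here is a minimal sketch in
Python. It is not FineFS code: the forward function, the offline set,
the send callback and the list used as an error log are all
hypothetical names chosen for the illustration.

```python
def forward(nodes, start, offline, send, error_log):
    """Send data to the first reachable node after nodes[start].

    nodes     -- node names in ring order (hypothetical representation)
    offline   -- set of nodes marked as temporarily offline
    send      -- callback returning True if the node accepted the data
    error_log -- list that receives the "fail" messages
    Returns the node that accepted the data, or None if none did.
    """
    for step in range(1, len(nodes)):
        node = nodes[(start + step) % len(nodes)]
        if node in offline:
            # Log the failure directly, without a connection attempt.
            error_log.append(f"fail: {node} (marked offline, not contacted)")
            continue
        if send(node):
            return node
        # No answer from a node that was expected to be up.
        error_log.append(f"fail: {node} (no answer)")
    return None
```

A node marked offline costs only a log entry, while an unmarked node
that does not answer still costs a full connection attempt before the
next node is tried.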

amaury

Aug 18, 2009, 5:12:51 AM
to finemedia-oss
This one is implemented.