amaury
Jul 30, 2009, 9:23:46 AM
to finemedia-oss
I am opening this thread to discuss your suggestions for new
features in FineFS.
Feel free to share your ideas.
I have three improvements in mind. What do you think of them?
1) Multi-cluster node
It could be useful to set up a node that belongs to two or more
clusters. It would act as a "bridge" between these clusters, forwarding
data across all of them.
Pros: If you have two groups of machines in two different locations
(e.g. two datacenters), it is not very efficient to set up one big
cluster. It would be better to create two clusters and join them
using "gateway nodes".
Cons: To avoid the risk of data loss, more than one node must be
connected to the different clusters. The clusters' node loops may then
be more difficult to manage.
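To make the idea concrete, here is a minimal sketch of what a gateway node could do. It is not FineFS code: the Cluster and GatewayNode classes and the on_write method are hypothetical names, and a cluster is reduced to a plain dictionary of files.

```python
class Cluster:
    """Hypothetical stand-in for a FineFS cluster: a name and a dict of files."""

    def __init__(self, name):
        self.name = name
        self.files = {}

    def store(self, path, data):
        # Storing the same content twice is harmless (idempotent).
        self.files[path] = data


class GatewayNode:
    """Hypothetical bridge node that belongs to several clusters at once."""

    def __init__(self, clusters):
        self.clusters = list(clusters)

    def on_write(self, origin, path, data):
        # Forward a write seen in `origin` to every other cluster and
        # return the names of the clusters that received it.
        forwarded = []
        for cluster in self.clusters:
            if cluster is origin:
                continue
            cluster.store(path, data)
            forwarded.append(cluster.name)
        return forwarded
```

With two gateways between the same clusters, the same write is forwarded twice. That is harmless here because store() simply overwrites, but a real implementation would need a way to detect writes it has already seen, which is the loop management problem mentioned in the cons.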
2) Client node with data cache
At the moment, there are two types of nodes in FineFS. Active nodes
store data locally and replicate it over the network; passive nodes are
just a client code library that connects to the active nodes.
It could be interesting to have some client nodes that store part of
the cluster data on their local disks.
Pros: A client node with local data will be more responsive than a
fully passive one, which must fetch data over the network every
time.
Cons: To manage a data cache, the client code will have to keep track
of file accesses, so that it can remove the least used file when a new
one is needed and the dedicated disk space is full. This process is
more complex than the current one.
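The bookkeeping described above could look roughly like the sketch below. It is only an illustration, not FineFS code: the FileCache class and the fetch callback are hypothetical, file contents are kept in memory instead of on disk, and it evicts the least recently used file, which is a common approximation of "least used".

```python
from collections import OrderedDict


class FileCache:
    """Hypothetical client-side cache with a fixed byte budget (LRU eviction)."""

    def __init__(self, max_bytes, fetch):
        self.max_bytes = max_bytes
        self.fetch = fetch            # callable(path) -> bytes; stands in for a network read
        self.used = 0
        self.entries = OrderedDict()  # path -> bytes, least recently used first

    def get(self, path):
        if path in self.entries:
            # Cache hit: mark the file as the most recently used one.
            self.entries.move_to_end(path)
            return self.entries[path]
        data = self.fetch(path)
        self._store(path, data)
        return data

    def _store(self, path, data):
        if len(data) > self.max_bytes:
            return  # larger than the whole cache: serve it without caching
        # Drop the least recently used files until the new one fits.
        while self.used + len(data) > self.max_bytes:
            _, old = self.entries.popitem(last=False)
            self.used -= len(old)
        self.entries[path] = data
        self.used += len(data)
```

The sketch ignores invalidation: when a file changes on the active nodes, the cached copy has to be refreshed or dropped, and that is probably the harder part.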
3) Active node with data cache
Like client nodes, active nodes could benefit from working with a
data cache. Currently, all active nodes of a cluster replicate their
data to every other node, so they potentially need some very big hard
disks. It could be better to configure the amount of usable disk
space, and let the node manage this space in collaboration with other
nodes.
Pros: This feature would make it possible to create clusters with
heterogeneous machines. Even a machine with small disks could then
become an active node (an active node should be more responsive
than a client node with cache). Ideally, most of the active nodes of a
cluster would be small servers (with cache), and some "full" active
nodes would be there to hold a complete snapshot of the cluster's data.
Cons: This will be even more complex than client cache management,
and the whole cluster's performance may decline.
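One possible rule for the "collaboration with other nodes" is that a small node only drops a file if at least one "full" node is known to hold it. The sketch below illustrates that rule under this assumption; the choose_evictions function and its parameters are hypothetical, not part of FineFS.

```python
def choose_evictions(local_sizes, quota, holders, full_nodes):
    """Pick files to drop until local usage fits within the quota.

    local_sizes: dict path -> size in bytes, ordered from least to most
                 recently used (dicts keep insertion order)
    holders:     dict path -> set of node names known to hold a copy
    full_nodes:  set of names of nodes that keep the complete data set
    Returns (paths to drop, resulting usage in bytes).
    """
    used = sum(local_sizes.values())
    to_drop = []
    for path, size in local_sizes.items():
        if used <= quota:
            break
        # Only drop a file that a full node is known to hold.
        if holders.get(path, set()) & full_nodes:
            to_drop.append(path)
            used -= size
    return to_drop, used
```

If the returned usage is still above the quota, no safe eviction was possible, and the node would have to wait for a full node to catch up or refuse new data.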