I was planning on having everything under FhGFS: users sometimes launch
MPI and other parallel jobs that do I/O with files under their homes.
> >
> >
> > 2. Each node will need to be a storage server, metadata server, and
> > client at the same time. But I am not sure how best to use the disks.
> >
> > From the documentation, it seems the best would be to use an xfs
> > partition for data storage. However, for metadata ...
> >
> >
> > 2.1 Can I place the directory for the metadata in the same ext4
> > partition where the rest of the operating system will be installed?
> > (I would format this ext4 partition with
> > "mkfs.ext4 -i 4096 -I 512 -J size=400 -Odir_index,filetype"
> > and use extended attributes, as explained in the Server Tuning docs).
> Yes, it is possible. Please remember that the metadata needs 1% of
> the whole disk space that is planned for FhGFS.
Yes, thanks.
> If you don't need high metadata performance, it is also possible to
> store the metadata on the XFS partition. Does your software do a lot of
> file stats, file creates, ...?
Generally not. However, reserving about 2% for metadata means reserving
about 40 GB on the ext4 partition, which is not too much, so it seems I
might be better off placing the metadata in the ext4 / partition.
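
The sizing above is simple arithmetic. A minimal Python sketch, assuming roughly 2000 GB of FhGFS space per node (a figure inferred from the 2% and 40 GB numbers, not stated explicitly in this thread) and using a hypothetical helper name:

```python
def metadata_reserve_gb(total_gb, percent):
    """Return the space in GB to set aside for metadata."""
    return total_gb * percent / 100


# Assumed per-node capacity of about 2000 GB (inferred, not stated in the thread).
TOTAL_GB = 2000

# Compare the 1% rule of thumb with the more generous 2% reservation.
for percent in (1, 2):
    reserve = metadata_reserve_gb(TOTAL_GB, percent)
    print(f"{percent}% of {TOTAL_GB} GB = {reserve:.0f} GB")
```

Either figure is small next to the total, which is why keeping the metadata on the ext4 / partition looks affordable.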
> >
> >
> > 2.2 I can configure the four disks per node (and per RAID controller)
> > as a single virtual disk (e.g., with RAID 0 or RAID 10), or I can
> > configure to have two virtual disks, each as RAID 0, the first with 1
> > physical HD (OS and metadata server), the second with the remaining 3
> > HDs (storage server). In either case, it is the same card which is
> > controlling the four disks for a node. What would be better? I assume
> > that using a single virtual disk with RAID 0 is probably best (the RAID
> > card will do its job better?).
> This depends on the RAID controller. Some RAID controllers will not
> deliver optimal performance if several virtual disks with different
> RAID levels are configured.
> RAID 0 with 4 disks is the best choice in your case. In a configuration
> with two virtual disks you will waste disk space.
OK, excellent. That seems like the simpler and more flexible
configuration. I assume I could just as well use RAID 10 with the 4 disks
if space is not limited?
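
To make the space trade-off concrete, here is a rough Python sketch of usable capacity for the layouts discussed. The 500 GB disk size is a hypothetical value chosen for illustration, and the function names are made up for this example:

```python
def raid0_capacity(n_disks, disk_gb):
    """RAID 0 stripes across all disks: full raw capacity, no redundancy."""
    return n_disks * disk_gb


def raid10_capacity(n_disks, disk_gb):
    """RAID 10 stripes over mirrored pairs: half the raw capacity is usable."""
    if n_disks < 4 or n_disks % 2:
        raise ValueError("RAID 10 needs an even number of disks, at least 4")
    return n_disks * disk_gb // 2


DISK_GB = 500  # hypothetical disk size, not taken from the thread

layouts = {
    "single RAID 0, 4 disks": raid0_capacity(4, DISK_GB),
    "single RAID 10, 4 disks": raid10_capacity(4, DISK_GB),
    "split: storage on RAID 0, 3 disks": raid0_capacity(3, DISK_GB),
}

for name, usable_gb in layouts.items():
    print(f"{name}: {usable_gb} GB usable")
```

RAID 10 halves the usable space in exchange for surviving a single disk failure, while RAID 0 loses the whole virtual disk if any one disk fails.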
> >
> >
> > 3. If one of the nodes fails, and if I keep a backup of the data, can I
> > recover by just copying the shared disk backup, and mounting as a
> > regular, local, file system? (To allow this, I think I need to create
> > the backup from one of the clients).
> Yes, this will work. You need to start the fhgfs-client and make a backup
> from the mounted FhGFS.
OK, understood.
> >
> >
> > 4. Network configuration. Would it make sense to try to use at the same
> > time the ethernet and infiniband connections, one for metadata and the
> > other for storage transfers? I do not see how to do this.
> I recommend using the InfiniBand connection for both. But you could
> configure it in the configuration files. Use the configuration option
> "connInterfacesFile". More details are available in the wiki.
>
> http://www.fhgfs.com/wiki/wikka.php?wakka=FAQ#multiple_nics
OK, thanks. Infiniband for both seems the simplest thing to do.
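
For reference, a small Python sketch of what the file named by "connInterfacesFile" might contain. It assumes the file lists one interface name per line with earlier lines preferred, which should be checked against the wiki page above. The interface names ib0 and eth0 are hypothetical:

```python
def interfaces_file_text(interfaces):
    """Build the file text: one interface name per line, preferred first."""
    return "\n".join(interfaces) + "\n"


# Hypothetical interface names; check the real ones on each node.
# InfiniBand is listed first so it is preferred, with ethernet as fallback.
preferred = ["ib0", "eth0"]

print(interfaces_file_text(preferred), end="")
```

The printed text would then be saved to whatever path "connInterfacesFile" points at in the service configuration.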
Thanks for your detailed advice.
Best,
R.
> kind regards,
> Frank
> >
> >
> > 5. What other options might I consider? I am also thinking about
> > GlusterFS and PVFS2 (and asking similar questions on their list).
> > (Lustre definitely seems to discourage client and OSS in same node).
> >
> >
> >
> > Any other comments or suggestions for this setup are welcome.
> >
> >
> > Best,
> >
> > Ramon
--
Ramon Diaz-Uriarte
Department of Biochemistry, Lab B-25
Facultad de Medicina
Universidad Autónoma de Madrid
Arzobispo Morcillo, 4
28029 Madrid
Spain
Phone: +34-91-497-2412
Email: rdi...@gmail.com
       ramon...@iib.uam.es
http://ligarto.org/rdiaz