Understanding Chunk File Quota


Willy Markuske

Apr 6, 2022, 8:58:11 AM
to beegfs-user

Hello All,

I have users working with a mix of large and small files for bioinformatics work stored on a BeeGFS system. I've found that the sheer number of small files has a large influence on performance and data movement (especially backups).

Is there a correlation between the number of chunk files and the number of actual files a user has stored? I was looking to limit chunk files and to work with users who tend to save millions of small output files rather than combining results into larger files. Or, since chunk files are distributed across the cluster, is it possible for a user to have many chunk files because large files are split across targets? My system is set up with a chunk size of 512K and a RAID0 stripe pattern with 4 desired storage targets.

For example, I have a user with a storage size of only 2.1TiB but 616944811 chunk files.
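A rough sketch of the arithmetic behind this question, in Python. It assumes a simple model that is not confirmed anywhere in this thread: a file gets one chunk file on each storage target that actually holds part of its data, so a file smaller than the chunk size has a single chunk file and a large file has at most as many chunk files as stripe targets. The function names are made up for illustration.

```python
CHUNK_SIZE = 512 * 1024   # 512K chunk size, from the post
NUM_TARGETS = 4           # desired storage targets, from the post


def estimated_chunk_files(file_size, chunk_size=CHUNK_SIZE, num_targets=NUM_TARGETS):
    """Estimate chunk files for one file under the ASSUMED model:
    one chunk file per target that holds at least one chunk of data."""
    if file_size <= 0:
        return 0  # assumption: an empty file has no chunk file yet
    chunks_needed = -(-file_size // chunk_size)  # integer ceiling division
    return min(num_targets, chunks_needed)


def average_bytes_per_chunk_file(total_bytes, chunk_files):
    """Average amount of data stored per chunk file."""
    return total_bytes / chunk_files


# Figures from the example user: 2.1 TiB and 616944811 chunk files.
total_bytes = 2.1 * 2**40
chunk_files = 616_944_811
avg = average_bytes_per_chunk_file(total_bytes, chunk_files)
print(f"average bytes per chunk file: {avg:.0f}")

# Under the assumed model, striping alone caps the ratio at NUM_TARGETS
# chunk files per file, so a large file can account for at most 4.
print(estimated_chunk_files(4 * 1024))      # small file
print(estimated_chunk_files(10 * 2**30))    # 10 GiB file
```

If that model holds, the average for this user comes out to a few KB per chunk file, far below the 512K chunk size, which would point to a very large number of small files rather than large files split across targets.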

Regards,

--

Willy Markuske

HPC Systems Engineer

Research Data Services

P: (619) 519-4435

Robert Anderson

Apr 6, 2022, 9:46:57 AM
to Willy Markuske
When our bioinformatics users ran assembly jobs on BeeGFS, their performance was nearly an order of magnitude slower than on Lustre storage. After a lot of head scratching, we determined that many of those software packages assume files will be in the Linux file system cache, and they simply re-read the files many times. BeeGFS doesn't use the fast Linux file system cache, so it is VERY slow for this type of work.
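One way to check for this on a given mount is to time several consecutive full reads of the same file. This is only a minimal sketch: the function name and the example path are hypothetical, and timings will vary with the client configuration and with whatever else is using the node.

```python
import time


def time_repeated_reads(path, repeats=3, block_size=1024 * 1024):
    """Read the whole file `repeats` times and return the seconds each pass took."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        with open(path, "rb") as f:
            # Read in fixed-size blocks until end of file.
            while f.read(block_size):
                pass
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    # Hypothetical path; replace it with a real file on the mount under test.
    for i, seconds in enumerate(time_repeated_reads("/mnt/beegfs/some_input_file"), 1):
        print(f"pass {i}: {seconds:.3f} s")
```

Where the file system cache is serving the re-reads, later passes are typically much faster than the first; if every pass takes about the same time, the re-reads are probably going back to the storage servers.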

I did not notice quota issues, but we use ZFS backend storage.

I did find that metadata backups for storage holding this many files just took too long. I've been using borg-backup to archive the metadata nightly on non-bioinformatics systems.



Robert E. Anderson

University of NH / Research Computing Center / Data Center Operations


