Hi All,
In order to size our ZFS-based BeeGFS systems going forward, I recorded some behaviors from our single server backup system. Hopefully this will inform the decisions of others and lead to informative discussion.
It's a dual-socket Intel machine with 2x 12 physical cores of Intel Xeon E5-2670 v3 @ 2.30GHz and 256GB RAM, connected to 2x 60-slot HGST JBODs (partially populated: 45 8TB disks in each) via 2x LSI SAS3008 HBAs.
There are currently 3 zpools, with a total usable capacity of about 543TB. The zpools use compression but not encryption, and the compression recovers more capacity than is lost to parity. For all details on the zpools, see: http://termbin.com/qmfh
The zpools were aggregated and presented to the sending service node as a BeeGFS mount.
This backup server is separated from our main cluster via a single 10Gb ethernet line.
We run a backup every night from midnight to 6am. The following data came from a Friday run, so it captures a fairly typical night under a normal user load. The backup traverses several large filesystems, and I think some of the peaks occur when it hits a new filesystem and picks up a lot of new data, although the timing is fairly non-deterministic.
The data was sampled every 15s over the course of the run for a total of 1600 points, starting about 10 minutes before the backups to establish a baseline. These backups were a series of multiple parallel rsyncs from our main cluster via a service node dedicated to the backup.
We recorded loadavg, network bandwidth, zpool IO, actual CPU usage, and RAM usage, after first clearing the pagecache, dentries, and inodes with:
echo 3 > /proc/sys/vm/drop_caches
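For the curious, here is a minimal sketch of what such a 15s sampler can look like. The interface name, field choices, and output format are my assumptions for illustration, not the exact script we ran:

```shell
#!/bin/sh
# Hypothetical sampler sketch. Each call prints one space-separated line:
# epoch seconds, 1-minute loadavg, rx/tx byte counters, MemAvailable (kB).
# Bandwidth is then the delta between successive samples divided by 15s.
sample() {
    iface=${1:-lo}    # assumption: pass your 10GbE interface name, e.g. eth0
    ts=$(date +%s)
    load=$(cut -d' ' -f1 /proc/loadavg)
    rx=$(cat /sys/class/net/"$iface"/statistics/rx_bytes)
    tx=$(cat /sys/class/net/"$iface"/statistics/tx_bytes)
    mem=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
    echo "$ts $load $rx $tx $mem"
}

# Main loop, one sample every 15s, ~1600 points over the run:
# while :; do sample eth0; sleep 15; done >> backup_run.dat
sample
```

zpool IO would come from a parallel 'zpool iostat' in interval mode; I've left that out to keep the sketch short.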
The 1st graph shows network IO. Since this is a series of rsyncs, we never see enormous bandwidth; a lot of the IO is rsync chatter. As the run progresses, you can see it segregate into 4 streams: one at fairly high bandwidth, about 100MB/s, when pushing large files; 2 streams at low bandwidth, about 5 and 15MB/s, when doing very small files; and the rest scattered in between.

The next (and related) graph is zpool IO. It essentially mirrors the network traffic and is well below the max we can push into the BeeGFS mount. However, as noted above, it's a lot of tiny rsyncs and chatter, so the overall IO is bound to be small. It does balance fairly well across the pools, though.

The next is the loadavg, which peaks early and then tends to level out as the run traverses the filesystems, spiking when new data dirs are hit. Note that most of the values are under 5, so the system is keeping up with the data flow fairly well.

Next is the actual CPU load, measured independently of loadavg by summing the per-process CPU values from 'top'. Note that these values are percentages, so '1000' indicates a CPU load of 10 cores. Since this machine has 24 cores, it is over-provisioned for the job it's doing, although when pushing new, very large files to the disks we can drive the CPU close to 20 cores' worth, but not for long. This is interesting since the CPUs are doing the XORs, compression, and checksumming on all the data coming in. We expected much higher usage - the ZFS coders did a good job.
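That per-process summation reduces to a one-liner. I'm using ps here as a scriptable stand-in for top's %CPU column (an assumption that the two agree closely enough for this purpose; 'top -bn1' piped into the same awk works too):

```shell
# Sum %CPU across all processes: 1000 ~= 10 cores fully busy.
# 'pcpu=' suppresses the header so awk sees only numbers.
ps -eo pcpu= | awk '{s += $1} END {printf "%.1f\n", s}'
```

Sampling this every 15s gives the series plotted above.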

Finally, the RAM usage profile is fairly unsurprising. Just prior to starting the log, I cleared the caches as described above so I'd be recording a fairly clean system. You can see the system aggressively caching the incoming data and keeping essentially all the RAM in use over the course of the run.
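If you want to watch that caching behavior directly, the page cache figure is available from /proc/meminfo (assuming the 'Cached' line is the component you're after; ARC for ZFS lives elsewhere, under /proc/spl/kstat/zfs/arcstats on Linux):

```shell
# Current page cache size in kB; sample it alongside the other metrics
# to see the cache grow until RAM is effectively full.
awk '/^Cached:/ {print $2}' /proc/meminfo
```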

If there's interest, I'll package the (pretty trivial) collection script and plotting routines for others to use.
Hope this is of some help.
hjm