Brett,
So let me speak on behalf of the NOvA DAQ group, which I think is fair since I was in on the design from day one and continue as the lead of the DAQ today (so I know both the requirements that we set forth and all the dirty details of the real DAQ and its operational performance).
We encountered a very similar problem when designing the DAQ and data handling system for the NOvA experiment. In particular we wanted a high-reliability system which could perform all of the standard functions that you are describing, as well as perform different workflows for different classes of data (i.e. raw data, different types of log files, calibration files, and survey data). We also wanted a system that could run in a "lights out" operation and would require minimal to no effort to maintain over the life of the experiment. Basically we were designing to a situation similar to MINOS/Soudan (which you are intimately acquainted with), but building on that experience to make it even more robust and tailored to the Ash River lab (which is more remote, computing-wise, than Soudan).
We drafted a detailed set of requirements and interface specifications documents for our system (which are public from the NOvA DocDB system — let me know if you want them and I'll fish up the actual document numbers and links) and then worked with Fermilab to develop our full data handling system.
Now in terms of actual technologies we settled on:
For the file transport out of the DAQ we used the Fermilab File Transfer System (FTS) with add-on modules we wrote specific to the NOvA DAQ setup and data format. The core FTS code is available from a number of official sources, and our modules are publicly available from the NOvA CVS repositories. MicroBooNE and 35 Ton use the same code base but with modules specific to their detectors.
For the data/replica catalog we use the modern SAM system (the HTTP-based one). It handles all the replica information and metadata that we need. It also makes for a very smooth transition to the offline/analysis world, since it keeps our entire file provenance for us and has the standard analysis project workflow code built in. The other advantage is that SAM understands the tape systems and other storage systems, so we didn't have to build specific support for any of this into our DAQ.
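To make the catalog idea concrete, here is a minimal sketch of the kind of per-file metadata a DAQ hands to a SAM-style catalog at declaration time, plus a sanity check before declaring. The field names and file name below are illustrative assumptions, not the exact NOvA schema.

```python
# Illustrative per-file metadata for a SAM-style catalog declaration.
# Field names and values are examples, not the exact NOvA schema.
raw_file_metadata = {
    "file_name": "detector_r00012345_s01.raw",   # hypothetical DAQ file name
    "file_size": 2_147_483_648,                  # bytes
    "data_tier": "raw",
    "runs": [[12345, "physics"]],
    "start_time": "2015-06-01T00:00:00",
    "checksum": "adler32:1a2b3c4d",
}

def validate_metadata(md):
    """Minimal sanity check before declaring a file to the catalog."""
    required = {"file_name", "file_size", "data_tier"}
    missing = required - md.keys()
    if missing:
        raise ValueError(f"missing required metadata fields: {sorted(missing)}")
    return True
```

The point of declaring this up front is that provenance and replica tracking come for free downstream; the offline world queries the catalog rather than the DAQ.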
For the actual data transport we have gone through a whole host of different protocols as we were spinning NOvA up. We've done everything from simple scp-based copies to full third-party GridFTP transfers. In the end we hid all of the protocol details behind a simple API (formally, "samcp") and have been able to configure or switch protocols as needs arose.
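The "hide the protocol behind one API" idea can be sketched in a few lines. This is a hypothetical illustration, not the actual samcp implementation; the command templates are placeholders for real transfer tools.

```python
import subprocess

# Hypothetical protocol-dispatch table: swap transfer tools via
# configuration, not code changes in the DAQ. Templates are illustrative.
PROTOCOLS = {
    "scp":     ["scp", "{src}", "{dst}"],
    "gridftp": ["globus-url-copy", "{src}", "{dst}"],
    "xrootd":  ["xrdcp", "{src}", "{dst}"],
}

def copy_file(src, dst, protocol="scp", dry_run=True):
    """Single copy entry point; the caller never sees the protocol details."""
    template = PROTOCOLS[protocol]
    cmd = [part.format(src=src, dst=dst) for part in template]
    if dry_run:
        return cmd          # let callers inspect what would be run
    subprocess.run(cmd, check=True)
    return cmd
```

With this shape, trying a new protocol is a one-line table entry plus a config change, which is what let us move from scp to GridFTP without touching the DAQ code paths.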
As for data rates, yeah, protoDUNE is going to really push the envelope. At the same time, the rates that you are talking about are not very far off of what NOvA and MicroBooNE have been doing. NOvA's raw rate peaks at about 4 GB/s off of the detector and is then heavily filtered down to 2-8% of that rate (depending on trigger mix). MicroBooNE is more aggressive, running closer to a zero-bias readout, with data rates about 10-20x ours depending on their readout mode. We're both using the same infrastructure; it isn't limiting us and still has significant headroom and paths for scaling up. I'm not sure of the exact numbers coming out of 35T, but you could ping Tom Junk; he would know.
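A quick back-of-the-envelope check of those numbers (the MicroBooNE multiplier is quoted relative to our rate, which I take here to mean the filtered rate):

```python
# Back-of-the-envelope check of the rates quoted above.
nova_raw_gbps = 4.0                  # NOvA raw rate off the detector, GB/s
trigger_fraction = (0.02, 0.08)      # 2-8% kept after trigger filtering

nova_filtered = tuple(nova_raw_gbps * f for f in trigger_fraction)

# MicroBooNE is quoted as roughly 10-20x NOvA's rate, readout-mode dependent.
uboone_range = (nova_filtered[0] * 10, nova_filtered[1] * 20)

print(f"NOvA filtered:     {nova_filtered[0]:.2f}-{nova_filtered[1]:.2f} GB/s")
print(f"MicroBooNE (est.): {uboone_range[0]:.1f}-{uboone_range[1]:.1f} GB/s")
```

So the sustained rates the existing infrastructure already handles are in the hundreds-of-MB/s to multi-GB/s range, which is the same ballpark you are quoting for protoDUNE.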
Now in terms of topologies and technologies to adopt, I saw Maxim's slides from the DAQ workshop, and if those are similar to what you are asking about here, then I think this is pretty straightforward. The actual layout is very close to the NOvA and MicroBooNE layouts. You really could just insert an FTS daemon on your DAQ RAIDs and then configure an EOS endpoint and an FNAL endpoint. From there you use the standard dataset replication tools to clone the data to BNL, NERSC, and any other locations.
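The topology boils down to one fan-out table. Here is a hypothetical sketch of it; the endpoint names and URLs are placeholders I made up for illustration, not real site configurations.

```python
# Hypothetical fan-out topology: an FTS-style daemon watches the DAQ RAID
# dropbox and ships each file to the configured primary endpoints; further
# replicas are cloned later with the dataset tools. URLs are placeholders.
TOPOLOGY = {
    "source": "/daq/raid/dropbox",
    "endpoints": {
        "EOS":  "root://eos.example.cern.ch//store/protodune/raw",
        "FNAL": "gsiftp://fndca.example.fnal.gov/pnfs/dune/raw",
    },
    "replicas": ["BNL", "NERSC"],   # cloned via dataset replication tools
}

def transfer_plan(filename, topology=TOPOLOGY):
    """List the (endpoint, destination URL) pairs one file would be sent to."""
    return [(name, f"{url}/{filename}")
            for name, url in topology["endpoints"].items()]
```

The nice property of this layout is that adding a site is a configuration change on the fan-out table; the DAQ itself never learns about new destinations.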
I think our (NOvA) system is pretty straightforward and looks to map well onto what you have shown. It has also been adopted at this point in other parts of the neutrino community, so you shouldn't have a hard time convincing those on DUNE of its value, since the collaboration has good overlap with NOvA, MicroBooNE, and the other LAr experiments.
Let me know if you have any questions. I actually went a bit over the top and initially wrote up much more detail on our systems, so if you want some lessons learned or best practices, I have that as well as all our technical documents.
Andrew
--