This past week Stefano Iacus gave a recorded talk entitled "Bringing Data Close to Compute at Harvard Dataverse" which I just put on DataverseTV:
https://dataverse.org/dataversetv
Below are the abstract and direct links:
"With data becoming increasingly large and machine learning and AI
equally popular also outside the hard sciences, we decided to exploit
the versatility and cost effectiveness of the MOC infrastructure in
terms of high performance computing (NERC) and storage (NESE). Moreover,
being MOC an integrated infrastructure, it is possible to run computing
on large data without moving data over the Internet. In this talk we
present a proof of concept of this approach where the Dataverse software
is integrated with NESE and NERC realizing the goal of computing
directly on large data."
Thanks,
Phil