HTRC Software Release
The HathiTrust Research Center (HTRC) has reached a development milestone with the release of production infrastructure to support data mining and textual analysis of volumes in HathiTrust.
The infrastructure includes an entrance portal, search and collection-building tools (using Blacklight <http://projectblacklight.org/>), and access to SEASR analysis algorithms that can be run against the HathiTrust public domain corpus (more than 3 million volumes). In addition to the production services, the HTRC offers a development “sandbox.” The sandbox runs against non-Google-scanned content (about 260,000 volumes) and provides a test bed where interested researchers can experiment with writing their own algorithms for use in the HTRC infrastructure.
The production release concludes the first six-month period of Phase 2 of HTRC development (October 2012 to March 2014). Phase 2 will also include development of the HTRC-Sloan-Cloud, infrastructure that will add mechanisms for secure, non-consumptive access to the entire HathiTrust corpus, as well as systems to accommodate the full 10.6 million HathiTrust volumes in the HTRC. For more information on HTRC services and testing of the production infrastructure, please join our HTRC-usergroup-l listserv at
https://list.indiana.edu/sympa/subscribe/htrc-usergroup-l.