Hi list:
I have a group of records I'm working with that I know contain some sensitive files (e.g., job candidate evaluations). I want to identify these and segregate them from the rest of the accession. I am using Bulk Extractor (within BitCurator), and tried to create an alert list text file to flag sensitive words and phrases (see pages 27-28 of the user manual: http://digitalcorpora.org/downloads/bulk_extractor/BEUsersManual.pdf).
However, running Bulk Extractor with this alert list against a directory of files does not seem to produce any matches, even though I know these words appear in at least some of the text documents. The built-in scanners are working fine and report things like phone numbers. Does anyone have advice? Suggestions for alternative tools or methods for identifying documents that contain sensitive keywords would also be welcome.
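To make the goal concrete, here is a minimal sketch of the kind of keyword check described above, written in plain Python rather than with Bulk Extractor. It rests on several assumptions: the alert list is a UTF-8 text file with one word or phrase per line, the documents are plain text, and matching is a simple case-insensitive substring test. The function names `load_terms` and `find_sensitive_files` are hypothetical, chosen only for illustration.

```python
import os
import sys
from pathlib import Path


def load_terms(alert_list_path):
    """Read one term per line, skipping blank lines and '#' comments."""
    terms = []
    with open(alert_list_path, encoding="utf-8") as handle:
        for line in handle:
            term = line.strip()
            if term and not term.startswith("#"):
                terms.append(term.lower())
    return terms


def find_sensitive_files(root, terms):
    """Map each file under root to the terms it contains (case-insensitive)."""
    hits = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                # Assumption: files are readable as text; undecodable bytes are dropped.
                text = path.read_text(encoding="utf-8", errors="ignore").lower()
            except OSError:
                continue  # unreadable file: skip it rather than abort the scan
            matched = [term for term in terms if term in text]
            if matched:
                hits[str(path)] = matched
    return hits


if __name__ == "__main__":
    # Usage: python scan.py <alert_list.txt> <directory>
    if len(sys.argv) == 3:
        results = find_sensitive_files(sys.argv[2], load_terms(sys.argv[1]))
        for file_path, matched_terms in sorted(results.items()):
            print(f"{file_path}\t{', '.join(matched_terms)}")
```

A check like this would only see text that is stored uncompressed in the file, so formats such as .docx or PDF would presumably need their text extracted first.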
Thanks,
Eira Tansey
Digital Archivist/Records Manager
Archives and Rare Books Library
University of Cincinnati Libraries
806 Blegen Library
2602 McMicken Circle
PO Box 210113
Cincinnati, OH 45221-0113
Direct Tel: 513-556-1958
Library Tel: 513-556-1959
Email: eira....@uc.edu
Web: www.libraries.uc.edu/libraries/arb/