Hi Bit Curators,
I was wondering if anyone has experience with, or has seen the use of, tools such as (but not limited to) bulk_extractor to identify personally identifiable information (PII) in web archive (WARC) data?
We're checking over in the IIPC Slack from the web archives angle, but thought it might make sense to check here as well. Any pointers to anything remotely related to this topic would be greatly appreciated!
Ed Summers