Can bulk extractor be set to ignore specific files/types?

Brian Dietz

May 9, 2016, 3:23:52 PM
to BitCurator Users

When running bulk extractor on a directory of files, can it be set to ignore specific files or file types? With my immediate use case (video oral histories produced by my institution) I could probably just not run BE at all. But the difficulty of running the process when certain file types are involved makes me wonder whether it can be set to skip individual files (something like rsync's --exclude option) or various file types via a referenced list (the way you can provide a Stop List or Find Regex file).

Brian

Porter Olsen

May 10, 2016, 1:08:20 PM
to bitcurat...@googlegroups.com
Hi Brian,
I think that's counter to the way bulk extractor works. It is primarily a tool for peering past file formats and file system data and into the raw bits, so, to my knowledge, there isn't a setting for exempting particular file types. You could write a regular expression to find particular file types, but it sounds like that's the opposite of what you want to accomplish.

Porter

Matthew Disregardmatthew Farrell

May 10, 2016, 4:03:05 PM
to bitcurat...@googlegroups.com
When running bulk_extractor from the command line, the switch for a directory of files is -R. It takes the directory as a whole, so it doesn't really let you get down to the level of running it on (or skipping) any single file unless the file is a disk image.

You could possibly write something to process an annotated-feature text file to remove features found in files with particular extensions. I haven't tried this, but running sed on an annotated feature file theoretically would remove matching lines.

sed -i '/[your regular expression here]/d' [annotated feature file].txt

There are likely more elegant ways to process those files.
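
As one possible sketch of that idea (untested against real bulk_extractor output): the function below drops lines from an annotated feature file when the source file's extension is on an exclude list. The function name filter_features is made up for illustration, and it assumes the annotated file is tab-separated with the source file path in the fourth field. Check a few lines of your own file first and pass a different field number if the layout differs.

```shell
# Hypothetical helper, not part of bulk_extractor or BitCurator.
# Assumption: annotated feature lines are tab-separated and the path of the
# source file sits in field 4 (override with a second argument).
#   $1 = extensions to drop, separated by "|", e.g. 'mp4|mov|avi'
#   $2 = field number holding the file path (default: 4)
filter_features() {
    awk -F '\t' -v exts="$1" -v col="${2:-4}" '
        # Keep comment/header lines unchanged.
        /^#/ { print; next }
        # Print the line unless the path ends in .<ext> (case-insensitive).
        tolower($col) !~ ("\\.(" exts ")$") { print }
    '
}
```

Usage would look something like this, where both file names are placeholders. Unlike sed -i, it writes a new file and leaves the original untouched:

filter_features 'mp4|mov|avi' < annotated_features.txt > annotated_features_filtered.txt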

-f

