ClamAV - virus scan software


Eirva Ch. Diamessis

Jun 14, 2024, 3:25:05 PM
to BitCurator Users
Hello,
We are reviewing an existing process for stabilizing electronic media, and we would like to scan these media for viruses before capturing them (on Linux, we use Guymager for capture). We are thinking of using ClamAV. ClamTk is no longer maintained, so we need to come up with a command-line script. I have briefly been through the ClamAV documentation (Introduction - ClamAV Documentation), but I'd like to know whether people here use ClamAV and, if so, what they recommend including or what script they use. Or do you use different virus scanning software?
Thank you for any suggestions and for considering this question.
Eirva

Donald Mennerich

Jun 18, 2024, 7:19:24 PM
to bitcurat...@googlegroups.com
Hello

Yes, at NYU we use ClamAV on all our electronic records packages going into the repository. In short, when a collection is being transferred I run something like `clamscan -r /path/to/staging > collection.log`, and if anything looks off in the output report I remediate from there. We use Archivematica, but its virus scan is disabled, so that there aren't several network hops before ClamAV runs and we don't waste processing CPU in Archivematica before a job fails. I've been meaning to make this part of our automated system, but for now it's run manually.
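
For anyone who wants to script this step, here is a minimal Python sketch of the same recursive scan with the report written to a log file. The function names are made up for illustration, and the exit-code meanings (0, 1, 2) are taken from the clamscan man page; it assumes `clamscan` is on the PATH and is not a description of NYU's actual tooling.

```python
import subprocess
from pathlib import Path

# Exit codes as documented in the clamscan man page.
EXIT_MEANINGS = {
    0: "no virus found",
    1: "virus(es) found",
    2: "an error occurred",
}


def build_scan_command(target):
    """Return the argument list for a recursive clamscan of `target`."""
    return ["clamscan", "-r", str(target)]


def run_scan(target, log_path):
    """Run clamscan on `target`, write its report to `log_path`,
    and return a short description of the exit code."""
    with Path(log_path).open("w") as log:
        # check=False: exit code 1 means "virus found", not a crash.
        result = subprocess.run(build_scan_command(target), stdout=log, check=False)
    return EXIT_MEANINGS.get(
        result.returncode, f"unexpected exit code {result.returncode}"
    )
```

Calling `run_scan("/path/to/staging", "collection.log")` would mirror the command above; the returned string can be used to decide whether the log needs a closer look.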

Don 

Donald R. Mennerich, Senior Digital Archivist
New York University Libraries


--
You received this message because you are subscribed to the Google Groups "BitCurator Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bitcurator-use...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/bitcurator-users/8fa9d297-c9b7-4a97-a2c5-e2c378f8cb9en%40googlegroups.com.

Eirva Ch. D.

Jun 25, 2024, 10:23:34 AM
to bitcurat...@googlegroups.com
Hello Don,
Thank you very much for your helpful reply!
Best,
Eirva





Donald Mennerich

Jun 25, 2024, 10:39:09 AM
to bitcurat...@googlegroups.com
I should have added that I also use this pre-transfer ClamAV step to find and document any zero-length files and symlinks in a transfer. I use grep to pull these out of the log, as well as to report on virus hits.
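
As a rough Python equivalent of that grep step, the sketch below groups lines from a clamscan report into virus hits, empty files, and symlinks. It assumes per-file lines of the form `path: STATUS`, with statuses ending in `FOUND`, or reading `Empty file` or `Symbolic link`; those strings are an assumption and should be checked against your own log, since the exact wording can differ between ClamAV versions. The function name is hypothetical.

```python
def classify_clamscan_log(lines):
    """Group paths from clamscan report lines into hits, empty files, and symlinks."""
    groups = {"hits": [], "empty": [], "symlinks": []}
    for raw in lines:
        line = raw.rstrip("\n")
        if "SCAN SUMMARY" in line:
            break  # per-file lines end where the summary block begins
        # Split on the last ": " so paths containing ": " stay intact.
        path, sep, status = line.rpartition(": ")
        if not sep:
            continue  # blank or unrecognized line
        if status.endswith(" FOUND"):
            signature = status[: -len(" FOUND")]
            groups["hits"].append((path, signature))
        elif status == "Empty file":
            groups["empty"].append(path)
        elif status == "Symbolic link":
            groups["symlinks"].append(path)
    return groups
```

Opening the log with `open("collection.log")` and passing the file object straight to the function is enough, since it only iterates over lines.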

Don

Donald R. Mennerich, Senior Digital Archivist
New York University Libraries

James Truitt

Jun 26, 2024, 9:16:08 AM
to BitCurator Users
Hi Eirva,

We use more or less the same command as Don. I will note, though, that by default clamscan skips files larger than ~25 MB. To raise that limit, you can add the following arguments at the end of the command above:

--max-filesize=Xm --max-scansize=Ym

where X is the largest filesize (in megabytes) you want to scan and Y is the largest number of megabytes you want to extract from a single compressed file.
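
One way to pick X is to measure the transfer first. The sketch below is a minimal illustration with hypothetical helper names: it finds the largest regular file under a directory, rounded up to whole megabytes (assuming 1 MB = 1024 * 1024 bytes), and formats the two arguments. The default limits have changed between ClamAV releases, and very large values may be capped by ClamAV itself, so it is worth checking the clamscan man page for the installed version.

```python
import math
import os


def largest_file_mb(root):
    """Return the size of the largest regular file under `root`,
    rounded up to whole megabytes (0 if there are no files)."""
    largest = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip symlinks so a link to a huge file elsewhere is not counted.
            if os.path.isfile(full) and not os.path.islink(full):
                largest = max(largest, os.path.getsize(full))
    return math.ceil(largest / (1024 * 1024))


def size_limit_args(max_file_mb, max_scan_mb):
    """Return the clamscan size-limit arguments for the given megabyte values."""
    if max_file_mb < 1 or max_scan_mb < 1:
        raise ValueError("limits must be at least 1 MB")
    return [
        f"--max-filesize={max_file_mb}m",
        f"--max-scansize={max_scan_mb}m",
    ]
```

For example, `size_limit_args(largest_file_mb("/path/to/staging"), 4000)` would produce arguments to append to the scan command; the value 4000 is only a placeholder for Y.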

Best,
James

Eirva Ch. D.

Jun 30, 2024, 9:38:01 PM
to bitcurat...@googlegroups.com
Hello James and Don,
Thank you very much for your helpful replies, and I am sorry it took me a while to respond. I have one more question: have you ever used ClamAV to check for PII? For example, these two options: `--structured-cc-count` (for credit card numbers) and `--structured-ssn-format` (for Social Security numbers). Do you have any thoughts on this?
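
For reference, a minimal sketch of how these options might be assembled in Python. It rests on a reading of the clamscan man page that is not confirmed anywhere in this thread: structured-data detection appears to be off unless `--detect-structured=yes` is passed, the SSN format values are 0 (xxx-yy-zzzz), 1 (xxxyyzzzz), and 2 (both), and the count options set the minimum number of matches in a file before it is reported, with a default of 3. The function name is hypothetical.

```python
def structured_args(ssn_format=0, ssn_count=3, cc_count=3):
    """Return clamscan arguments that enable SSN / credit card detection.

    ssn_format: 0 = xxx-yy-zzzz, 1 = xxxyyzzzz, 2 = both (per the man page).
    ssn_count / cc_count: minimum matches in one file before it is flagged.
    """
    if ssn_format not in (0, 1, 2):
        raise ValueError("ssn_format must be 0, 1, or 2")
    if ssn_count < 1 or cc_count < 1:
        raise ValueError("counts must be at least 1")
    return [
        "--detect-structured=yes",  # assumption: detection is off without this
        f"--structured-ssn-format={ssn_format}",
        f"--structured-ssn-count={ssn_count}",
        f"--structured-cc-count={cc_count}",
    ]
```

If the default minimum of 3 applies, a test file containing only one or two numbers would not be flagged, so lowering the counts (for example `structured_args(ssn_format=2, ssn_count=1, cc_count=1)`) may be worth trying on mock data.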
Thank you again for all your help! 
Eirva 

Donald Mennerich

Jul 2, 2024, 9:00:24 AM
to bitcurat...@googlegroups.com
Eirva, 

No, I've never tried using ClamAV for this. BitCurator comes with bulk_extractor installed, which is a very flexible tool for extracting all sorts of data from images: https://github.com/simsong/bulk_extractor/wiki

Don

Donald R. Mennerich, Senior Digital Archivist
New York University Libraries

James Truitt

Jul 2, 2024, 9:24:50 AM
to bitcurat...@googlegroups.com
Hi Eirva,

I'll echo Don again: we usually use bulk_extractor, and tend to run it via Brunnhilde. I actually wasn't aware of these PII parameters for ClamAV, though, and I admit that I'm tempted to give them a try…

Best,
James





--
James Truitt (he/him)
Digital Archivist
Friends Historical Library of Swarthmore College

Shelly Black

Jul 2, 2024, 11:09:14 AM
to BitCurator Users
Hello! I didn't know about the ClamAV SSN and CCN flags either. I did a quick test on some mock data in which bulk_extractor detects SSNs and CCNs, and ClamAV didn't seem to find any. I'm curious what the output looks like when ClamAV finds those numbers.

Shelly

Eirva Ch. D.

Jul 5, 2024, 2:01:28 AM
to bitcurat...@googlegroups.com
Hello Don, 
Thank you for your reply and for sharing the link for bulk_extractor. It is good to know what other institutions use, and it looks like a very helpful tool.
Thank you, 
Eirva


Senior Manuscript Processor 
Cornell University Library 


Eirva Ch. D.

Jul 5, 2024, 2:09:15 AM
to bitcurat...@googlegroups.com
Hello James, 
Nice, thank you for your reply. We had custom Python scripts to check for SSNs, but we can consider using bulk_extractor instead.
While reading the ClamAV manual I came across this option, and I was curious to give it a try too and see how it works.
Thank you, 
Eirva 

Eirva Ch. D.

Jul 5, 2024, 2:20:42 AM
to bitcurat...@googlegroups.com
Hello Shelly!
Thank you for your reply! 
It sounds like you all use bulk_extractor. Good to know.
I am also curious to see the results when ClamAV finds SSNs. If I get any results like that, I will share them here.
Thank you again for your help. 
Eirva 
