OCR content in AtoM's search


Cássio Felipe de Oliveira Pires

Mar 25, 2023, 2:43:32 PM
to AtoM Users
Hi, everyone

Is there a way to stop AtoM from displaying search results based on OCR text? Can this be done globally (for example, by removing pdftotext), or is it possible to block OCR indexing for specific digital objects?

Thanks!

Cássio Felipe de Oliveira Pires

Mar 25, 2023, 3:03:27 PM
to AtoM Users
Alternatively, is it possible to remove the 64K of OCR text that AtoM has already indexed into the database for a digital object?

Dan Gillean

Mar 27, 2023, 11:57:04 AM
to ica-ato...@googlegroups.com
Hi there, 

Going forward: yes, if you remove pdftotext, then this block of code that indexes the OCR layer should not activate, meaning your future PDF uploads will not be indexed. 

For those already indexed in your system: 

We could try to use SQL to delete the OCR text from the database. As always, please proceed at your own risk and make a backup first!

If you want to delete ALL existing OCR transcripts from your AtoM database, try the following query: 
  • DELETE FROM property WHERE name='transcript' AND scope='Text extracted from source PDF file\'s text layer using pdftotext';
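Since the DELETE above is irreversible once committed, it may be worth previewing what it would match first. The following is only a sketch, assuming a MySQL database with transactional (InnoDB) tables and the same name and scope values as in the query above; it is not an official AtoM procedure:

```sql
-- Preview: count the rows the DELETE would remove (changes nothing).
SELECT COUNT(*)
FROM property
WHERE name='transcript'
  AND scope='Text extracted from source PDF file\'s text layer using pdftotext';

-- Assumption: the tables are InnoDB, so the delete can run inside a transaction.
START TRANSACTION;

DELETE FROM property
WHERE name='transcript'
  AND scope='Text extracted from source PDF file\'s text layer using pdftotext';

-- Compare the reported number of affected rows with the count from the preview.
-- If it looks wrong, run ROLLBACK; instead of COMMIT;
COMMIT;
```

If the count is 0, the scope string in your installation may differ from the one shown here. Deleting the rows probably does not update the search index on its own, so the index would likely need to be repopulated afterwards (in AtoM this is normally done with the search:populate task) before the OCR text stops matching searches.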
Hope that helps! 


Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056
@accesstomemory
he / him


--
You received this message because you are subscribed to the Google Groups "AtoM Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ica-atom-user...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/ica-atom-users/c3a09b8b-d18c-4a16-b837-3e70bd1f29c3n%40googlegroups.com.

Cássio Felipe de Oliveira Pires

Mar 29, 2023, 8:04:27 AM
to ica-ato...@googlegroups.com
Yes, it helps. Thanks Dan!
