PDF search limit change

Gábor Kovács

Oct 22, 2020, 4:24:20 AM
to AtoM Users
Hello, is it possible to change this limit so that the entire document is scanned? Specifically, I mean the note that AtoM "truncates PDF text after the first 65,535 bytes". Can this be changed?

Dan Gillean

Oct 22, 2020, 4:49:19 PM
to ICA-AtoM Users
Hi Gábor,

You will need to make a local customization to the value column in the property_i18n table in AtoM's MySQL database. The field is currently set as TEXT, and you will want to change its type to MEDIUMTEXT or LONGTEXT. For further context, see: 
For reference, here are the four TEXT types available in MySQL, with their maximum storage sizes (limits are in bytes, so multi-byte characters reduce how many characters fit):
  • TINYTEXT: 255 bytes
  • TEXT: 65,535 bytes (64 KB)
  • MEDIUMTEXT: 16,777,215 bytes (16 MB)
  • LONGTEXT: 4,294,967,295 bytes (4 GB)
See: 
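MySQL measures these maximum lengths in bytes, not characters, so text with accented letters fills a column faster than its character count suggests: in UTF-8, "á" takes two bytes. As a rough way to check which type a given piece of extracted text would need, here is a minimal Python sketch. The helper names are hypothetical (not part of AtoM), and it assumes the column stores UTF-8 text.

```python
# Hypothetical helper, not part of AtoM: picks the smallest MySQL TEXT type
# that can hold a string, assuming the column stores UTF-8 text.

# Maximum storage size of each MySQL TEXT type, in bytes.
MYSQL_TEXT_LIMITS = [
    ("TINYTEXT", 255),
    ("TEXT", 65_535),
    ("MEDIUMTEXT", 16_777_215),
    ("LONGTEXT", 4_294_967_295),
]


def utf8_length(text: str) -> int:
    """Return the number of bytes `text` occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def smallest_text_type(text: str) -> str:
    """Return the smallest TEXT type whose byte limit fits `text`."""
    size = utf8_length(text)
    for type_name, max_bytes in MYSQL_TEXT_LIMITS:
        if size <= max_bytes:
            return type_name
    raise ValueError(f"{size} bytes exceeds the LONGTEXT limit")


# 40,000 plain ASCII letters fit in TEXT, but 40,000 accented letters
# need 80,000 bytes and therefore MEDIUMTEXT.
print(smallest_text_type("a" * 40_000))  # TEXT
print(smallest_text_type("á" * 40_000))  # MEDIUMTEXT
```

For most PDFs, MEDIUMTEXT's 16 MB should leave plenty of room; LONGTEXT is only needed for extremely large text layers.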
I strongly recommend you back up your data before trying to alter any table settings! You'll need to restart MySQL and PHP-FPM after making any changes, and you will also want to re-extract the text layer afterward, using this command-line task: 
Finally, you will also want to rebuild the search index so that the newly extracted text shows up in search results. 

Good luck! 

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056
@accesstomemory
he / him


David Juhasz

Oct 22, 2020, 6:49:48 PM
to ica-ato...@googlegroups.com
Hi Gábor,

There is also a line in the code that needs to be changed to allow more than 65,535 bytes of PDF text to be stored and indexed:
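The line in question is not reproduced in this thread, and AtoM itself is written in PHP. Purely as a hypothetical illustration of what a byte-based limit involves, the Python sketch below cuts text at a fixed number of bytes while taking care not to split a multi-byte UTF-8 character. The function name and its `max_bytes` parameter are invented for this example.

```python
# Hypothetical illustration in Python; AtoM's real code is PHP and is not
# shown in this thread.

def truncate_utf8(text: str, max_bytes: int = 65_535) -> str:
    """Shorten `text` so its UTF-8 encoding is at most `max_bytes` bytes."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # Cut the byte string, then drop the incomplete multi-byte character
    # (if any) left at the cut point, so the result is still valid UTF-8.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


pdf_text = "á" * 40_000  # 80,000 bytes of UTF-8
short = truncate_utf8(pdf_text)
print(len(short), len(short.encode("utf-8")))  # 32767 65534

# Raising the limit to MEDIUMTEXT's size keeps the whole text.
print(truncate_utf8(pdf_text, 16_777_215) == pdf_text)  # True
```

In this sketch, keeping more text simply means raising `max_bytes`, but that only helps if the database column has been widened too, since a TEXT column cannot store more than 65,535 bytes regardless of what the code allows.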

Best regards,
David
--

David Juhasz
Senior Developer
Artefactual Systems
he/him

