How to search PDF and add comments

Justin Hernandez

Jul 20, 2013, 11:46:46 AM

to pdfhummus-in...@googlegroups.com

Hello and thank you for making Hummus! I have a quick question. I'm not looking for an exhaustive example, but if you could please point me in the right direction I would appreciate it :).

Is it possible to search a PDF for a specific string, get the string's offset, and dynamically add a comment to the page next to the found string? Thanks again. :)

Gal Kahana

Jul 21, 2013, 12:03:57 PM

to pdfhummus-in...@googlegroups.com

Yes and no.

The yes part is that you can read anything in the PDF file, and adding a comment is also possible. I have a sample for adding comments here: https://github.com/galkahana/HummusJS/blob/master/tests/ModifyingExistingFileContent.js
For general modification info, see here: https://github.com/galkahana/HummusJS/wiki/Modification

For general parsing info, see here: https://github.com/galkahana/HummusJS/wiki/Parsing

However, when it comes to reading the text of a PDF, as in interpreting what written text the document has, Hummus does not provide a high-level interface. To implement such a thing you'll have to use the lower-level methods of Hummus, getting the low-level PDF objects (such as strings, dictionaries, etc.) and interpreting the text accordingly. This requires some knowledge of PDF and is not 100% trivial. If you are familiar with PDF and feel up for the task, go for it. Otherwise, at least for the part of figuring out the position, I would suggest looking for another solution that provides such text extraction/search services.
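To give a rough idea of what "interpreting the text" involves, here is a minimal sketch in plain JavaScript. It does not use Hummus at all: it assumes you have already obtained a page's content stream as a decompressed string (for example through the parsing API described on the wiki page above). The names `tokenize` and `findText` are made up for this illustration. It also assumes text is written as literal strings in a simple ASCII-compatible encoding, that the search string sits inside a single Tj/TJ operation, and it ignores the `cm`, `'` and `"` operators, hex strings, and glyph widths. Real documents often break these assumptions, so treat it as a starting point only.

```javascript
// Split a decoded content stream into strings, numbers, brackets and operators.
// Simplification: a backslash just keeps the next character literally, so
// \( \) and \\ work, but \n and octal escapes are not decoded.
function tokenize(stream) {
  const tokens = [];
  let i = 0;
  while (i < stream.length) {
    const c = stream[i];
    if (/\s/.test(c)) { i++; continue; }
    if (c === '(') {
      let depth = 1;
      let s = '';
      i++;
      while (i < stream.length) {
        const ch = stream[i];
        if (ch === '\\' && i + 1 < stream.length) { s += stream[i + 1]; i += 2; continue; }
        if (ch === '(') depth++;
        if (ch === ')' && --depth === 0) { i++; break; }
        s += ch;
        i++;
      }
      tokens.push({ type: 'string', value: s });
      continue;
    }
    if (c === '[' || c === ']') { tokens.push({ type: c }); i++; continue; }
    let j = i;
    while (j < stream.length && !/[\s()\[\]]/.test(stream[j])) j++;
    if (j === i) { i++; continue; } // stray delimiter, skip it
    const word = stream.slice(i, j);
    if (/^[+-]?(\d+\.?\d*|\.\d+)$/.test(word)) {
      tokens.push({ type: 'number', value: parseFloat(word) });
    } else {
      // names such as /F1 also land here; they match no case below
      tokens.push({ type: 'op', value: word });
    }
    i = j;
  }
  return tokens;
}

// Walk the operators, tracking the text line matrix [a b c d e f], and report
// the line origin for every Tj/TJ whose text contains `needle`.
function findText(stream, needle) {
  const hits = [];
  let operands = [];
  let m = [1, 0, 0, 1, 0, 0];
  let leading = 0;
  const move = (tx, ty) => {
    m[4] += tx * m[0] + ty * m[2];
    m[5] += tx * m[1] + ty * m[3];
  };
  for (const tok of tokenize(stream)) {
    if (tok.type !== 'op') { operands.push(tok); continue; }
    const nums = operands.filter(t => t.type === 'number').map(t => t.value);
    switch (tok.value) {
      case 'BT': m = [1, 0, 0, 1, 0, 0]; break;
      case 'Tm': if (nums.length >= 6) m = nums.slice(0, 6); break;
      case 'Td': move(nums[0], nums[1]); break;
      case 'TD': leading = -nums[1]; move(nums[0], nums[1]); break;
      case 'TL': leading = nums[0]; break;
      case 'T*': move(0, -leading); break;
      case 'Tj':
      case 'TJ': {
        // For TJ, the kerning numbers are dropped and the pieces are joined.
        const text = operands.filter(t => t.type === 'string').map(t => t.value).join('');
        const offset = text.indexOf(needle);
        if (offset !== -1) hits.push({ text, offset, x: m[4], y: m[5] });
        break;
      }
    }
    operands = [];
  }
  return hits;
}

// Usage with a hand-written content stream:
const sample = 'BT /F1 12 Tf 1 0 0 1 72 700 Tm (Hello world) Tj ' +
               '0 -14 Td [(Find ) -20 (me here)] TJ ET';
console.log(findText(sample, 'me'));
```

For the sample stream this reports one hit, in the text "Find me here" at character offset 5, with the line starting at x = 72, y = 686. Note that the position is the start of the text line, not of the matched word. Getting the exact x of the match would require the font's glyph widths, which is part of why this is not trivial.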

Regards,

Gal.

Justin Hernandez

Jul 21, 2013, 3:34:09 PM

to pdfhummus-in...@googlegroups.com

Gal,

Thanks for the feedback. I appreciate it! Yes I need to deepen my understanding of PDF files haha

Gal Kahana

Jul 22, 2013, 1:30:07 AM

to pdfhummus-in...@googlegroups.com

Cool. Justin, if you do decide to go for it, I might be able to assist with some pointers. I haven't implemented such an algorithm before, but I am aware of how text is written in a PDF.