Problem with extra spaces appearing in the letters of a word

Jeff Pickhardt

Mar 26, 2023, 10:28:51 PM
to PDF::Reader
Hi James,

I noticed this gem has problems parsing some PDFs where the text is not necessarily clean.

For instance, this file: https://www.jstor.org/stable/3684663 (I can send it to you separately if you can't download it online)

Some parts of it get output like: "a b o u t a r e g r e s s i o n t o o r i g i n a l c h a o s"

However, it doesn't seem to be inherently a problem with the file, because Python's PyPDF2 reads it correctly as "about a regression to original chaos".

Do you think there is some step that this reader is missing? Alternatively, is there some option I should set when using PDF::Reader to get it to read the PDFs better?

Thanks,
Jeff

PS - I posted this on the GitHub issues but then read in the documentation that questions and feedback should be posted here.
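
For reference, the symptom above can be partly cleaned up after extraction. The following is only a rough Ruby sketch of a post-processing workaround, not a fix for the extraction itself and not part of PDF::Reader: the helper name collapse_spaced_letters is hypothetical, and it assumes the letters are separated by exactly one space. If the spaces between words are no wider than the spaces between letters, as in the sample above, the word boundaries cannot be recovered this way.

```ruby
# Hypothetical helper (not part of PDF::Reader): joins runs of three or more
# single letters separated by single spaces, e.g. "c h a o s" becomes "chaos".
# Ordinary words are left alone because their letters are not space-separated.
def collapse_spaced_letters(text)
  text.gsub(/\b[[:alpha:]](?: [[:alpha:]]\b){2,}/) { |run| run.delete(" ") }
end

# A single spaced-out word is restored.
puts collapse_spaced_letters("original c h a o s")
# prints: original chaos

# With only single spaces everywhere, the words run together.
puts collapse_spaced_letters("a b o u t a r e g r e s s i o n")
# prints: aboutaregression

# If the extracted text happens to keep a wider gap between words,
# each word is collapsed separately and the gap is preserved.
puts collapse_spaced_letters("a b o u t  c h a o s")
# prints: about  chaos
```

Whether the text extracted from this particular file keeps wider gaps between words is not known from the sample quoted here, so this may or may not help in practice.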

Jayson Presto

May 31, 2024, 4:33:58 AM
to PDF::Reader
I have the same issue, but with words that run together. I worked around it with gsub, which works when the run-together words are in PascalCase.

"TheFirstWord" becomes "The First Word" with gsub(/([a-z])([A-Z])/, '\1 \2')
"Thefirstword" stays "Thefirstword" ???

Do you have a solution already?
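
A minimal Ruby sketch of the gsub workaround described above, showing both the case where it works and the case where it does not. The method name split_pascal_case is hypothetical and used only for illustration.

```ruby
# Hypothetical helper: inserts a space wherever a lowercase letter is
# immediately followed by an uppercase letter.
def split_pascal_case(text)
  text.gsub(/([a-z])([A-Z])/, '\1 \2')
end

puts split_pascal_case("TheFirstWord")
# prints: The First Word

# No lowercase-to-uppercase boundary exists, so nothing is split.
puts split_pascal_case("Thefirstword")
# prints: Thefirstword
```

The second case has no capital letters to mark the word boundaries, so a regular expression alone cannot split it; that would likely need something like a dictionary-based word segmentation step.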
