Problem with extra spaces appearing in the letters of a word

Jeff Pickhardt

Mar 26, 2023, 10:28:51 PM

to PDF::Reader

Hi James,

I noticed this gem has problems parsing some PDFs where the text is not necessarily clean.

For instance, this file: https://www.jstor.org/stable/3684663 (I can send it to you separately if you can't download it online)

Some parts of it get output like: "a b o u t a r e g r e s s i o n t o o r i g i n a l c h a o s"

However, it doesn't seem like it's inherently a problem with the file, because Python's PyPDF2 reads it correctly as "about a regression to original chaos"

Do you think there is some step that this reader is missing? Alternatively, is there an option I should set when using PDF::Reader to get it to read PDFs better?
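
In the meantime, a post-processing pass along these lines could rejoin the letters. This is only a rough sketch, not something PDF::Reader provides: `collapse_letter_spacing` is a hypothetical helper, and it assumes that the gaps between real words survive in the extracted text as two or more spaces. If the output really has single spaces everywhere, the word boundaries cannot be recovered this way.

```ruby
# Hypothetical helper, not part of PDF::Reader.
# Assumption: letters inside a word are separated by exactly one space,
# while real word boundaries are two or more spaces.
def collapse_letter_spacing(text)
  # Find runs of single letters separated by single spaces, where the run
  # is bounded by whitespace (or the string edge) on both sides.
  joined = text.gsub(/(?<!\S)(?:[[:alpha:]] )+[[:alpha:]](?!\S)/) do |run|
    run.delete(" ")
  end
  # Collapse the remaining multi-space gaps to one space.
  # Note: this also flattens any layout spacing in the extracted text.
  joined.squeeze(" ")
end

spaced = "a b o u t  a  r e g r e s s i o n  t o  o r i g i n a l  c h a o s"
puts collapse_letter_spacing(spaced)
```

Text that is already spaced normally passes through unchanged, since its words are not single letters.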

Thanks,

Jeff

PS - I posted this on the GitHub issues but then read in the documentation that questions and feedback should be posted here.

Jayson Presto

May 31, 2024, 4:33:58 AM

to PDF::Reader

I have the same issue, but with words clustered together. I worked around it with gsub, which works when the clustered words are in PascalCase:

TheFirstWord => The First Word, using gsub(/([a-z])([A-Z])/, '\1 \2')
Thefirstword => Thefirstword ???

Do you have a solution already?
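
For anyone who wants to try it, the gsub workaround above can be wrapped in a small helper. This is a minimal sketch: `split_pascal_case` is a hypothetical name, and only the pattern itself comes from this post. It also shows the limitation: when there is no lowercase-to-uppercase boundary, there is nothing to split on.

```ruby
# Hypothetical helper wrapping the gsub pattern from the post above.
def split_pascal_case(text)
  # Insert a space wherever a lowercase letter is directly followed
  # by an uppercase letter.
  text.gsub(/([a-z])([A-Z])/, '\1 \2')
end

puts split_pascal_case("TheFirstWord")  # prints "The First Word"
puts split_pascal_case("Thefirstword")  # prints "Thefirstword" (unchanged)
```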
