Text extracted seems to be encoded

Elliot Tison

Nov 9, 2020, 5:12:35 PM

to PDF::Reader

Hello,

I often use this gem with no issues. However, for one client, I cannot extract text from several PDF files.

The text extracted is like this:

\n \u0004 \n \u0005 \t \u0004 \b \a \u0003 \u0001 \u0006 \u0006 \u0005 \u0004 \u0003 \u0002 \u0001\n\n\n.......

Is this a known limitation, like the one mentioned in your documentation ("due to the way it has been stored, or the use of invalid bytes")?

Many thanks.

Best

elliot

James Healy

Nov 9, 2020, 5:21:13 PM

to pdf-r...@googlegroups.com

Hi Elliot,

It's hard to know without seeing a sample file. There are cases where
pdf-reader can be improved to handle rare approaches to encoding, and
other cases where there's no way to extract the text.
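
For triage before a sample file is available, one rough check is how much of the extracted text consists of control characters, as in the output quoted earlier in this thread. The following is a minimal sketch, not part of pdf-reader: the helper names `control_char_ratio` and `looks_garbled?` and the 0.5 threshold are assumptions made for illustration.

```ruby
# Hypothetical helpers (not part of pdf-reader) for spotting output that is
# mostly control characters rather than readable text.

# Fraction of non-whitespace characters in `text` that are control characters.
def control_char_ratio(text)
  # Drop spaces, newlines, tabs and carriage returns so layout does not skew the ratio.
  visible = text.to_s.scrub.delete(" \n\t\r")
  return 0.0 if visible.empty?

  visible.scan(/[[:cntrl:]]/).length.fdiv(visible.length)
end

# The 0.5 threshold is an arbitrary assumption; tune it for your own files.
def looks_garbled?(text, threshold: 0.5)
  control_char_ratio(text) >= threshold
end

sample = "\n \u0004 \n \u0005 \t \u0004 \b \a \u0003 \u0001"
puts looks_garbled?(sample)          # prints "true"
puts looks_garbled?("Hello, world")  # prints "false"

# With pdf-reader installed, the check could be run per page
# ("sample.pdf" is a placeholder path):
#   require "pdf-reader"
#   PDF::Reader.new("sample.pdf").pages.each_with_index do |page, i|
#     puts "page #{i + 1} garbled: #{looks_garbled?(page.text)}"
#   end
```

A high ratio only indicates that the text did not decode into readable characters; it does not say whether the file can be handled by a fix in the library or is unrecoverable.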

If you're able to share a file directly to my address, I'm happy to
take a quick look.

James

