Text extracted seems to be encoded


Elliot Tison

Nov 9, 2020, 5:12:35 PM
to PDF::Reader
Hello,

I often use this gem with no issues. However, for one client, I cannot extract text from several PDF files.

The text extracted is like this:
\n \u0004 \n \u0005 \t \u0004 \b \a \u0003 \u0001 \u0006 \u0006 \u0005 \u0004 \u0003 \u0002 \u0001\n\n\n.......

Is this a known limitation, as mentioned in your documentation ("due to the way it has been stored, or the use of invalid bytes")?

Many thanks.

Best

elliot

James Healy

Nov 9, 2020, 5:21:13 PM
to pdf-r...@googlegroups.com
Hi Elliot,

It's hard to know without seeing a sample file. There are cases where
pdf-reader can be improved to handle rare approaches to encoding, and
other cases where there's no way to extract the text.
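
As a rough way to spot which files are affected before sharing one, something like the sketch below might help. It is only an illustration: `control_char_ratio` is a hypothetical helper, not part of pdf-reader, and the idea that a high ratio means "unusable output" is an assumption. It measures how much of the non-whitespace extracted text consists of ASCII control characters, which is what the output quoted above looks like.

```ruby
# Hypothetical helper (not part of pdf-reader): returns the fraction of
# non-whitespace characters in +text+ that are ASCII control characters.
def control_char_ratio(text)
  visible = text.gsub(/\s/, "")          # ignore spaces, tabs and newlines
  return 0.0 if visible.empty?

  controls = visible.scan(/[\x00-\x08\x0E-\x1F]/).length
  controls.to_f / visible.length
end

# Sample shaped like the output quoted in this thread.
garbled = "\n \u0004 \n \u0005 \t \u0004 \b \a \u0003 \u0001 \u0006\n\n\n"
puts control_char_ratio(garbled)
puts control_char_ratio("Hello, world")

# With the gem installed, the same check could be run per page
# ("sample.pdf" is a placeholder path):
#
#   require "pdf-reader"
#   PDF::Reader.new("sample.pdf").pages.each do |page|
#     puts "page #{page.number}: #{control_char_ratio(page.text)}"
#   end
```

A ratio close to 1.0 suggests the page came out as control characters only, while normal text should score at or near 0.0. It doesn't say why extraction failed, only which pages to look at.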

If you're able to share a file directly to my address, I'm happy to
take a quick look.

James
