I am trying to read two columns on one page in order: first the left column, then the right column. How can I do this?


sercan t.

Nov 17, 2020, 4:44:27 PM
to PDF::Reader
Hi everybody.
I have a PDF document that is structured like this:

page 1 = |**|**|
page 2 = |**|**|
page 3 = |**|**|
...

How can I get the text of the document by reading the left column from top to bottom, then the right column from top to bottom?

Wayne Brissette

Nov 17, 2020, 6:15:32 PM
to pdf-r...@googlegroups.com
Do you have a page that has a table like this that you can share?



sercan t.

Nov 18, 2020, 1:34:09 AM
to PDF::Reader
Here is the test.pdf example.


On Wednesday, November 18, 2020 at 02:15:32 UTC+3, wbri...@att.net wrote:

Wayne Brissette

Nov 18, 2020, 4:59:39 AM
to pdf-r...@googlegroups.com
sercan t. wrote on 2020-11-18 00:32:
Here is the test.pdf example.

Thanks. I had a look at this, and it's certainly problematic. There's no way to read the data column by column like that. James touches on this here:

https://groups.google.com/g/pdf-reader/c/fg2RCwqKNu0/m/XkUAsNIkbnEJ

In this case, the only delimiter between the columns is a space, which you can't really use as a delimiter because spaces also occur within each column's text.

The real problem is that nothing in the PDF spec says "here's a column"; instead, everything is positioned by coordinates. James does an outstanding job of keeping this stuff together and presenting you with something reasonable.
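
Since coordinates are all the page gives you, one way to approximate column order is to bucket each positioned text fragment by its x coordinate and then sort each bucket from top to bottom. This is only a minimal sketch: it assumes you already have fragments as (x, y, text) values, and the `Fragment` struct, the `column_order` helper, the `split_x` threshold, and the sample coordinates are all hypothetical, not part of pdf-reader. It also assumes PDF-style coordinates, where y grows upward from the bottom of the page.

```ruby
# Hypothetical positioned text fragment: x/y in page units, origin bottom-left.
Fragment = Struct.new(:x, :y, :text)

# Returns the fragment texts in reading order: left column top to bottom,
# then right column top to bottom. `split_x` is the x position assumed to
# separate the two columns (for example, half the page width).
def column_order(fragments, split_x)
  left, right = fragments.partition { |f| f.x < split_x }
  # A larger y means higher on the page, so sort by descending y.
  [left, right].flat_map { |col| col.sort_by { |f| -f.y }.map(&:text) }
end

# Sample data in the interleaved order a row-by-row extraction might give.
fragments = [
  Fragment.new(320, 700, "right 1"),
  Fragment.new(50,  700, "left 1"),
  Fragment.new(320, 680, "right 2"),
  Fragment.new(50,  680, "left 2"),
]

puts column_order(fragments, 300)
```

A fixed threshold like this only holds when every page uses the same two-column layout; pages with full-width headings or uneven columns would need a smarter split.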

The closest I could get with that test PDF was to convert it to an HTML page, then use Nokogiri with XPath to put the columns into different arrays. Once that's done, you can do what you want with the data. However, there's no automatic export to HTML out of pdf-reader (I did that manually for my test).
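
As a rough illustration of the HTML approach, the sketch below pulls each column into its own array with XPath. The markup is hypothetical (one table row per line, with `left` and `right` class names made up for the example); the HTML you get from converting a real PDF will look different, and the expressions would need adjusting. To stay runnable without extra gems it uses REXML, which is bundled with standard Ruby installs, instead of Nokogiri; with Nokogiri the same XPath expressions can be passed to `doc.xpath`.

```ruby
require "rexml/document"

# Hypothetical converter output: one table row per line, one cell per column.
html = <<~HTML
  <table>
    <tr><td class="left">left 1</td><td class="right">right 1</td></tr>
    <tr><td class="left">left 2</td><td class="right">right 2</td></tr>
  </table>
HTML

# REXML needs well-formed markup; real-world HTML may require a more
# forgiving parser such as Nokogiri.
doc = REXML::Document.new(html)

# Collect the cells of each column into separate arrays, in document order.
left  = REXML::XPath.match(doc, "//td[@class='left']").map { |td| td.text.strip }
right = REXML::XPath.match(doc, "//td[@class='right']").map { |td| td.text.strip }

# Left column top to bottom, then right column top to bottom.
puts left + right
```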

-Wayne

sercan t.

unread,
Nov 18, 2020, 7:41:42 AM11/18/20
to PDF::Reader
Thanks for the help. I'll try to solve this. If I succeed, I'll write here.

On Wednesday, November 18, 2020 at 12:59:39 UTC+3, wbri...@att.net wrote: