Usable position data for text elements in a PDF

Xavier Hocquet

Mar 9, 2021, 5:28:25 PM
to PDF::Reader

Hey there, love the library!

I am exploring options for getting positional data for some of the text in a PDF file. For example, given a bank statement and a person's name, I would like the coordinates where that name appears.

By printing the page and page objects in my console, I can see what looks like the right data. For example, I can see structures like this, which seem to live in the cache (I think?):

tokens-3d487a0f0011d31879cb978a46d4c268=>["BT", :F3, 9, "Tf", 9, "TL", 122.4, 782.64, "Td", "Ending Balance", "Tj", "ET"]

If I'm not mistaken, the 122.4 and 782.64 before "Td" look like the X/Y coordinates for the start of the text.

I'm looking for a small example of how to access these structures, given that I am only interested in text objects. Ideally, I would like an array of objects like this:

  value: "TEXT HERE",
  x: 123,
  y: 789
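
To make that reading of the tokens concrete, here is a rough sketch in plain Ruby that walks a flat token array like the one quoted above and produces hashes of that shape. It is not pdf-reader's API: `text_positions` is a hypothetical helper, it only understands the operators that appear in the quoted array (BT, Tf, TL, Td, Tj, ET), and it assumes the Td operands can be read directly as coordinates, which only holds when no other text matrix or page transformation is in effect.

```ruby
# Hypothetical helper, not part of pdf-reader. Walks a flat array of content
# stream tokens and pairs each string shown with "Tj" with the position
# accumulated from the preceding "Td" operators.
#
# Caveat: in the printed array, operators and string operands look alike, so a
# literal string such as "Td" in the text itself would be misread as an operator.
def text_positions(tokens)
  results  = []
  operands = []
  x = 0.0
  y = 0.0

  tokens.each do |token|
    case token
    when "BT"
      # A new text object starts with the text position back at the origin.
      x = 0.0
      y = 0.0
      operands.clear
    when "Td"
      # Td offsets the start of the line by its two operands (tx, ty).
      tx, ty = operands.last(2)
      x += tx
      y += ty
      operands.clear
    when "Tj"
      # Tj shows the string that immediately precedes it.
      results << { value: operands.last, x: x, y: y }
      operands.clear
    when "Tf", "TL", "ET"
      # Font, leading, and end-of-text do not affect this sketch; drop operands.
      operands.clear
    else
      # Anything else is treated as an operand for the next operator.
      operands << token
    end
  end

  results
end

tokens = ["BT", :F3, 9, "Tf", 9, "TL", 122.4, 782.64, "Td", "Ending Balance", "Tj", "ET"]
positions = text_positions(tokens)
```

For the quoted array, `positions` holds a single entry with the value "Ending Balance", x 122.4 and y 782.64. Real content streams also use operators such as Tm, TJ and cm, which this sketch ignores, so the numbers it produces should be treated as a starting point rather than final page coordinates.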

Could you please provide a small example of how you would go about this? I have poked around the source for a while but I'll admit I'm a bit stumped due to the abstract nature of it all!

Thank you!

James Healy

Mar 11, 2021, 6:47:36 AM

Hi Xavier,

Unfortunately there isn't a public API in pdf-reader that can output
text with position annotations. I'd be very open to including it,
however I'm fairly short on time personally.

The core data is all there, and here's a hack that exposes it:

To use it you'd have to fork pdf-reader and apply the patch. If you do
so and find it useful, I'd be happy to accept a pull request with a
tidied up version of the change. In particular, it'd be good to avoid
using instance_variable_get and to have some integration specs in
