Character repetition


Abdullah Tariq

Feb 21, 2016, 11:42:55 PM
to PDF::Reader


I am getting repeated characters in the extracted text when reading this PDF:


require 'pdf-reader'

reader = PDF::Reader.new("Rstr2015_unlocked.pdf")
# reader.pages is a zero-indexed array, so index 6 is the seventh page.
page = reader.pages[6]
puts page.text



James Healy

Feb 22, 2016, 7:21:38 AM
to pdf-r...@googlegroups.com
Hi,

Thanks for this detailed report and screenshot.

When I open the PDF in Evince (on Linux), the page with duplicate letters looks like this:


I haven't checked the internals of the PDF, but is it possible that some of the letters are being written to the page twice?

lib/pdf/reader/page_layout.rb is responsible for laying out all the text on a page, and it currently has no logic for skipping duplicate text written to roughly the same position. I'll happily accept a patch if you're able to come up with something.

As a starting point, try this debugging code at the top of the initialize method:

      runs.select { |run|
        run.text == "A"
      }.sort.each { |run|
        puts run.inspect
      }
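
The missing step described above, skipping text that is written to roughly the same position twice, could be sketched roughly as follows. This is only an illustration and not code from the library: `Run` is a hypothetical stand-in struct rather than the class page_layout.rb actually works with, `drop_overprinted_runs` is a made-up name, and the 1.0 point tolerance is a guess that would need tuning against real files.

```ruby
# Hypothetical stand-in for a positioned piece of text; not PDF::Reader's own class.
Run = Struct.new(:x, :y, :text)

# Returns the runs with near-duplicates removed. A run is dropped when an
# already-kept run has the same text and sits within `tolerance` points of
# it on both axes. The default tolerance is an assumption, not a known-good value.
def drop_overprinted_runs(runs, tolerance: 1.0)
  runs.each_with_object([]) do |run, kept|
    duplicate = kept.any? do |other|
      other.text == run.text &&
        (other.x - run.x).abs <= tolerance &&
        (other.y - run.y).abs <= tolerance
    end
    kept << run unless duplicate
  end
end

runs = [
  Run.new(100.0, 700.0, "A"),
  Run.new(100.3, 700.0, "A"), # same glyph drawn again, slightly offset
  Run.new(108.0, 700.0, "A"), # a genuine second "A" further along the line
  Run.new(116.0, 700.0, "B")
]

puts drop_overprinted_runs(runs).map(&:text).join # prints "AAB"
```

The comparison is quadratic in the number of runs, which is fine for experimenting on a single page; a real patch would probably want to sort or bucket runs by position first.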

James


Jack Rusher

Feb 23, 2016, 4:33:10 AM
to pdf-r...@googlegroups.com
On 22 Feb, 2016, at 13:21, James Healy <ja...@yob.id.au> wrote:
> I haven't checked the internals of the PDF, but is it possible that some of the letters are being written to the page twice?

This is very common in older PDFs, or PDFs produced by older software, where writing the same text twice was a cheat to produce an effect like boldface.


J.