Converting from PDF to SVG/Text/Image in a Ruby web service


Support

Jan 25, 2012, 7:14:22 PM
to pdfne...@googlegroups.com
Q: Using Ruby, we need to extract the text, a PNG representation, an SVG representation and a PDF representation of every page in a given PDF document.
 
I see lots of sample code for various scenarios using Ruby, but the specific PDF2SVG and PDF2Image sections don't seem to cover Ruby samples.  
 

Also, we need this to read a file from Amazon S3 and process it in memory. Will that be possible using code from your PDFDocMemory sample? Perhaps something like this:

 

        PDFNet.Initialize

        # Read a PDF document in a memory buffer.
        file = StdFile.new((url_to_document_on_amazon), StdFile::E_read_mode)
        file_sz = file.FileSize

        file_reader = FilterReader.new(file)

        mem = file_reader.Read(file_sz)
        doc = PDFDoc.new(mem, file_sz)
        doc.InitSecurityHandler
 
 
-----------
 
A:
 

>        # Read a PDF document in a memory buffer.
>        file = StdFile.new((url_to_document_on_amazon), StdFile::E_read_mode)

 

This most likely won’t work for downloading data from an online source, because StdFile is meant to open a file by path rather than fetch a URL. Instead, use a Ruby-specific API to download the document into a memory buffer, and then use that buffer to create a PDFDoc. One way to implement this is as follows:

 

1. Use a technique similar to the PDFDocMemoryTest sample (http://www.pdftron.com/pdfnet/samplecode/PDFDocMemoryTest.rb) to create a PDFDoc from a memory buffer.

2. Call Convert::ToSVG on the document to convert the PDFDoc to SVG. (There is a somewhat more involved sample in http://www.pdftron.com/pdfnet/samplecode/ConvertTest.rb.)

3. Use PDFDraw, as in the PDFDraw sample (http://www.pdftron.com/pdfnet/samplecode/PDFDrawTest.rb), to create a PNG file for each page. (Iterate through the pages as in example 2, but omit the “JPEG” and encoder_param arguments to output PNG.)
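
As a rough illustration of the download part of step 1, the fetch into memory could be sketched with Ruby's standard Net::HTTP. The helper names fetch_bytes and looks_like_pdf? are hypothetical, and the hand-off to PDFDoc is shown only as a comment: the constructor form is the one from the question, but how the PDFNet Ruby binding accepts a buffer is an assumption that should be checked against PDFDocMemoryTest.

```ruby
require 'net/http'
require 'net/https' # needed for SSL on Ruby 1.9; harmless on newer versions
require 'uri'

# Hypothetical helper: download a URL into an in-memory binary String,
# following a limited number of HTTP redirects.
def fetch_bytes(url, redirect_limit = 5)
  raise ArgumentError, 'too many redirects' if redirect_limit <= 0

  uri = URI.parse(url)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = (uri.scheme == 'https')
  response = http.request_get(uri.request_uri)

  case response
  when Net::HTTPSuccess
    # Treat the body as raw bytes, not as text in some character encoding.
    response.body.to_s.dup.force_encoding(Encoding::BINARY)
  when Net::HTTPRedirection
    fetch_bytes(URI.join(url, response['location']).to_s, redirect_limit - 1)
  else
    raise "download failed: HTTP #{response.code}"
  end
end

# Hypothetical sanity check: a PDF file begins with the marker "%PDF-".
def looks_like_pdf?(bytes)
  bytes.start_with?('%PDF-')
end

# Possible usage (url_to_document_on_amazon is the placeholder from the question):
#   mem = fetch_bytes(url_to_document_on_amazon)
#   raise 'not a PDF' unless looks_like_pdf?(mem)
#   doc = PDFDoc.new(mem, mem.bytesize) # assumption: the binding accepts a String buffer
#   doc.InitSecurityHandler
```

Checking the header before handing the buffer to PDFNet makes it easier to tell a failed download (for example, an HTML error page) apart from a problem inside the PDF itself.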

 

 

Support

Jan 26, 2012, 6:40:50 PM
to pdfne...@googlegroups.com

Q: I can't seem to get this library to load correctly in a Rails 3.1 app.  Do you know what the correct procedure is with Rails?  Here is what I did so far:

 

1. Placed the PDFNetRuby.so file into the vendor/lib directory. (I'm on a Mac and the production server is Linux, so it looks like I might need to use libPDFNetC for development. Is that correct, or can I use PDFNetRuby for both?)

2. I then added the vendor/lib directory to config.autoload_paths: config.autoload_paths += %W(#{config.root}/vendor/lib)

3. I then added a config/initializers file called pdf_net.rb with the require statement: require 'PDFNetRuby'

4. When I try to load the console with "rails console" it bombs with: `require': no such file to load -- PDFNetRuby (LoadError)

 

Is there something I am missing here? I also have "include PDFNetRuby" in the class file that processes the PDF, but that will fail too unless the require statement works.
 
------------
 
A: After extracting PDFNet SDK for Mac, did you read ‘readme.txt’ and run the install script?

sh setup.sh

Also, were you able to run the included samples?

 

null-002500431691:Samples user$ sh runall_ruby.sh 

 

If, at this point, you get the following

 

../../../Lib/PDFNetRuby.bundle: dlopen(../../../Lib/PDFNetRuby.bundle, 9):

Library not loaded: /usr/local/lib/libruby.1.9.1.dylib (LoadError)

 

you need to copy libruby.1.9.1.dylib to /usr/lib/. This file should be included with your RVM Ruby installation; by default, it is at pathToRVM/.rvm/rubies/ruby-1.9.2-p290/lib/libruby.1.9.1.dylib. You might not be able to see this directory in the Finder; in that case, do the copy in the terminal with the following command:

sudo cp ..../.rvm/rubies/ruby-1.9.2-p290/lib/libruby.1.9.1.dylib /usr/lib/
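
On the Rails side of the question, one thing worth checking is that config.autoload_paths only affects Rails constant autoloading; a plain require searches $LOAD_PATH, so adding vendor/lib to autoload_paths does not help require 'PDFNetRuby' find the file. A minimal sketch of an initializer that loads the extension by absolute path instead is shown below. The helper names are hypothetical, and the two file names are simply the ones mentioned in this thread (PDFNetRuby.bundle on Mac, PDFNetRuby.so on Linux); this is an illustration, not an official setup procedure.

```ruby
require 'rbconfig'

# Hypothetical helper: pick the native extension file name for a platform.
# The two file names are the ones that appear in this thread.
def pdfnet_extension_file(host_os = RbConfig::CONFIG['host_os'])
  host_os =~ /darwin/i ? 'PDFNetRuby.bundle' : 'PDFNetRuby.so'
end

# Hypothetical helper: load the extension from an explicit directory.
# Requiring an absolute path does not consult $LOAD_PATH at all.
def require_pdfnet(lib_dir, host_os = RbConfig::CONFIG['host_os'])
  path = File.join(File.expand_path(lib_dir), pdfnet_extension_file(host_os))
  raise LoadError, "PDFNet extension not found at #{path}" unless File.exist?(path)
  require path
end

# In config/initializers/pdf_net.rb one might then write:
#   require_pdfnet(Rails.root.join('vendor', 'lib'))
```

With this approach the Mac build of the extension goes into vendor/lib on the development machine and the Linux build on the production server, since a native extension compiled for one platform cannot be loaded on the other.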
