I need to read data from a PDF file using WebDriver.
Suppose the PDF contains fields such as "User Name", "Address", and "Date of Birth".
I want to fetch that information using WebDriver.
It would be really helpful if anyone knows a solution.
-Vishi
--
You received this message because you are subscribed to the Google Groups "Selenium Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to selenium-user...@googlegroups.com.
To post to this group, send email to seleniu...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msg/selenium-users/-/y7ZwAosJ68oJ.
For more options, visit https://groups.google.com/groups/opt_out.
import java.io.BufferedInputStream;
import java.io.IOException;
import java.net.URL;
import java.util.concurrent.TimeUnit;
import org.apache.pdfbox.pdfparser.PDFParser;
import org.apache.pdfbox.util.PDFTextStripper;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;
import org.testng.Reporter;
import org.testng.annotations.BeforeTest;
import org.testng.annotations.Test;
public class ReadPdfFile {
WebDriver driver;
@BeforeTest
public void setUpDriver() {
driver = new FirefoxDriver();
Reporter.log("I am done");
}
@Test
public void start() throws IOException{
driver.get("http://votigo.com/overview_collateral.pdf");
driver.manage().timeouts().implicitlyWait(10, TimeUnit.SECONDS);
URL url = new URL(driver.getCurrentUrl());
BufferedInputStream fileToParse=new BufferedInputStream(url.openStream());
// parse() parses the stream and populates the COSDocument object,
// which is the in-memory representation of the PDF document.
PDFParser parser = new PDFParser(fileToParse);
parser.parse();
// getPDDocument() returns the PD document that was parsed. When you are done
// with this document you must call close() on it to release resources.
// PDFTextStripper takes a PDF document, strips out all of the text,
// and ignores the formatting.
String output=new PDFTextStripper().getText(parser.getPDDocument());
System.out.println(output);
parser.getPDDocument().close();
driver.manage().timeouts().implicitlyWait(100, TimeUnit.SECONDS);
}
}
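Once PDFTextStripper has returned the text as a String, values such as "User Name" or "Date of Birth" still have to be picked out of it. A minimal sketch of that step follows, assuming (hypothetically) that the PDF text comes out as one "Label: value" pair per line; real PDFs often extract differently, so inspect the printed output first. The class name PdfFieldExtractor and the method extractFields are made up for illustration and are not part of PDFBox or Selenium.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class PdfFieldExtractor {

    // Assumption: each field sits on its own line as "Label: value".
    // Lines without a colon (titles, blank lines) are skipped.
    static Map<String, String> extractFields(String pdfText) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String line : pdfText.split("\\r?\\n")) {
            int colon = line.indexOf(':');
            if (colon <= 0) {
                continue; // no label before the colon, or no colon at all
            }
            String label = line.substring(0, colon).trim();
            String value = line.substring(colon + 1).trim();
            if (!label.isEmpty()) {
                fields.put(label, value);
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // Sample text standing in for the output of PDFTextStripper.getText(...).
        String sample = "Customer record\nUser Name: Jane Doe\nDate of Birth: 01/02/1990\n";
        System.out.println(extractFields(sample));
    }
}
```

In the test above you would pass the `output` string to extractFields and assert on the returned map with TestNG.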
I would be very surprised if you did, because PDF is not an HTML-based format:
http://en.wikipedia.org/wiki/Portable_Document_Format#File_structure
XML is easy to pull down through Selenium because the browser renders it in the same way as HTML (unless, of course, you are using IE, which adds markup). If your browser has a built-in PDF reader that renders the PDF as HTML, it may be possible to pull the source out as a string, but then you are looking at whatever the browser converted it into, not the original PDF source.
If you really want to test it properly, download it and compare its MD5 hash to the MD5 hash of a known good copy, or load it using an external library like PDFBox. I'm a big believer in using the right tool for the job, and Selenium is most definitely not the right tool for working with PDF files.
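The hash comparison suggested above can be sketched with the standard library alone. This is only an illustration: the class name PdfChecksum and its methods are hypothetical, and the known-good hash is something you would compute once from a trusted copy of the file. Note that a byte-for-byte hash fails if the server regenerates the PDF with a new timestamp or ID, in which case comparing extracted text is the better fit.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class PdfChecksum {

    // Reads the whole stream and returns its MD5 digest as lowercase hex.
    static String md5Hex(InputStream in) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] buffer = new byte[8192];
        int read;
        while ((read = in.read(buffer)) != -1) {
            md.update(buffer, 0, read);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // Downloads the file at pdfUrl and compares its hash with a known-good hash.
    static boolean matchesKnownGood(URL pdfUrl, String expectedMd5)
            throws IOException, NoSuchAlgorithmException {
        try (InputStream in = pdfUrl.openStream()) {
            return md5Hex(in).equalsIgnoreCase(expectedMd5);
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.out.println("usage: PdfChecksum <pdf-url> <expected-md5>");
            return;
        }
        System.out.println(matchesKnownGood(new URL(args[0]), args[1]));
    }
}
```

In a Selenium test, the URL would typically come from driver.getCurrentUrl() or from the href of the download link.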
import java.io.BufferedInputStream;
import java.io.IOException;
import java.net.URL;
import java.util.concurrent.TimeUnit;
import org.apache.pdfbox.pdfparser.PDFParser;
import org.apache.pdfbox.util.PDFTextStripper;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;
import org.testng.Reporter;
import org.testng.annotations.AfterTest;
import org.testng.annotations.BeforeTest;
import org.testng.annotations.Test;
public class PDFDataRead {
WebDriver driver;
public static String latestwindowid;
@BeforeTest
public void open()
{
driver=new FirefoxDriver();
driver.manage().window().maximize();
Reporter.log("I am done");
}
@AfterTest
public void close()
{
driver.quit();
}
@Test
public void start() throws IOException{
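// Note: navigate to the PDF with driver.get(...) before this point; a freshly
// started browser has no PDF URL for getCurrentUrl() to return.
driver.manage().timeouts().implicitlyWait(10, TimeUnit.SECONDS);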
URL url = new URL(driver.getCurrentUrl());
BufferedInputStream fileToParse=new BufferedInputStream(url.openStream());
PDFParser parser = new PDFParser(fileToParse);
parser.parse();
String output=new PDFTextStripper().getText(parser.getPDDocument());
System.out.println(output);
parser.getPDDocument().close();
driver.manage().timeouts().implicitlyWait(100, TimeUnit.SECONDS);
}
}
"public static String latestwindowid;"--