Using XPATH to read PDF elements in Firefox


Bala Prasanna

Apr 18, 2019, 10:59:04 AM
to Selenium Users
I am trying to read the content of a PDF using XPath. The PDF's tree structure is shown below. I can access elements down to "<div id="viewer" class="pdfViewer">",
but I cannot access "<div class="page"".
Any suggestions would help.

PDF Structure:
<html dir="ltr" mozdisallowselectionprint=""><head>

<!-- This snippet is used in the Firefox extension (included from viewer.html) -->
<base href="resource://pdf.js/web/">
<script src="../build/pdf.js"></script>
    <link rel="stylesheet" href="viewer.css">



    <script src="viewer.js"></script>

  </head>
  <body tabindex="1" class="">
    <div id="outerContainer">
<div id="mainContainer">
<div id="viewerContainer" tabindex="0">
<div id="viewer" class="pdfViewer">
<div class="page" style="width: 612px; height: 792px;" data-page-number="1" data-loaded="true">
                                        .......
</div>
</div>
</div>
</div>
    </div> <!-- outerContainer -->

XPATH
String path1 = "/html/body/div[@id='outerContainer']/div[@id='mainContainer']/div[@id='viewerContainer']/div[@id='viewer']/div[@class='.page']";

Error Message: 

[Child 9404, Chrome_ChildThread] WARNING: pipe error: 109: file z:/build/build/sr
Exception in thread "main" org.openqa.selenium.NoSuchElementException: Unable to locate element: /html/body/div[@id='outerContainer']/div[@id='mainContainer']/div[@id='viewerContainer']/div[@id='viewer']/div[@class='.page']
For documentation on this error, please visit: http://seleniumhq.org/exceptions/no_such_element.html

Joe Ward

Apr 18, 2019, 12:07:07 PM
to seleniu...@googlegroups.com
What you’re attempting to do is not possible with Selenium as far as I’m aware. 

--
You received this message because you are subscribed to the Google Groups "Selenium Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to selenium-user...@googlegroups.com.
To post to this group, send email to seleniu...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/selenium-users/1a399e86-8267-4668-8aa0-e5d03c59f93a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Muhad Sayef

Apr 18, 2019, 9:00:26 PM
to seleniu...@googlegroups.com
Can you provide a screenshot of the UI?

One suggestion: get the midpoint of the window where the PDF is displayed and send a mouse click there. Then send the keystroke combination Ctrl+A, followed by Ctrl+C.

You will then have the PDF content in your clipboard.
From there, you can use string manipulation to extract what you need.
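A minimal sketch of the click, Ctrl+A, Ctrl+C approach, using only the JDK's java.awt.Robot and system clipboard rather than Selenium. It assumes Windows (Ctrl rather than Cmd), a visible desktop session (Robot does not work headless), and that the PDF text occupies the centre of the window. The window geometry and the "Invoice No:" label are hypothetical placeholders.

```java
import java.awt.AWTException;
import java.awt.Robot;
import java.awt.Toolkit;
import java.awt.datatransfer.DataFlavor;
import java.awt.datatransfer.UnsupportedFlavorException;
import java.awt.event.InputEvent;
import java.awt.event.KeyEvent;
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ClipboardPdfText {

    /** Centre of a window given its top-left corner and size, in screen pixels. */
    static int[] midpoint(int x, int y, int width, int height) {
        return new int[] {x + width / 2, y + height / 2};
    }

    /** Clicks at (cx, cy), then presses Ctrl+A and Ctrl+C. Needs a real desktop. */
    static void selectAllAndCopy(int cx, int cy) throws AWTException {
        Robot robot = new Robot();
        robot.setAutoDelay(100); // pause between events so the viewer keeps up
        robot.mouseMove(cx, cy);
        robot.mousePress(InputEvent.BUTTON1_DOWN_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_DOWN_MASK);
        pressWithCtrl(robot, KeyEvent.VK_A); // select all
        pressWithCtrl(robot, KeyEvent.VK_C); // copy
    }

    private static void pressWithCtrl(Robot robot, int keyCode) {
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(keyCode);
        robot.keyRelease(keyCode);
        robot.keyRelease(KeyEvent.VK_CONTROL);
    }

    /** Current clipboard contents as plain text. */
    static String readClipboard() throws IOException, UnsupportedFlavorException {
        return (String) Toolkit.getDefaultToolkit()
                .getSystemClipboard()
                .getData(DataFlavor.stringFlavor);
    }

    /** Text after the label on the same line, or null if the label is absent. */
    static String valueAfter(String text, String label) {
        Matcher m = Pattern.compile(Pattern.quote(label) + "[ \\t]*(.*)").matcher(text);
        return m.find() ? m.group(1).trim() : null;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical geometry; with Selenium it could come from
        // driver.manage().window().getPosition() and getSize().
        int[] centre = midpoint(0, 0, 1280, 800);
        selectAllAndCopy(centre[0], centre[1]);
        Thread.sleep(500); // give the clipboard a moment to update
        String text = readClipboard();
        System.out.println(valueAfter(text, "Invoice No:")); // hypothetical label
    }
}
```

Note that this drives real OS input, so it is fragile if any other window takes focus during the run.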

Hope it helps...:)

--
Sincerely,
Muhad Sayef

Mike Hetzer

Apr 19, 2019, 1:24:32 PM
to Selenium Users
Yeah, you won't be able to inspect anything on a PDF directly with any UI automation tool.

Alternatively, you can use automation to download the documents, then convert and parse their contents with a library such as PDFBox.
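A minimal sketch of the download step using only the JDK (Java 8 compatible). The URL and output file name are hypothetical, and it assumes the PDF can be fetched without the browser's login session; if it cannot, the WebDriver session's cookies would have to be forwarded (not shown). The "%PDF-" check is a cheap sanity test that the server returned a PDF rather than, say, an HTML error page, since PDF files normally begin with that signature.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;

public class PdfDownload {

    /** Copies the resource at pdfUrl to target, replacing any existing file. */
    static Path download(String pdfUrl, Path target) throws IOException {
        try (InputStream in = URI.create(pdfUrl).toURL().openStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }

    /** True if the file starts with the "%PDF-" signature. */
    static boolean looksLikePdf(Path file) throws IOException {
        byte[] expected = "%PDF-".getBytes(StandardCharsets.US_ASCII);
        byte[] head = new byte[expected.length];
        int read = 0;
        try (InputStream in = Files.newInputStream(file)) {
            while (read < head.length) {
                int n = in.read(head, read, head.length - read);
                if (n < 0) {
                    return false; // file is shorter than the signature
                }
                read += n;
            }
        }
        return Arrays.equals(head, expected);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical URL: pass the address of the PDF your application serves.
        String url = args.length > 0 ? args[0] : "https://example.com/report.pdf";
        Path pdf = download(url, Paths.get("downloaded.pdf"));
        System.out.println(pdf + " looks like a PDF: " + looksLikePdf(pdf));
    }
}
```

With PDFBox 2.x on the classpath, the saved file could then be opened with PDDocument.load(file) and its text extracted with new PDFTextStripper().getText(document); PDFBox 3.x replaces PDDocument.load with Loader.loadPDF.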

Bala Prasanna

Apr 21, 2019, 7:55:04 AM
to Selenium Users

Hi,

I am using Selenium to automate PDF verification. I open the PDF in Firefox, locate each element with XPath, and verify its text against the expected value. I got the XPath from Firefox's inspector, but Selenium cannot locate the element. Please find more details below.


Selenium version:
selenium-server-standalone-3.9.1
selenium-server-3.9.1
geckodriver-v0.24.0-win64
Firefox 66.0.3 (64 bit)



xpath: /html/body/div[1]/div[2]/div[4]/div/div[1]/div[2]/span[2]

Error:

1555846788724 mozrunner::runner INFO Running command: "C:\\Program Files\\Mozilla Firefox\\firefox.exe" "-marionette" "-foreground" "-no-remote" "-profile" "C:\\Users\\User\\AppData\\Local\\Temp\\rust_mozprofile.qDB2b3DlvHFn"
1555846789841 addons.webexten...@mozilla.org WARN Loading extension 'scree...@mozilla.org': Reading manifest: Invalid extension permission: mozillaAddons
1555846789843 addons.webexten...@mozilla.org WARN Loading extension 'scree...@mozilla.org': Reading manifest: Invalid extension permission: resource://pdf.js/
1555846789843 addons.webexten...@mozilla.org WARN Loading extension 'scree...@mozilla.org': Reading manifest: Invalid extension permission: about:reader*
1555846792579 Marionette INFO Listening on port 50319
1555846793075 Marionette WARN TLS certificate errors will be ignored for this session
Apr 21, 2019 7:39:53 AM org.openqa.selenium.remote.ProtocolHandshake createSession
INFO: Detected dialect: W3C
[Parent 12768, Gecko_IOThread] WARNING: pipe error: 109: file z:/build/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 332
Exception in thread "main" org.openqa.selenium.NoSuchElementException: Unable to locate element: /html/body/div[1]/div[2]/div[4]/div/div[1]/div[2]/span[2]
For documentation on this error, please visit: http://seleniumhq.org/exceptions/no_such_element.html
Build info: version: '3.9.1', revision: '63f7b50', time: '2018-02-07T22:42:28.403Z'
System info: host: 'BALA', ip: '172.20.20.20', os.name: 'Windows 8.1', os.arch: 'amd64', os.version: '6.3', java.version: '1.8.0_161'
Driver info: org.openqa.selenium.firefox.FirefoxDriver
Capabilities {acceptInsecureCerts: true, browserName: firefox, browserVersion: 66.0.3, javascriptEnabled: true, moz:accessibilityChecks: false, moz:geckodriverVersion: 0.24.0, moz:headless: false, moz:processID: 12768, moz:profile: C:\Users\User\AppData\Local..., moz:shutdownTimeout: 60000, moz:useNonSpecCompliantPointerOrigin: false, moz:webdriverClick: true, pageLoadStrategy: normal, platform: WINDOWS, platformName: WINDOWS, platformVersion: 6.3, rotatable: false, setWindowRect: true, strictFileInteractability: false, timeouts: {implicit: 0, pageLoad: 300000, script: 30000}, unhandledPromptBehavior: dismiss and notify}
Session ID: fb5902b1-0233-4188-a026-4650c0dcc111
*** Element info: {Using=xpath, value=/html/body/div[1]/div[2]/div[4]/div/div[1]/div[2]/span[2]}
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
at java.lang.reflect.Constructor.newInstance(Unknown Source)
at org.openqa.selenium.remote.http.W3CHttpResponseCodec.createException(W3CHttpResponseCodec.java:187)
at org.openqa.selenium.remote.http.W3CHttpResponseCodec.decode(W3CHttpResponseCodec.java:122)
at org.openqa.selenium.remote.http.W3CHttpResponseCodec.decode(W3CHttpResponseCodec.java:49)
at org.openqa.selenium.remote.HttpCommandExecutor.execute(HttpCommandExecutor.java:160)
at org.openqa.selenium.remote.service.DriverCommandExecutor.execute(DriverCommandExecutor.java:83)
at org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:601)
at org.openqa.selenium.remote.RemoteWebDriver.findElement(RemoteWebDriver.java:371)
at org.openqa.selenium.remote.RemoteWebDriver.findElementByXPath(RemoteWebDriver.java:473)
at org.openqa.selenium.By$ByXPath.findElement(By.java:361)
at org.openqa.selenium.remote.RemoteWebDriver.findElement(RemoteWebDriver.java:363)
at ReadData.invokeBrowser_F(ReadData.java:26)
at ReadData.main(ReadData.java:37)

Joe Ward

Apr 21, 2019, 9:32:33 AM
to seleniu...@googlegroups.com
Once again: Selenium is unable to do what you are trying to use it for. The Workfusion URL you posted has nothing to do with Selenium.

It won't work.


Michael Hwee

Apr 24, 2019, 11:06:17 AM
to seleniu...@googlegroups.com
You have a typo in your XPath: there is an extra dot in '.page'.
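To check the typo in isolation, here is a minimal sketch that evaluates the XPath from the first post with the JDK's javax.xml.xpath against a simplified, well-formed copy of the structure posted there. The live pdf.js page is HTML rather than XML, so this only tests the expression itself; it says nothing about whether Selenium can reach that DOM in Firefox, which is the separate question debated in this thread. The class-token form is a common XPath 1.0 idiom for elements that carry more than one class.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class PageXPathCheck {

    // Simplified, well-formed copy of the structure from the first post.
    static final String DOM =
        "<html><body>"
        + "<div id='outerContainer'><div id='mainContainer'>"
        + "<div id='viewerContainer'><div id='viewer' class='pdfViewer'>"
        + "<div class='page' data-page-number='1' data-loaded='true'/>"
        + "</div></div></div></div>"
        + "</body></html>";

    static final String BASE = "/html/body/div[@id='outerContainer']/div[@id='mainContainer']"
        + "/div[@id='viewerContainer']/div[@id='viewer']";

    /** Number of nodes the expression matches in the sample document. */
    static int count(String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(DOM)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        String typo = BASE + "/div[@class='.page']"; // as in the original post
        String fixed = BASE + "/div[@class='page']";  // extra dot removed
        // Matches "page" as one of several space-separated class names.
        String robust = "//div[@id='viewer']/div"
            + "[contains(concat(' ', normalize-space(@class), ' '), ' page ')]";
        System.out.println(count(typo));   // 0
        System.out.println(count(fixed));  // 1
        System.out.println(count(robust)); // 1
    }
}
```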

