Package: poppler-utils
Version: 0.12.4-1.2
Severity: normal
When I convert
<URL: http://nrk.no/contentfile/file/1.8116520!offentligjournal02052012.pdf >
to XML using
  pdftohtml -xml -noframes 1.8116520\!offentligjournal02052012.pdf
I get an XML file with no text content: only page elements, a few of
them with fontspec entries. I find this rather strange, as the PDF is
searchable using xpdf, okular and evince. Any idea where the text went?
Is there anything I can do to get access to the text as XML?
This is the output I get:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE pdf2xml SYSTEM "pdf2xml.dtd">
<pdf2xml>
<page number="1" position="absolute" top="0" left="0" height="792" width="612">
<fontspec id="0" size="18" family="Helvetica" color="#000000"/>
<fontspec id="1" size="5" family="Helvetica" color="#000000"/>
<fontspec id="2" size="5" family="Helvetica" color="#000000"/>
<fontspec id="3" size="7" family="Helvetica" color="#000000"/>
</page>
<page number="2" position="absolute" top="0" left="0" height="792" width="612">
<fontspec id="4" size="6" family="Helvetica" color="#000000"/>
</page>
<page number="3" position="absolute" top="0" left="0" height="792" width="612">
</page>
<page number="4" position="absolute" top="0" left="0" height="792" width="612">
</page>
<page number="5" position="absolute" top="0" left="0" height="792" width="612">
</page>
<page number="6" position="absolute" top="0" left="0" height="792" width="612">
</page>
<page number="7" position="absolute" top="0" left="0" height="792" width="612">
</page>
<page number="8" position="absolute" top="0" left="0" height="792" width="612">
</page>
<page number="9" position="absolute" top="0" left="0" height="792" width="612">
</page>
<page number="10" position="absolute" top="0" left="0" height="792" width="612">
</page>
<page number="11" position="absolute" top="0" left="0" height="792" width="612">
</page>
<page number="12" position="absolute" top="0" left="0" height="792" width="612">
</page>
<page number="13" position="absolute" top="0" left="0" height="792" width="612">
</page>
<page number="14" position="absolute" top="0" left="0" height="792" width="612">
</page>
<page number="15" position="absolute" top="0" left="0" height="792" width="612">
</page>
<page number="16" position="absolute" top="0" left="0" height="792" width="612">
</page>
<page number="17" position="absolute" top="0" left="0" height="792" width="612">
</page>
<page number="18" position="absolute" top="0" left="0" height="792" width="612">
</page>
<page number="19" position="absolute" top="0" left="0" height="792" width="612">
</page>
<page number="20" position="absolute" top="0" left="0" height="792" width="612">
</page>
</pdf2xml>
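To check quickly which pages came out empty, here is a minimal Python sketch. It assumes that pdftohtml -xml normally writes each text run as a <text> element directly inside its <page>, which is what is missing from the output above. The function name count_text_per_page and the shortened SAMPLE input are made up for illustration and are not part of poppler.

```python
import xml.etree.ElementTree as ET


def count_text_per_page(xml_data):
    """Return {page number: number of <text> children} for pdf2xml output."""
    root = ET.fromstring(xml_data)
    counts = {}
    for page in root.iter("page"):
        # Assumption: text runs are direct <text> children of <page>.
        counts[int(page.get("number"))] = len(page.findall("text"))
    return counts


# Shortened copy of the output quoted above (illustrative only).
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE pdf2xml SYSTEM "pdf2xml.dtd">
<pdf2xml>
<page number="1" position="absolute" top="0" left="0" height="792" width="612">
<fontspec id="0" size="18" family="Helvetica" color="#000000"/>
</page>
<page number="2" position="absolute" top="0" left="0" height="792" width="612">
</page>
</pdf2xml>"""

if __name__ == "__main__":
    for number, n in sorted(count_text_per_page(SAMPLE).items()):
        print(f"page {number}: {n} text elements")
```

A count of zero on every page would confirm that no text was extracted at all, rather than the text being present but empty or whitespace-only.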
-- System Information:
Debian Release: 6.0.5
APT prefers stable-updates
APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: i386 (i686)
Kernel: Linux 2.6.32-5-686 (SMP w/1 CPU core)
Locale: LANG=nb_NO.UTF-8, LC_CTYPE=nb_NO.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Versions of packages poppler-utils depends on:
ii libc6 2.11.3-3 Embedded GNU C Library: Shared lib
ii libfontconfig1 2.8.0-2.1 generic font configuration library
ii libgcc1 1:4.4.5-8 GCC support library
ii libpoppler5 0.12.4-1.2 PDF rendering library
ii libstdc++6 4.4.5-8 The GNU Standard C++ Library v3
ii libxml2 2.7.8.dfsg-2+squeeze4 GNOME XML library
Versions of packages poppler-utils recommends:
ii ghostscript 8.71~dfsg2-9 The GPL Ghostscript PostScript/PDF
poppler-utils suggests no packages.
-- no debconf information