'Frame not found in cellmap' error; incorrect PDF display when HTML is dumbed down to avoid error

Matt Miller

Jan 16, 2012, 4:13:37 PM
to dom...@googlegroups.com
Hello,

I've just started a project where I'm attempting to convert HTML to PDF and was very happy to discover dompdf.  It really seems like an amazing piece of software, but I'm struggling to get it to work in my case.  I would love to get some help/advice from anyone willing.

Here's the PHP code I have:
<?php
require_once 'dompdf/dompdf_config.inc.php';
$dompdf = new DOMPDF();
$html = <<<EOT
HTML HERE
EOT;
$dompdf->load_html($html);
$dompdf->set_paper('letter', 'portrait');
$dompdf->render();
$dompdf->stream("dompdf_out.pdf", array("Attachment" => false));

The HTML HERE is replaced with the attached HTML files.  The full version is AnkleAnatomy.htm.  When it's put in for the $html variable, it throws the following error:
Fatal error: Uncaught exception 'DOMPDF_Exception' with message 'Frame not found in cellmap'

which seems to be fairly common and is caused by some tags.  I found a few posts about removing <tbody> and <strong> tags.  So I changed most everything to <span> tags but I still get the error.  Then I decided to reduce it down to something very simple and that's the simplified_version.html file.  That one will actually display, but for some reason it displays multiple pages (can be seen in the render_pdf_from_html.pdf file).  Any idea why?

One other thing while I'm here...while writing this post, I realized that the path to the background image defined in the CSS was wrong in the html files above.  When I changed it to the correct path, dompdf seems to pull in the image and save it to /tmp/rbg_dompdf_img_RANDOM.png, but then dompdf can't seem to find it.  I get this error:
Warning: getimagesize() [function.getimagesize]: Filename cannot be empty in dompdf/include/functions.inc.php on line 672
Warning: file_get_contents() [function.file-get-contents]: Filename cannot be empty in dompdf/include/functions.inc.php on line 675
Fatal error: Uncaught exception 'PDFlibException' with message 'Parameter 'type' is empty' 

Any idea why the image filename wouldn't be passed into the include/functions.inc.php file?

Thanks in advance for any help.
Matt

AnkleAnatomy.htm
simplified_version.html
render_pdf_from_html.pdf

ton ramirez

Jan 18, 2012, 2:02:12 PM
to dom...@googlegroups.com
Mike, I found a tutorial online that helped some people.
Hope this helps:
http://coreyworrell.com/blog/article/php-html-email-pdf-attachment

You don't have to hard-code the HTML! You can point to the file containing the HTML instead, which is easier.
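
A minimal sketch of that suggestion, assuming the HTML lives in a file next to the script. `read_html_file()` is a hypothetical helper, not part of dompdf; the commented-out lines reuse the calls from the first post and need dompdf 0.6.x installed.

```php
<?php
// Hypothetical helper (not part of dompdf): read an HTML document from disk
// and fail loudly instead of handing an empty string to the renderer.
function read_html_file($path)
{
    if (!is_file($path) || !is_readable($path)) {
        throw new RuntimeException("Cannot read HTML file: $path");
    }
    return file_get_contents($path);
}

// With dompdf installed, the script from the first post would become:
//   $dompdf = new DOMPDF();
//   $dompdf->load_html(read_html_file('simplified_version.html'));
//   $dompdf->set_paper('letter', 'portrait');
//   $dompdf->render();
//   $dompdf->stream("dompdf_out.pdf", array("Attachment" => false));
```

Reading the file yourself keeps the error handling in your own code. dompdf 0.6 also appears to ship a `load_html_file()` method that does the loading for you, so check whether your version has it.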



BrianS

Jan 18, 2012, 2:34:14 PM
to dom...@googlegroups.com
On Monday, January 16, 2012 4:13:37 PM UTC-5, Matt Miller wrote:
> I've just started a project where I'm attempting to convert HTML to PDF and was very happy to discover dompdf.  It really seems like an amazing piece of software, but I'm struggling to get it to work in my case.  I would love to get some help/advice from anyone willing
> ...
> The HTML HERE is replaced with the attached HTML files.  The full version is AnkleAnatomy.htm.  When it's put in for the $html variable, it throws the following error:
> Fatal error: Uncaught exception 'DOMPDF_Exception' with message 'Frame not found in cellmap'

Typically this is caused by a table flowing to a new page incorrectly. This problem was more prevalent on 0.5.2 than it is in 0.6.0. However, in testing your code the problem is still encountered with the latest code.
 
> which seems to be fairly common and is caused by some tags.  I found a few posts about removing <tbody> and <strong> tags.  So I changed most everything to <span> tags but I still get the error.  Then I decided to reduce it down to something very simple and that's the simplified_version.html file.  That one will actually display, but for some reason it displays multiple pages (can be seen in the render_pdf_from_html.pdf file).  Any idea why?

On an initial look at your HTML, I can say that one of the problems is that the page size defined in your document is larger than what is available. For the sake of discussion, you can assume that dompdf treats pixels and points as equivalent units of measurement. You're rendering to a letter-size document, which has dimensions of 612x792 points. Once margins, padding, etc. are included, I believe your content is larger than this.
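
As a rough sketch of that arithmetic: the helper below is hypothetical (not part of dompdf), assumes one uniform margin on every side, and treats pixels as points per the simplification above. The sizes passed in are made-up examples, not values taken from the attached HTML.

```php
<?php
// US Letter at 72 points per inch: 8.5in x 11in = 612pt x 792pt.
const LETTER_WIDTH_PT  = 612;
const LETTER_HEIGHT_PT = 792;

// Hypothetical helper: does a box of the given size, plus a uniform margin
// on every side, fit on a single letter page?
function fits_on_letter_page($width, $height, $margin)
{
    return $width + 2 * $margin <= LETTER_WIDTH_PT
        && $height + 2 * $margin <= LETTER_HEIGHT_PT;
}

// Illustrative sizes only.
var_dump(fits_on_letter_page(600, 700, 36)); // 600 + 72 = 672, wider than 612
var_dump(fits_on_letter_page(500, 700, 36)); // 572 <= 612 and 772 <= 792
```

Anything that fails this kind of check is a candidate for spilling onto extra pages.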

Also, your CSS seems to be needlessly complex. There are a lot of declarations (such as position: relative) that don't seem to be necessary. You could simplify debugging a bit by removing unnecessary declarations and combining shared declarations into a common class used by multiple elements.

Start with those things in mind and see where you get. If you need more direct help post back and I'll modify the simplified document to show you what I mean.

 
> One other thing while I'm here...while writing this post, I realized that the path to the background image defined in the CSS was wrong in the html files above.  When I changed it to the correct path, dompdf seems to pull in the image and save it to /tmp/rbg_dompdf_img_RANDOM.png, but then dompdf can't seem to find it.  I get this error:
> Warning: getimagesize() [function.getimagesize]: Filename cannot be empty in dompdf/include/functions.inc.php on line 672
> Warning: file_get_contents() [function.file-get-contents]: Filename cannot be empty in dompdf/include/functions.inc.php on line 675
> Fatal error: Uncaught exception 'PDFlibException' with message 'Parameter 'type' is empty'
>
> Any idea why the image filename wouldn't be passed into the include/functions.inc.php file?

Somewhere along the line something is happening that's resulting in an empty/null filename. We'll have to take a closer look to see if the problem is caused by something in the HTML or by the server configuration.

Matt Miller

Jan 18, 2012, 5:43:12 PM
to dompdf
> Typically this is caused by a table flowing to a new page incorrectly. This
> problem was more prevalent on 0.5.2 than it is in 0.6.0. However, in
> testing your code the problem is still encountered with the latest code.

OK, thanks.

> On an initial look at your HTML I can say that one of the problems is that
> your defined page size in the document is larger than that available. For
> the sake of discussion you can assume that dompdf treats pixels and points
> as equivalent units of measurement. You're rendering to a letter-size
> document which has dimensions of 612x792. When including margins, padding,
> etc I believe your content is larger than this.

Ah, I didn't know this and it's probably the reason for all of the
errors I'm getting.

> Also, your CSS seems to be needlessly complex. There's a lot of
> declarations (such as position: relative) that don't seem to be necessary.

The HTML was autogenerated. I'm actually starting with a PDF,
converting it to HTML, allowing people to modify it and then
converting it back to PDF with dompdf. Unfortunately I don't have
good HTML to start with because all of the things I'm trying to
convert are already in PDF format.

Not that it's all that relevant to this post or my problems, but do
you know of any tools that convert between PDF and HTML and do it
well?

> You could simplify debugging a bit by removing unnecessary declarations and
> combining shared declarations into a common class used by multiple elements.

Right.

> Start with those things in mind and see where you get. If you need more
> direct help post back and I'll modify the simplified document to show you
> what I mean.

OK, thanks for that.


> Somewhere along the line something is happening that's resulting in an
> empty/null filename. We'll have to take a closer look to see if the
> problem is caused by something in the HTML or by the server configuration.

I took a bit of a closer look at this today, digging into the code.
Looking in the file dompdf/include/pdflib_adapter.cls.php I see that
the image type is being set on line 721:
  $img_type = Image_Cache::detect_type($img_url);
which ultimately ends up calling this function in
dompdf/include/functions.inc.php:
function dompdf_getimagesize($filename) {
  static $cache = array();

  if ( isset($cache[$filename]) ) {
    return $cache[$filename];
  }

  list($width, $height, $type) = getimagesize($filename);

  if ( $width == null || $height == null ) {
    $data = file_get_contents($filename, null, null, 0, 26);

    if ( substr($data, 0, 2) === "BM" ) {
      $meta = unpack('vtype/Vfilesize/Vreserved/Voffset/Vheadersize/Vwidth/Vheight', $data);
      $width = (int)$meta['width'];
      $height = (int)$meta['height'];
      $type = IMAGETYPE_BMP;
    }
  }

  return $cache[$filename] = array($width, $height, $type);
}

Any idea why the $type is returning empty? Could it have something to
do with permissions on the /tmp directory?
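
For reference, here is a small standalone sketch of what the BMP fallback in that function does with the first 26 bytes of a file. `bmp_dimensions()` is a hypothetical helper written for illustration, and the header is synthetic rather than read from a real image. Note that the fallback only runs after getimagesize() has already failed, which would explain why both warnings appear when the filename is empty.

```php
<?php
// Hypothetical helper mirroring the BMP branch of dompdf_getimagesize():
// returns array(width, height) for a BMP header, or null for anything else.
function bmp_dimensions($data)
{
    // 2 bytes of magic plus six 32-bit little-endian fields = 26 bytes.
    if (strlen($data) < 26 || substr($data, 0, 2) !== "BM") {
        return null;
    }
    $meta = unpack('vtype/Vfilesize/Vreserved/Voffset/Vheadersize/Vwidth/Vheight', $data);
    return array((int) $meta['width'], (int) $meta['height']);
}

// Synthetic header for a 3x2 bitmap; 0x4D42 packs little-endian to "BM".
$header = pack('vVVVVVV', 0x4D42, 70, 0, 54, 40, 3, 2);
var_dump(bmp_dimensions($header));
```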

BrianS

Jan 19, 2012, 2:22:42 PM
to dom...@googlegroups.com
On Wednesday, January 18, 2012 5:43:12 PM UTC-5, Matt Miller wrote:
>> Also, your CSS seems to be needlessly complex. There's a lot of
>> declarations (such as position: relative) that don't seem to be necessary.
>
> The HTML was autogenerated.  I'm actually starting with a PDF,
> converting it to HTML, allowing people to modify it and then
> converting it back to PDF with dompdf.  Unfortunately I don't have
> good HTML to start with because all of the things I'm trying to
> convert are already in PDF format.
>
> Not that it's all that relevant to this post or my problems, but do
> you know of any tools that convert between PDF and HTML and do it
> well?

That would certainly make things a bit more difficult. I don't really know of a good PDF to HTML generator. I don't doubt the possibility that one exists, but I've never had much need for one. The biggest issue is that a PDF is internally structured for print, so there is not necessarily any relationship indicated between various content pieces (boxes and contained text, or even text within the same paragraph). The parser would have to be fairly complex to be able to discern relationships based on coordinates on a page or sentence structure, for example.
 
>> You could simplify debugging a bit by removing unnecessary declarations and
>> combining shared declarations into a common class used by multiple elements.
>
> Right.
>
>> Start with those things in mind and see where you get. If you need more
>> direct help post back and I'll modify the simplified document to show you
>> what I mean.
>
> OK, thanks for that.

I know this is a bit less helpful since you're converting to HTML from PDF. You might try running your document through tidy to have it clean things up as much as possible.
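
A minimal sketch of the tidy step, assuming PHP's tidy extension is installed. `tidy_html()` is a hypothetical wrapper and the two config options are just a starting point; when the extension is missing, the input is returned unchanged.

```php
<?php
// Hypothetical wrapper: repair markup with the tidy extension when it is
// available, otherwise return the input unchanged.
function tidy_html($html)
{
    if (!function_exists('tidy_repair_string')) {
        return $html; // tidy extension not loaded
    }
    $config = array(
        'output-xhtml' => true, // emit well-formed XHTML
        'wrap'         => 0,    // do not re-wrap long lines
    );
    $clean = tidy_repair_string($html, $config, 'utf8');
    return $clean === false ? $html : $clean;
}

echo tidy_html('<p>Unclosed paragraph<p>Another one');
```

Tidy fixes malformed markup such as unclosed tags. It will not shrink content that is too large for the page, so it may only help with part of the problem.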

The real problem is that dompdf has problems with some elements that are larger than a page and this is something we hope to address.

 

As for the empty $type: actually, it's because a variable reference in Image_Cache::detect_type() is incorrect. The function is currently written as:

static function detect_type($file) {
  list($width, $height, $type) = dompdf_getimagesize($img);
  return $type;
}

but it should be (the argument passed to dompdf_getimagesize() changes from $img to $file):

static function detect_type($file) {
  list($width, $height, $type) = dompdf_getimagesize($file);
  return $type;
}

If you modify that in your dompdf install the function should work as expected. I'll also update the trunk in the repository, if you want to download a fresh copy.

FYI, the reason this wasn't caught before now is that most of our users rely on the CPDF rendering library, which does not use this function. It's good to have a few PDFLib users to help us catch bugs like this.
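
If you want this kind of mistake to fail with a clearer message in your own code, a guard along these lines is one option. This is a sketch only: `image_type_or_fail()` is a hypothetical function and not part of dompdf.

```php
<?php
// Hypothetical guard (not dompdf code): reject empty or unreadable filenames
// before they reach getimagesize().
function image_type_or_fail($filename)
{
    if (!is_string($filename) || $filename === '') {
        throw new InvalidArgumentException('Image filename is empty');
    }
    if (!is_readable($filename)) {
        throw new InvalidArgumentException("Image file not readable: $filename");
    }
    $info = getimagesize($filename);
    if ($info === false) {
        throw new RuntimeException("Not a recognised image: $filename");
    }
    return $info[2]; // one of the IMAGETYPE_* constants
}

try {
    // An undefined variable such as the misnamed $img evaluates to null.
    image_type_or_fail(null);
} catch (InvalidArgumentException $e) {
    echo $e->getMessage(), "\n";
}
```

With a check like this, the failure points at the empty filename directly instead of surfacing later as the PDFlib 'type' error.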

Matt Miller

Jan 19, 2012, 6:29:23 PM
to dompdf
> That would certainly make things a bit more difficult. I don't really know
> of a good PDF to HTML generator. I don't doubt the possibility that one
> exists, but I've never had much need for one. The biggest issue is that a
> PDF is internally structured for print, so there is not necessarily any
> relationship indicated between various content pieces (boxes and contained
> text, or even text within the same paragraph). The parser would have to be
> fairly complex to be able to discern relationships based on coordinates on
> a page or sentence structure, for example.

Right. This is the one I used:
http://www.pdfonline.com/convert-pdf-to-html/default.aspx

It worked very well in terms of making the HTML look just like the PDF
in a browser. Evidently that's not enough to make it portable into
dompdf though...

> I know this is a bit less helpful since you're converting to HTML from PDF.
> You might try running your document through tidy to have it clean things up
> as much as possible.

Yeah, I read somewhere else about using tidy and I gave it a try. It
didn't seem to catch many issues.

> The real problem is that dompdf has problems with some elements that are
> larger than a page and this is something we hope to address.

OK.

> Actually, it's because a variable reference in Image_Cache::detect_type()
> is incorrect. The function is currently written as:
>
> static function detect_type($file) {
>   list($width, $height, $type) = dompdf_getimagesize($img);
>   return $type;
> }
>
> but it should be (change highlighted):
>
> static function detect_type($file) {
>   list($width, $height, $type) = dompdf_getimagesize($file);
>   return $type;
> }

Oh, yep, that would do it!

I'm not sure now that dompdf is going to be the way to go with this
project. Converting the PDFs to HTML and then having to manually
clean the HTML for each PDF is going to be way too time consuming (we
have 250+ PDFs). So I started searching some more today for other
solutions and found PDFescape:
http://www.pdfescape.com/

It's an extremely useful tool and does exactly what we need. It's
basically an AJAX WYSIWYG interface with the ability to upload images,
insert them into the PDF, delete text, add new text, etc.
Additionally, you can license it for use on your own server:
http://www.pdfescape.com/webmasters/

The only problem is that it runs in ASP.NET and we only run PHP
configurations. Darn it! And unfortunately I couldn't find an
equivalent product available in PHP. Not sure what to do next...