Before you get too far down this path, I'll just remind you that it doesn't hurt to check out other libraries as well. We'll miss you, but we'd totally understand.
Your document is fairly consistent in its structure. The most straightforward way to split it apart is to loop through it line by line and grab the various parts.
```php
// "\n" must be double-quoted; '\n' in single quotes is a literal backslash followed by n
$lines = explode("\n", $html);
$sections = array('');  // initialize $sections[0] so the first .= doesn't hit an undefined index
$section_num = 0;
foreach ($lines as $line) {
    // start a new section when we reach lines with the following text in them
    if (strpos($line, 'category-family') !== false) {
        $section_num++;
        $sections[$section_num] = '';
    }
    $sections[$section_num] .= "\r\n" . trim($line);
    // the first content section (the TOC) doesn't start with anything specific, so we'll end it at a known point
    if (strpos($line, 'col-main') !== false) {
        $section_num++;
        $sections[$section_num] = '';
    }
}
unset($lines);
// Using PHP 5.3+? You can force garbage collection here to try and free up some memory (though at the cost of run time). Call gc_collect_cycles().

// $sections[0] is the start of your HTML (the HTML head plus starting BODY content)
// $sections[1] is your TOC
// all other indices are your document contents

// now render each section
// render the TOC ($sections[1]) last, since its page numbers can only be filled in once the other sections have been rendered
$section_count = count($sections);
for ($section_num = 2; $section_num < $section_count; $section_num++) {
    $dompdf = new DOMPDF;
    $dompdf->load_html($sections[0] . $sections[$section_num]);
    $dompdf->render();
    file_put_contents('somefile.sec' . $section_num . '.pdf', $dompdf->output());
    unset($dompdf);
    // Using PHP 5.3+? You can force garbage collection here to try and free up some memory (though at the cost of run time). Call gc_collect_cycles().
}
$dompdf = new DOMPDF;
$dompdf->load_html($sections[0] . $sections[1]);
$dompdf->render();
file_put_contents('somefile.sec1.pdf', $dompdf->output());
unset($dompdf);

// finally, join all the parts with pdftk: the TOC (sec1) first, then sec2, sec3, ...
$exec_cmd = 'pdftk';
for ($section_num = 1; $section_num < $section_count; $section_num++) {
    $exec_cmd .= ' ' . escapeshellarg('somefile.sec' . $section_num . '.pdf');
}
$exec_cmd .= ' cat output ' . escapeshellarg('somefile.pdf');
exec($exec_cmd, $pdftk_output, $pdftk_return_code);
// a non-zero $pdftk_return_code means pdftk failed; $pdftk_output holds whatever it printed
```
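The comment about rendering the TOC last refers to a page-numbering step that the code itself doesn't perform. Here's a minimal sketch of the arithmetic, assuming you've recorded how many pages each section PDF came out to (dompdf's canvas should be able to tell you this after `render()`, e.g. via `get_canvas()->get_page_count()`, but check that against the version you're running) and that you know how long the TOC itself will be (say, a fixed single page). `section_start_pages()` is a hypothetical helper, not part of dompdf:

```php
<?php
// Hypothetical helper (not part of dompdf): given the page count of each rendered
// section PDF, work out the page on which each section starts in the merged file.
// $toc_pages is the page count of the TOC PDF, which pdftk puts first.
function section_start_pages($toc_pages, array $section_page_counts)
{
    $starts = array();
    $next_page = $toc_pages + 1; // first page after the TOC
    foreach ($section_page_counts as $section_num => $pages) {
        $starts[$section_num] = $next_page;
        $next_page += $pages;
    }
    return $starts;
}

// Example: a 1-page TOC followed by sections 2, 3 and 4 with 3, 1 and 2 pages
$starts = section_start_pages(1, array(2 => 3, 3 => 1, 4 => 2));
// $starts is array(2 => 2, 3 => 5, 4 => 6)
```

You would then write these numbers into the TOC markup in `$sections[1]` before rendering it. How you do that depends on how your TOC entries are marked up.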
This approach is ugly, and I haven't tested or even carefully read what I just wrote, so it may be buggy. The HTML will also be a little off (every section reuses the opening markup in $sections[0], but only the last section contains the closing tags), but I think it should still render OK. You may also have to adjust the code to account for what you know about the document structure. Finally, doing things this way means that each section will start on a new page, so you'll have to decide whether that's acceptable.
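If the unclosed markup does cause trouble, one option is to close each fragment before handing it to `load_html()`. This is a sketch using a hypothetical helper, `build_section_html()`, which assumes the only thing missing from a section is the closing `</body></html>`:

```php
<?php
// Hypothetical helper: glue the shared head ($sections[0]) onto one section and,
// if that section doesn't already close the document, append the missing tags.
function build_section_html($head, $section)
{
    $html = $head . $section;
    if (stripos($section, '</body>') === false) {
        $html .= "\r\n</body></html>";
    }
    return $html;
}

// Usage inside the render loop:
// $dompdf->load_html(build_section_html($sections[0], $sections[$section_num]));
```

The last section already carries the closing tags from the original document, so the helper leaves it alone.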
This is just one possible method. You could also use regular expressions to parse the content, or even load your document into a DOM and parse it that way. You'll have to figure out which approach is easiest for you to work with.
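For the DOM route, a rough sketch using PHP's built-in `DOMDocument` and `DOMXPath` might look like the following. It assumes each section is a single element whose class list includes `category-family`, which is a guess based on the marker used above. The `split_by_class()` name and the sample markup are made up for illustration. Each returned string is just that element's HTML, so you'd still prepend your head markup before rendering.

```php
<?php
// Sketch: find every element whose class list contains $class and
// serialize each one as its own section.
function split_by_class($html, $class)
{
    $doc = new DOMDocument();
    // keep libxml from emitting warnings about sloppy or unknown markup
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // match $class as a whole word inside the class attribute
    $query = "//*[contains(concat(' ', normalize-space(@class), ' '), ' " . $class . " ')]";
    $sections = array();
    foreach ($xpath->query($query) as $node) {
        // passing a node to saveHTML() requires PHP 5.3.6+
        $sections[] = $doc->saveHTML($node);
    }
    return $sections;
}

$html = '<html><body><div class="toc">TOC</div>'
      . '<div class="category-family a">One</div>'
      . '<div class="category-family">Two</div></body></html>';
$parts = split_by_class($html, 'category-family');
// $parts holds two strings: the "One" div and the "Two" div, in document order
```

Compared with the line-by-line loop, this doesn't care how the HTML is wrapped across lines. However, it parses the whole document into memory at once, and libxml may normalize some of your markup along the way.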