I'm trying to write a function that takes a string of html without
line breaks or indenting, and adds line breaks and formatting.
So this:
"<html><head><title>Welcome</title><link rel='stylesheet' type='text/css' href='default.css'></head><body>"
becomes this:
"<html>
<head>
<title>
Welcome
</title>
<link rel='stylesheet' type='text/css' href='default.css'>
</head>
<body>
"
This is my code:
function indent_html( $html, $indent_char = "\t", $newline_char = "\n" ) {
    // split the html into an array of characters
    $html = str_split( $html );
    $tag_depth = 0;
    // start with an empty string so the first .= has something to append to
    $output = '';
    foreach ( $html as $index => $char ) {
        //////////////////////////////////////////
        // do any indent number calculation, etc.
        //////////////////////////////////////////
        $last_index = $index - 1;
        $next_index = $index + 1;
        // guard both ends of the array: there is no character before the
        // first one or after the last one
        $last_char = isset( $html[$last_index] ) ? $html[$last_index] : '';
        $next_char = isset( $html[$next_index] ) ? $html[$next_index] : '';
        if ( $char == "<" ) {
            if ( $next_char == "/" ) {
                // closing tag: step back out one level
                $tag_depth--;
            } else {
                // opening tag: step in one level
                $tag_depth++;
            }
            for ( $i = 1; $i <= $tag_depth; $i++ ) {
                $output .= $indent_char;
            }
        }
        //////////////////////////
        $output .= $char;
        /////////////////////////////////////////
        // Do stuff after adding the character
        ////////////////////////////////////////
        if ( $char == ">" ) {
            $output .= $newline_char;
        }
    }
    return $output;
}
and here's an example result
<html>
<head>
<title>
Welcome </title>
<link rel='stylesheet' type='text/css' href='default.css'>
</head>
<body>
<table width='100%'>
<tr>
<td align='center' >
<table width='800px' height='600px' bgcolor='#9A3D24' cellspacing='10'>
<tr>
<td bgcolor='#CFE673' align='center'>
This is a table cell </td>
</tr>
</table>
</td>
</tr>
</table>
</body>
</html>
-------------------------------------------------
So you see that I am having a problem inserting indentation before a
non-tag string, and also adding a line break after the non-tag string.
I'm using str_split() to break the complete html string into an array
of characters, and then dealing with each character (along with the
previous and next ones) on each iteration of the loop.
Is there a way that I could split the string into an array of more
meaningful chunks, such as
array => (
[0] => '<html>',
[1] => '<head>',
[2] => '<title>',
[3] => 'Welcome',
[4] => '</title>',
[5] => '<link rel='stylesheet' type='text/css' href='default.css'>',
[6] => '</head>',
)
How might I do this?
>I'm trying to write a function that takes a string of html without
>line breaks or indenting, and adds line breaks and formatting.
What about trying one of the XML extensions to get a formatted output?
Or try the Tidy extension if available.
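A minimal sketch of the Tidy route, assuming the Tidy extension is installed.
The function name tidy_indent_html and the choice of options are assumptions
for illustration, not something from this thread. Note that Tidy also repairs
the markup (it may add a doctype or missing tags), so the result is not a pure
re-indent of the input.

```php
<?php
// Hypothetical wrapper (name and options are assumptions).
// Returns null when the Tidy extension is not available or repair fails.
function tidy_indent_html($html) {
    if (!function_exists('tidy_repair_string')) {
        return null;
    }
    $config = array(
        'indent' => true, // indent nested content
        'wrap'   => 0,    // do not wrap long lines
    );
    $result = tidy_repair_string($html, $config, 'utf8');
    return $result === false ? null : $result;
}

$formatted = tidy_indent_html(
    "<html><head><title>Welcome</title></head><body><h1>foo</h1></body></html>"
);
echo $formatted === null ? "Tidy extension not available\n" : $formatted;
```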
Micha
> Hello all -
>
> I'm trying to write a function that takes a string of html without
> line breaks or indenting, and adds line breaks and formatting.
>
> So this:
> "<html><head><title>Welcome</title><link rel='stylesheet' type='text/css' href='default.css'></head><body>"
>
> becomes this:
> "<html>
> <head>
> <title>
> Welcome
> </title>
> <link rel='stylesheet' type='text/css' href='default.css'>
> </head>
> <body>
> "
Aside from the code: various UA quirks render whitespace when they
shouldn't (in lists, for example), which means you could break a carefully
laid out design by adding whitespace like this.
About the code, could this be to your liking?
<?php
$doc = new DOMDocument();
$doc->formatOutput = true;
$doc->loadHTML("<html><head><title>Welcome</title><link rel='stylesheet' type='text/css' href='default.css'></head><body><h1>foo</h1></body></html>");
echo $doc->saveXML($doc->getElementsByTagName('html')->item(0));
?>
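If the input is less tidy than this example, loadHTML() tends to emit parser
warnings. A rough sketch of the same idea wrapped in a function, assuming you
would rather collect those warnings silently; the name dom_indent_html is made
up for illustration. Because saveXML() is used, empty elements come out in
XML style (self-closed).

```php
<?php
// Hypothetical wrapper around the DOMDocument approach (name is an assumption).
function dom_indent_html($html) {
    if (trim($html) === '') {
        return ''; // nothing to parse
    }
    // Collect libxml parse warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->formatOutput = true;
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Serialize from the root element so no XML declaration is emitted.
    $root = $doc->documentElement;
    return $root === null ? '' : $doc->saveXML($root);
}

echo dom_indent_html(
    "<html><head><title>Welcome</title></head><body><h1>foo</h1></body></html>"
);
```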
> array => (
> [0] => '<html>',
> [1] => '<head>',
> [2] => '<title>',
> [3] => 'Welcome',
> [4] => '</title>',
> [5] => '<link rel='stylesheet' type='text/css' href='default.css'>',
> [6] => '</head>',
> )
>
> How might I do this?
preg_split('/(<[^>]*>)/', $string, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
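A sketch of how those chunks could drive the indenting, assuming well-formed
input. The function name indent_html_chunks and the short list of void tags
are assumptions for illustration; the list is not complete, and a comment or
attribute value containing ">" would confuse the split.

```php
<?php
// Hypothetical helper: re-indent HTML one chunk (tag or text run) at a time.
function indent_html_chunks($html, $indent_char = "\t", $newline_char = "\n") {
    // Tags that never get a closing tag (assumed, incomplete list).
    $void_tags = array('br', 'hr', 'img', 'input', 'link', 'meta');
    $chunks = preg_split('/(<[^>]*>)/', $html, -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
    $depth = 0;
    $output = '';
    foreach ($chunks as $chunk) {
        $text = trim($chunk);
        if ($text === '') {
            continue; // skip whitespace-only text between tags
        }
        $is_tag = ($text[0] === '<');
        $is_closing = $is_tag && substr($text, 0, 2) === '</';
        if ($is_closing && $depth > 0) {
            $depth--; // a closing tag lines up with its opening tag
        }
        $output .= str_repeat($indent_char, $depth) . $text . $newline_char;
        // Only a non-void, non-self-closed opening tag increases the depth.
        if ($is_tag && !$is_closing
            && preg_match('/^<([a-zA-Z][a-zA-Z0-9]*)/', $text, $m)
            && !in_array(strtolower($m[1]), $void_tags)
            && substr($text, -2) !== '/>') {
            $depth++;
        }
    }
    return $output;
}

echo indent_html_chunks("<html><head><title>Welcome</title><link rel='stylesheet' type='text/css' href='default.css'></head><body>");
```

Text runs become their own chunks, so they get the same indent and line break
treatment as tags, which is the part the character-by-character loop struggles
with.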
--
Rik Wasmus
What does UA stand for? Can you give an example of whitespace that
would cause this to break? I only plan on using this internally; I
don't plan on using whitespace to format.
>
> About the code, could this be to your liking?
>
> <?php
> $doc = new DOMDocument();
> $doc->formatOutput = true;
> $doc->loadHTML("<html><head><title>Welcome</title><link rel='stylesheet' type='text/css' href='default.css'></head><body><h1>foo</h1></body></html>");
> echo $doc->saveXML($doc->getElementsByTagName('html')->item(0));
> ?>
That's it exactly! Thanks!
> > How might I do this?
>
> preg_split('/(<[^>]*>)/', $string, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
Also wonderful! Thanks again!
> On Dec 11, 10:31 am, "Rik Wasmus" <luiheidsgoe...@hotmail.com> wrote:
>> On Tue, 11 Dec 2007 16:58:10 +0100, <lawp...@gmail.com> wrote:
>>
>> Aside from the code: various UA quirks render whitespace when they
>> shouldn't (in lists, for example), which means you could break a
>> carefully laid out design by adding whitespace like this.
>
> What does UA stand for?
User-Agent: normally a browser, though not necessarily.
> Can you give an example of whitespace that
> would cause this to break?
This one for instance:
http://www.hicksdesign.co.uk/journal/546/ie-whitespace-bug
> I only plan on using this internally; I
> don't plan on using whitespace to format.
As long as you don't use it for output, it shouldn't be a problem.
> That's it exactly! Thanks!
You're welcome.
--
Rik Wasmus