indenting & formatting html function

law...@gmail.com

Dec 11, 2007, 10:58:10 AM
Hello all -

I'm trying to write a function that takes a string of html without
line breaks or indenting, and adds line breaks and formatting.

So this:
"<html><head><title>Welcome</title><link rel='stylesheet' type='text/css' href='default.css'></head><body>"

becomes this:
"<html>
<head>
<title>
Welcome
</title>
<link rel='stylesheet' type='text/css' href='default.css'>
</head>
<body>
"

This is my code:

function indent_html( $html, $indent_char = "\t", $newline_char = "\n" ) {

    // split the html into an array of characters
    $html = str_split( $html );

    $tag_depth = 0;
    $output = '';

    foreach ( $html as $index => $char ) {

        // do any indent number calculation, etc.

        $last_index = $index - 1;
        $next_index = $index + 1;

        // guard against reading past either end of the array
        $last_char = isset( $html[$last_index] ) ? $html[$last_index] : '';
        $next_char = isset( $html[$next_index] ) ? $html[$next_index] : '';

        if ( $char == "<" ) {

            if ( $next_char == "/" ) {
                $tag_depth--;
            } else {
                $tag_depth++;
            }

            for ( $i = 1; $i <= $tag_depth; $i++ ) {
                $output .= $indent_char;
            }
        }

        $output .= $char;

        // do stuff after adding the character

        if ( $char == ">" ) {
            $output .= $newline_char;
        }
    }

    return $output;
}

And here's an example result:

<html>
<head>
<title>
Welcome </title>
<link rel='stylesheet' type='text/css' href='default.css'>
</head>
<body>
<table width='100%'>
<tr>

<td align='center' >
<table width='800px' height='600px' bgcolor='#9A3D24'
cellspacing='10'>
<tr>
<td bgcolor='#CFE673' align='center'>
This is a table cell </td>
</tr>
</table>
</td>
</tr>

</table>
</body>
</html>

-------------------------------------------------
So you see that I am having a problem inserting indentation before a
non-tag string, and also adding a line break after the non-tag string.

I'm using the str_split function to break up the complete html string
into an array of characters, and then dealing with each character (along
with the previous and next ones) in turn.

Is there a way that I could split the string into an array of more
meaningful chunks, such as

array(
    0 => '<html>',
    1 => '<head>',
    2 => '<title>',
    3 => 'Welcome',
    4 => '</title>',
    5 => "<link rel='stylesheet' type='text/css' href='default.css'>",
    6 => '</head>',
)

How might I do this?

Michael Fesser

Dec 11, 2007, 11:02:35 AM
.oO(law...@gmail.com)

>I'm trying to write a function that takes a string of html without
>line breaks or indenting, and adds line breaks and formatting.

What about trying one of the XML extensions to get a formatted output?
Or try the Tidy extension if available.
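
For the Tidy route, a minimal sketch might look like the following. It assumes the Tidy extension is loaded; the helper name tidy_indent_html is made up for illustration, and the two config options shown ('indent' and 'wrap') are only one plausible choice.

```php
<?php
// Hypothetical helper (not from the thread): returns indented HTML,
// or null when the Tidy extension is not loaded.
function tidy_indent_html($html)
{
    if (!function_exists('tidy_repair_string')) {
        return null;
    }

    $config = array(
        'indent' => true, // indent nested block-level content
        'wrap'   => 0,    // do not wrap long lines
    );

    return tidy_repair_string($html, $config, 'utf8');
}

$result = tidy_indent_html(
    "<html><head><title>Welcome</title></head><body><h1>foo</h1></body></html>"
);
echo $result === null ? "Tidy extension not available\n" : $result;
```

Note that Tidy also repairs the markup (it may add a doctype or missing elements), so the output is not just the input with whitespace added.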

Micha

Rik Wasmus

Dec 11, 2007, 11:31:10 AM
On Tue, 11 Dec 2007 16:58:10 +0100, <law...@gmail.com> wrote:

> Hello all -
>
> I'm trying to write a function that takes a string of html without
> line breaks or indenting, and adds line breaks and formatting.
>
> So this:
> "<html><head><title>Welcome</title><link rel='stylesheet' type='text/css' href='default.css'></head><body>"
>
> becomes this:
> "<html>
> <head>
> <title>
> Welcome
> </title>
> <link rel='stylesheet' type='text/css' href='default.css'>
> </head>
> <body>
> "

Aside from the code: various UA quirks, rendering whitespace when they
shouldn't (in lists, for example), mean you could break a carefully
laid-out design like this.

About the code, could this be to your liking?

<?php
$doc = new DOMDocument();
$doc->formatOutput = true;
$doc->loadHTML(
    "<html><head><title>Welcome</title>"
    . "<link rel='stylesheet' type='text/css' href='default.css'>"
    . "</head><body><h1>foo</h1></body></html>"
);
// serialize only the <html> element, indented
echo $doc->saveXML($doc->getElementsByTagName('html')->item(0));
?>

> array => (
> [0] => '<html>',
> [1] => '<head>',
> [2] => '<title>',
> [3] => 'Welcome',
> [4] => '</title>',
> [5] => '<link rel='stylesheet' type='text/css' href='default.css'>',
> [6] => '</head>',
> )
>
> How might I do this?

preg_split('/(<[^>]*>)/', $string, -1,
    PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
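
As a rough sketch of how those chunks could drive the indenting, something like the following might work. The function name indent_html_chunks and the short list of void tags are assumptions made for illustration, and the pattern will misfire on a ">" inside an attribute value, so treat it as a starting point rather than a real HTML parser.

```php
<?php
// Hypothetical helper (not from the thread): indent HTML by tag depth,
// working on the tag/text chunks produced by preg_split.
function indent_html_chunks($html, $indent_char = "\t", $newline_char = "\n")
{
    // Assumed, incomplete list of tags that never have a closing tag.
    $void_tags = array('link', 'meta', 'br', 'hr', 'img', 'input');

    // Split into tags and text; the captured delimiters are the tags.
    $chunks = preg_split('/(<[^>]*>)/', $html, -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

    $depth = 0;
    $output = '';

    foreach ($chunks as $chunk) {
        $chunk = trim($chunk);
        if ($chunk === '') {
            continue; // only whitespace between two tags
        }

        // A closing tag steps back out before it is printed.
        $is_closing = (strpos($chunk, '</') === 0);
        if ($is_closing) {
            $depth = max(0, $depth - 1);
        }

        $output .= str_repeat($indent_char, $depth) . $chunk . $newline_char;

        // Only a non-void, non-self-closing opening tag goes one level deeper.
        if (!$is_closing
            && preg_match('/^<([a-z][a-z0-9]*)\b/i', $chunk, $m)
            && !in_array(strtolower($m[1]), $void_tags)
            && substr($chunk, -2) !== '/>') {
            $depth++;
        }
    }

    return $output;
}

echo indent_html_chunks("<html><head><title>Welcome</title><link rel='stylesheet' type='text/css' href='default.css'></head><body>");
```

Because text chunks never change the depth, a non-tag string such as "Welcome" gets its own indented line followed by a line break, which is the part the character-by-character loop struggled with.
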
--
Rik Wasmus

law...@gmail.com

Dec 11, 2007, 12:23:51 PM
On Dec 11, 10:31 am, "Rik Wasmus" <luiheidsgoe...@hotmail.com> wrote:

> On Tue, 11 Dec 2007 16:58:10 +0100, <lawp...@gmail.com> wrote:
>
> Aside from the code: various UA quirks, rendering whitespace when they
> shouldn't (in lists, for example), mean you could break a carefully
> laid-out design like this.

What does UA stand for? Can you give an example of whitespace that
would cause this to break? I only plan on using this internally; I
don't plan on using whitespace to format.


>
> About the code, could this be to your liking?
>
> <?php
> $doc = new DOMDocument();
> $doc->formatOutput = true;
> $doc->loadHTML("<html><head><title>Welcome</title><link rel='stylesheet'
> type='text/css'
> href='default.css'></head><body><h1>foo</h1></body></html>");
> echo $doc->saveXML($doc->getElementsByTagName('html')->item(0));
> ?>

That's it exactly! Thanks!


> > How might I do this?
>
> preg_split('/(<[^>]*>)/', $string, -1,
>     PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

Also wonderful! Thanks again!

Rik Wasmus

Dec 11, 2007, 12:50:19 PM
On Tue, 11 Dec 2007 18:23:51 +0100, <law...@gmail.com> wrote:

> On Dec 11, 10:31 am, "Rik Wasmus" <luiheidsgoe...@hotmail.com> wrote:
>> On Tue, 11 Dec 2007 16:58:10 +0100, <lawp...@gmail.com> wrote:
>>
>> Aside from the code: various UA quirks, rendering whitespace when they
>> shouldn't (in lists, for example), mean you could break a carefully
>> laid-out design like this.
>
> What does UA stand for?

User agent: normally a browser, though not necessarily.

> Can you give an example of whitespace that
> would cause this to break?

This one for instance:
http://www.hicksdesign.co.uk/journal/546/ie-whitespace-bug

> I only plan on using this internally; I
> don't plan on using whitespace to format.

As long as you don't use it for output, it shouldn't be a problem.

> That's it exactly! Thanks!

You're welcome.
--
Rik Wasmus
