I have written four simple PHP functions to help with "htmlarea"
fields:
html_tidy -- tidy up the HTML produced by the WYSIWYG editor
html_simplify -- remove complex formatting and convert tables to simple text
html2text -- convert HTML to plain text
text2html -- convert plain text to HTML
For example, the htmldocs module defines the following htmlarea field:
<field name="data" displayname="Data" simple_type="htmlarea"
simple_size="15">
<filter views="display" function="truncate|500"/>
</field>
If an HTML document contains a table and the "truncate|500" filter cuts
it off part-way, the table's closing tags are lost and the layout of
the Simple Groupware page is upset.
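To see why, here is a minimal sketch that cuts a small table after a fixed number of characters and counts the tags left open. naive_truncate and unclosed_count are hypothetical helpers, and the sketch assumes, for illustration only, that "truncate|N" simply cuts the raw string after N characters; Simple Groupware's real truncate filter may behave differently.

```php
<?php
// Hypothetical stand-in for "truncate|N": assumes the filter simply cuts the
// raw string after N characters (Simple Groupware's real filter may differ).
function naive_truncate($html, $n) {
  return substr($html, 0, $n);
}

// Number of <tag ...> occurrences minus </tag> occurrences in $html.
function unclosed_count($html, $tag) {
  $open  = preg_match_all('#<' . $tag . '(\s[^>]*)?>#i', $html, $m);
  $close = preg_match_all('#</' . $tag . '>#i', $html, $m);
  return $open - $close;
}

$html = '<table><tr><td>Name</td><td>Value</td></tr>'
      . '<tr><td>a</td><td>1</td></tr></table><p>After the table</p>';
$cut = naive_truncate($html, 40);

echo $cut, "\n";
foreach (array('table', 'tr', 'td') as $tag) {
  echo $tag, ': ', unclosed_count($cut, $tag), " left open\n";
}
```

The cut string ends inside the first row, so the <table> and <tr> elements are never closed and everything after the field on the page ends up inside them.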
The new functions provide two ways to prevent this. The first method
uses html_simplify (which must be preceded by html_tidy):
<field name="data" displayname="Data" simple_type="htmlarea"
simple_size="15">
<filter views="display" function="html_tidy"/>
<filter views="display" function="html_simplify"/>
<filter views="display" function="truncate|500"/>
</field>
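The filters are listed in the order they are meant to run, each one working on the output of the previous one, which is why html_tidy has to come before html_simplify and both before truncate. The sketch below illustrates that idea with a hypothetical dispatcher (apply_filter_chain) and two hypothetical stand-in filters; it assumes a "name|arg" spec format like "truncate|500" and is not Simple Groupware's own filter code.

```php
<?php
// Hypothetical stand-ins for real filters, so the sketch runs on its own.
function demo_squeeze($var) { return preg_replace('/ +/', ' ', $var); }
function demo_truncate($var, $n) { return substr($var, 0, (int)$n); }

// Hypothetical dispatcher: each spec is "name" or "name|arg1|arg2...";
// the current value is passed as the first argument, extra args follow.
function apply_filter_chain($value, array $specs) {
  foreach ($specs as $spec) {
    $parts = explode('|', $spec);
    $name = array_shift($parts);
    array_unshift($parts, $value);
    $value = call_user_func_array($name, $parts);
  }
  return $value;
}

$in = 'a    b    c    d';
echo apply_filter_chain($in, array('demo_squeeze', 'demo_truncate|5')), "\n";
echo apply_filter_chain($in, array('demo_truncate|5', 'demo_squeeze')), "\n";
```

Swapping the two specs gives a different result, because the truncation then runs on the unprocessed text.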
This leaves images and some HTML formatting intact. You can edit
html_simplify so that it removes or keeps just the types of formatting
you want.
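If you edit html_simplify, one way to make the choice explicit is to keep a list of the tag names you want stripped. strip_tag_names below is a hypothetical helper, not part of the functions in this post. It removes only the tags, not their contents, and it requires the tag name to be followed by whitespace, "/" or ">", because a pattern such as </?i[^>]*> would also match <img ...> and remove the images.

```php
<?php
// Hypothetical helper (not part of the functions in this post): removes the
// opening and closing tags named in $names but keeps the text between them.
function strip_tag_names($html, array $names) {
  $quoted = array();
  foreach ($names as $name) {
    $quoted[] = preg_quote($name, '#');
  }
  // The lookahead requires whitespace, "/" or ">" after the name, so that
  // stripping "i" leaves <img ...> alone and stripping "b" leaves <br> alone.
  $pattern = '#</?(' . implode('|', $quoted) . ')(?=[\s/>])[^>]*>#i';
  return preg_replace($pattern, '', $html);
}

$html = '<p><font color="red"><b>Bold</b> and <i>italic</i></font><br><img src="x.png"></p>';
echo strip_tag_names($html, array('font', 'i')), "\n";
```

Here <font> and <i> are removed while <b>, <br> and <img> survive.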
The second method converts the HTML entirely to plain text (and back
again to preserve line breaks):
<field name="data" displayname="Data" simple_type="htmlarea"
simple_size="15">
<filter views="display" function="html2text"/>
<filter views="display" function="text2html"/>
<filter views="display" function="truncate|500"/>
</field>
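For comparison, the same HTML to text and back idea can be sketched with PHP's built-in strip_tags(), html_entity_decode(), htmlspecialchars() and nl2br(). plain_text_roundtrip is a hypothetical name, and this is a much cruder substitute for html2text/text2html, not a replacement: it keeps line breaks but drops list bullets and table tabs. Note that the text is escaped before <br> tags are added, so a decoded "<" or "&" cannot turn back into markup.

```php
<?php
// Rough sketch of "HTML -> plain text -> minimal HTML" using PHP built-ins only.
function plain_text_roundtrip($html) {
  // Turn line breaks and the ends of block elements into newlines first.
  $text = preg_replace('#<br\s*/?>|</(p|div|li|tr|h[1-6])>#i', "\n", $html);
  $text = strip_tags($text);
  $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');
  // Collapse runs of newlines and trim the ends.
  $text = trim(preg_replace("/\n+/", "\n", $text));
  // Escape first, then insert <br> (HTML style, not XHTML) before each newline.
  return nl2br(htmlspecialchars($text, ENT_QUOTES, 'UTF-8'), false);
}

$out = plain_text_roundtrip('<table><tr><td>A &amp; B</td></tr></table><p>x<br>y</p><img src="z.png">');
echo $out, "\n";
```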
This removes all HTML formatting, including images.
I give the code for these functions below (under the same license as
Simple Groupware). It should be added to
<sgs-dir>/bin/core/classes/modify.php. If you are using an older
version of Simple Groupware, add it to
<sgs-dir>/bin/core/functions_modify.php or
<sgs-dir>/bin/core/functions_user.php instead, whichever your version
uses.
/* Tidy up the HTML produced by Simple Groupware's WYSIWYG editor */
function modify_html_tidy($var) {
  $var = str_replace(chr(0),"",$var); // remove null bytes
  /* remove extraneous content */
  $var = preg_replace("/(\n|\r|\t)/si"," ",$var); // remove linefeeds, carriage returns and tabs (paste from OpenOffice)
  $var = preg_replace("/ mce_bogus=\"1\">/si",">",$var);
  $var = preg_replace("/> +</si","><",$var);
  $var = preg_replace("/<p[^>]*>(<br[^>]*>)+/si","<p>",$var);
  $var = preg_replace("/(<br[^>]*>)+(<\/(p|li|td|th)>)/si","$2",$var); // (paste from OpenOffice)
  $var = preg_replace("|<p[^>]*></p>|si","",$var);
  $var = preg_replace("|(<br[^>]*>)+</|si","<br></",$var);
  $var = preg_replace("/<br[^>]*>(<br[^>]*>)+/si","<br><br>",$var); // ("paste from Word" function)
  $var = preg_replace("/ +/si"," ",$var);
  /* pretty-print the HTML for human readers (optional) */
  $var = preg_replace("/(<(div|table|\/?tbody|tr|p|h|ol|ul|li))/si","\n$1",$var);
  $var = preg_replace("/(<\/(div|table|ol|ul)>)/si","\n$1\n",$var);
  $var = preg_replace("/(<\/(p|h[1-6])>)/si","$1\n",$var); // closing headings are </h1> ... </h6>
  return $var;
}
/* Simplify HTML
Works on HTML produced by "modify_html_tidy" */
function modify_html_simplify($var) {
  $var = preg_replace(array("|<head[^>]*?>.*?</head>|si","|<script[^>]*?>.*?</script>|si","|<style[^>]*?>.*?</style>|si","|<!--.*?-->|si"),"",$var);
  $var = preg_replace(array("|<tr[^>]*?>|si","|<th[^>]*?>|si"),"<br>",$var);
  $var = preg_replace(array("|<p[^>]*>|si","|<h[1-6][^>]*>|si","|<table[^>]*?>|si"),"<p>",$var);
  $var = preg_replace(array("|</h[1-6][^>]*>|si","|</table[^>]*?>|si"),"</p>",$var);
  $var = preg_replace(array("|</?col[^>]*>|si","|</?span[^>]*>|si","|</?div[^>]*>|si")," ",$var);
  // "i" and "b" must be followed by whitespace or ">" so that <img> and <br> are kept
  $var = preg_replace(array("|</?font[^>]*>|si","|</?i(\s[^>]*)?>|si","|</?b(\s[^>]*)?>|si","|</?strong[^>]*>|si")," ",$var);
  $var = preg_replace(array("|</?tbody[^>]*?>|si","|</tr[^>]*?>|si","|</th[^>]*?>|si","|</?td[^>]*?>|si")," ",$var);
  $var = preg_replace("|p>\s*<br>|si","p>",$var);
  return $var;
}
/* A simple HTML to text converter */
function html2text($var) {
  $var = preg_replace("/(\n|\r)/si"," ",$var); // remove linefeeds and carriage returns
  $var = preg_replace("/(<[^ >]+)[^>]*/si","$1",$var); // remove style and class info
  $var = preg_replace("/(<br> *)+/si","<br>",$var); // remove unwanted line breaks
  $var = preg_replace("|<br> *</|si","</",$var);
  $var = preg_replace("/(<(td|th|li)>) *<p>/si","$1",$var); // remove unwanted paragraphs
  $var = preg_replace("/<\/p> *(<\/(td|th|li)>)/si","$1",$var);
  $var = preg_replace("/<li>/si","\n\t* ",$var);
  $var = preg_replace("/<tr>/si","\n",$var);
  $var = preg_replace("/<br>/si","\n\n",$var);
  $var = preg_replace("/<\/?(h[1-9]|table|div|ol|ul|p)>/si","\n",$var);
  $var = preg_replace("/<(th|td)>/si","\t",$var);
  $var = preg_replace("/<[^>]+>/si","",$var); // remove all remaining tags
  $var = preg_replace(array("/ *\n */si","/ +\n/si","/\n +/si"),"\n",$var); // remove unwanted blanks
  $var = preg_replace("/\n\n+/si","\n\n",$var); // remove unwanted newlines
  $var = preg_replace("/^[\n ]+/si","",$var);
  // decode special characters; &amp; goes last so that "&amp;lt;" becomes "&lt;", not "<"
  $var = str_replace(array("&nbsp;","&gt;","&lt;","&amp;"),array(" ",">","<","&"),$var);
  $var = preg_replace("/ +/si"," ",$var); // remove unwanted blanks
  $var = preg_replace("/\t /si","\t",$var);
  return $var;
}
/* A very simple text to HTML converter */
function text2html($var) {
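  $var = htmlspecialchars($var,ENT_QUOTES,"UTF-8"); // escape the <, > and & that html2text decoded so they cannot form tags
  $var = preg_replace("/\n+/s","\n",$var);
  $var = preg_replace("/\n/s","<br>",$var);
  $var = preg_replace("/\t/s"," ",$var);
  return $var;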
}
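The order of the entity replacements in html2text matters: &amp; has to be decoded last. Otherwise text in which the user literally typed "&lt;b&gt;" (stored by the editor as "&amp;lt;b&amp;gt;") is decoded twice and turns into a real <b> tag. A small demonstration with str_replace; decode_in_order and the example string are hypothetical.

```php
<?php
// Apply the entity replacements one after another, in the order given.
function decode_in_order($text, array $map) {
  foreach ($map as $entity => $char) {
    $text = str_replace($entity, $char, $text);
  }
  return $text;
}

// The user literally typed "&lt;b&gt;", so the editor stored "&amp;lt;b&amp;gt;".
$stored = 'Type &amp;lt;b&amp;gt; for bold';

$amp_first = decode_in_order($stored, array('&nbsp;' => ' ', '&amp;' => '&', '&gt;' => '>', '&lt;' => '<'));
$amp_last  = decode_in_order($stored, array('&nbsp;' => ' ', '&gt;' => '>', '&lt;' => '<', '&amp;' => '&'));

echo $amp_first, "\n"; // decoded twice: a real tag appears
echo $amp_last, "\n";  // decoded once: what the user typed
```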
Correction - the function name declarations should, of course, read
function html_tidy($var) { ... }
function html_simplify($var) { ... }
function html2text($var) { ... }
function text2html($var) { ... }
I originally wrote the code for SG version 0.510, where the function
names start with "modify_".
Regards,
Paul.