Joomla 1.5 System SEF onAfterRender Issue for Large Buffers


orware

Oct 10, 2010, 12:49:31 PM
to Joomla! CMS Development
Reader's Note: The first part of this explains how I originally
thought this was a query optimization issue, but later on I explain
how it became a problem with the System SEF plugin.

There's a Content Plugin extension I've been developing for the past
few months and released as a commercial GPL extension that allows the
DOCman Category Hierarchy and File Lists to be displayed within your
pages using a simple syntax:
e.g. {docmanlist get_subfolders 1 37} (translates to "get subfolders
up to 1 level deep for category 37")

From the beginning I realized that using DOCman's internal methods
made the query count for the page jump dramatically (into the 100-200
range), but until this week I didn't have enough documents in my own
setup to justify optimizing that area.

Then a new customer bought the plugin for their site, which has a
couple of thousand documents. When he tried to display a particular
category with its subcategories 3 levels deep, the page would spend
some time loading and then give up with a White Screen of Death.

My first thought was that this was entirely due to the number of
queries executing on that page. Once I was provided with a local copy
of the customer's site to test with, I found that over 2000 queries
were indeed being executed, so I went to work optimizing the data
retrieval to bring the count down to a relatively constant 5 or so
queries.

I thought then that the plugin would have no trouble loading, but it
appeared to have the same issue, even after the queries had been
optimized.

So I played around with the layout code (which causes a table to be
output) and kept on removing bits until it would finally load. It
didn't seem to matter which part of the layout was removed...I just
felt like the buffer for the page had a certain size threshold and
with the output for that particular page it just choked on it.

As I was writing this post I was going to leave it at that, but then I
decided to investigate the problem a little more. As I traced the
plugin's output through the Joomla system, I noticed it was completely
fine until it got towards the end of index.php and triggered the
onAfterRender event:
// trigger the onAfterRender events
JDEBUG ? $_PROFILER->mark('afterRender') : null;
$mainframe->triggerEvent('onAfterRender');

That's where I discovered that the SEF System Plugin is what was
changing the buffer into an empty string (causing the White Screen of
Death syndrome I had been experiencing).

Tracing through the SEF System Plugin's onAfterRender() method, it
seems that the culprit is the following regular expression
replacement:
// Background image
$regex = '#style\s*=\s*[\'\"](.*):\s*url\s*\([\'\"]?(?!/|'.$protocols.'|\#)([^\)\'\"]+)[\'\"]?\)#m';
$buffer = preg_replace($regex, 'style="$1: url(\''. $base .'$2$3\')', $buffer);

After the above lines get executed $buffer becomes NULL, which
according to the preg_replace documentation, indicates that an error
occurred.

Calling preg_last_error() afterwards returns a value of 2, which
corresponds to the following PREG error:
PREG_BACKTRACK_LIMIT_ERROR Returned by preg_last_error() if
backtrack limit was exhausted. Available since PHP 5.2.0.

The PREG error seems to be in line with the fact that the White Screen
of Death only crops up when there is a lot of content to be displayed
on the page (and doesn't occur when there is considerably less
output).
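
The failure can be reproduced in isolation. The following is a minimal sketch rather than the plugin's actual code path: the values given to $protocols and $base are assumed stand-ins, the backtrack limit is lowered to 1000 on purpose so that a short string triggers the error, and pcre.jit is switched off on the assumption that the counter in question is the one in PCRE's interpreter.

```php
<?php
// Assumption: disable the PCRE JIT (PHP 7+) so the interpreter's
// backtrack counter is the one being exercised.
ini_set('pcre.jit', '0');
// Deliberately tiny limit so that a short string reproduces the failure.
ini_set('pcre.backtrack_limit', '1000');

// Hypothetical stand-ins for the plugin's variables.
$protocols = '[a-zA-Z0-9]+:';
$base      = '/';

// Background image regex as quoted above.
$regex = '#style\s*=\s*[\'\"](.*):\s*url\s*\([\'\"]?(?!/|' . $protocols . '|\#)([^\)\'\"]+)[\'\"]?\)#m';

// One style attribute followed by 50,000 characters with no line break.
$buffer = '<td style="color:red">' . str_repeat('x', 50000) . '</td> (end)';

$result = preg_replace($regex, 'style="$1: url(\'' . $base . '$2$3\')', $buffer);

// On failure preg_replace() returns NULL and preg_last_error() names the cause.
var_dump($result === null);
var_dump(preg_last_error() === PREG_BACKTRACK_LIMIT_ERROR);
```

The greedy (.*) first runs to the end of the line and then steps back one character at a time looking for ": url(", so the work grows with the length of the line rather than with the number of matches.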

I was hoping somebody might be able to offer some suggestions so that
perhaps the above regex could be optimized (and/or removed if it's
unnecessary) as it is the only one that appears to be causing issues
with the display output. Either that, or we can rework the way the
buffer gets replaced so that in the case that an error does occur, the
null value doesn't replace the entire buffer the way it is currently
doing.

Since the same regular expressions appear to be used in the
Joomla 1.6 SEF System Plugin, a fix should benefit both
versions.

Thierry bela nanga

Oct 10, 2010, 1:12:35 PM
to joomla-...@googlegroups.com
You can set the backtrack limit to a value high enough to solve your issue:

<?php
ini_set("pcre.backtrack_limit", 1000000);
?>


--
http://tbela99.blogspot.com/

fax : (+33) 08 26 51 94 51

Omar Ramos

Oct 10, 2010, 2:08:10 PM
to joomla-...@googlegroups.com
Thanks Thierry,

That's definitely an option, but I'd like to suggest something that gets included in the plugin by default, rather than requiring users to go out and manually make changes.

So, building on your suggestion and on what else I've read, it would probably be better to try something like this for that particular regular expression:
// Background image
$regex = '#style\s*=\s*[\'\"](.*):\s*url\s*\([\'\"]?(?!/|'.$protocols.'|\#)([^\)\'\"]+)[\'\"]?\)#m';
$newBuffer = preg_replace($regex, 'style="$1: url(\''. $base .'$2$3\')', $buffer);
if (is_null($newBuffer)) {
    // preg_replace() failed: raise the PCRE limits and retry once
    @ini_set('pcre.recursion_limit', 20000000);
    @ini_set('pcre.backtrack_limit', 10000000);
    $newBuffer = preg_replace($regex, 'style="$1: url(\''. $base .'$2$3\')', $buffer);
    if (!is_null($newBuffer)) {
        $buffer = $newBuffer;
    }
    // otherwise leave the original $buffer untouched
} else {
    $buffer = $newBuffer;
}

That way the original $buffer never gets overwritten with a null value. I'm just not sure whether similar code is needed for the other regular expressions used in the SEF System Plugin.
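
If the same guard were wanted for every replacement in the plugin, one option is to wrap it in a helper. This is only a sketch: guardedPregReplace() is a hypothetical name and not part of Joomla, and unlike the snippet above it raises only pcre.backtrack_limit (the limit named by the error reported earlier in the thread) and restores the previous value afterwards.

```php
<?php
/**
 * Run preg_replace() without letting a PCRE failure wipe out the subject.
 * Hypothetical helper for illustration; not part of Joomla.
 */
function guardedPregReplace($regex, $replacement, $buffer, $raisedLimit = '10000000')
{
    $result = preg_replace($regex, $replacement, $buffer);
    if ($result !== null) {
        return $result;
    }

    // First attempt failed: raise the backtrack limit for a single retry,
    // then put the previous setting back.
    $oldLimit = ini_get('pcre.backtrack_limit');
    @ini_set('pcre.backtrack_limit', $raisedLimit);
    $result = preg_replace($regex, $replacement, $buffer);
    @ini_set('pcre.backtrack_limit', $oldLimit);

    // Still failing: hand back the untouched buffer instead of NULL.
    return $result !== null ? $result : $buffer;
}
```

Each call in onAfterRender() could then take the form $buffer = guardedPregReplace($regex, $replacement, $buffer); so that a failed replacement leaves the page with unrewritten URLs rather than an empty response.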

-Omar

Omar Ramos

Mar 24, 2013, 10:06:44 PM
to joomla-...@googlegroups.com
I'm reviving an old thread here, since this is an issue I've been working around for a while. I spent some time today trying to understand exactly what the cause was, and wanted to share what I found out about the true issue.

As mentioned in my previous reply, I ended up modifying the default Joomla System SEF Plugin to increase the PCRE recursion/backtrack limits on the fly where possible. This avoided the "White Screen of Death" I originally encountered on Joomla 1.5 when the plugin was enabled and a lot of data was being output, particularly by the content plugin I had developed, since it generates a lot of HTML for the page.

Newer versions of Joomla improved the situation slightly, since at least now you get an error screen due to the checkBuffer() method now in the System SEF Plugin:
    private function checkBuffer($buffer) {
        if ($buffer === null) {
            switch (preg_last_error()) {
            case PREG_BACKTRACK_LIMIT_ERROR:
                $message = "PHP regular expression limit reached (pcre.backtrack_limit)";
                break;
            case PREG_RECURSION_LIMIT_ERROR:
                $message = "PHP regular expression limit reached (pcre.recursion_limit)";
                break;
            case PREG_BAD_UTF8_ERROR:
                $message = "Bad UTF8 passed to PCRE function";
                break;
            default:
                $message = "Unknown PCRE error calling PCRE function";
            }
            JError::raiseError(500, $message);
        }
    }

Unfortunately, that approach doesn't really fix the problem: anyone trying to use my plugin would just get the 500 error or exception instead of the desired output. In my case the buffer only got wiped out when the background image RegEx ran within the System SEF Plugin, so I ended up creating a modified version of the System SEF Plugin with my solution and pointing my users to it if they ran into the issue on their site.

I don't know if I ever created a tracker item for the issue, but I did find this existing one which was closed recently and surprisingly mentioned that the patches had been committed and were part of Joomla 3.1:

However, I examined the patches and took a look at the latest Joomla 3.1 beta2, but I don't see anything much different in the System SEF Plugin that would address the situation.

Regardless, the issue came up again today when I was about to start migrating some new changes I had made to an older version of my plugin and I decided I'd take a closer look and try and figure out what exactly caused that Background Image RegEx to die on me with the output my plugin was generating.

I tried all sorts of different things to try and figure it out, and was a bit perplexed when the following call to my plugin:
{docmanlist get_subfolders 99 1}

Would die and yet the following calls:

{docmanlist get_subfolders 99 8}
{docmanlist get_subfolders 99 7}
{docmanlist get_subfolders 99 6}
{docmanlist get_subfolders 99 5}
{docmanlist get_subfolders 99 4}
{docmanlist get_subfolders 99 3}
{docmanlist get_subfolders 99 2}

Which generated almost exactly the same output, would not.

Then I tried running my output through HTML Tidy and throwing that into a test article and trying to see if I could replicate the issue that way and found I couldn't, which is pretty similar to what Elin tried during her test in the tracker item above.

I tried a few other things too, but then I decided to go ahead and create a test file with the page output as it was right before the call to the Background Image RegEx and try and see if I could create a test case where I could replicate the issue quickly to help me figure out what the issue was.

I was able to replicate the issue this way, and then I began thinking about what the difference between the two approaches would be in terms of output. What I realized was that in the first case the entire output was being added to the page as a single, very long string, while in the second case each call to the plugin resulted in separate, shorter strings being added to the page. So I created a third test file and started breaking the single long line down into shorter lengths, and found I was able to get it to process correctly right around the 104,600 characters per line mark.

So really, the issue actually is that the output I'm generating is too long to be processed correctly if the PCRE Backtrack Limit happens to be set at a lower value than the length of the string that gets generated by my plugin.

It looks like the quick fix here would be to modify how my output is generated so there are some line breaks in there, which will be easy enough now that I know what causes the issue, but generating a long output string is something that other developers could easily make the same mistake doing too so hopefully this helps others to know about.
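
Line breaks help because, without the s modifier, "." does not match a newline, so the greedy (.*) in the background image regex can only run to the end of the current line before it starts backtracking. The sketch below illustrates this under stated assumptions: the row markup, $protocols, and $base are hypothetical stand-ins, the limit is lowered to 1000 so the effect shows on small input, and pcre.jit is disabled on the assumption that the interpreter's counter is the relevant one.

```php
<?php
ini_set('pcre.jit', '0');                 // assumption: interpreter, not JIT
ini_set('pcre.backtrack_limit', '1000');  // tiny on purpose

// Hypothetical stand-ins for the plugin's variables.
$protocols   = '[a-zA-Z0-9]+:';
$base        = '/site/';
$regex       = '#style\s*=\s*[\'\"](.*):\s*url\s*\([\'\"]?(?!/|' . $protocols . '|\#)([^\)\'\"]+)[\'\"]?\)#m';
$replacement = 'style="$1: url(\'' . $base . '$2$3\')';

// Hypothetical output: one row with a relative background image,
// then 500 ordinary rows, all emitted as a single line.
$urlRow  = '<tr><td style="background: url(images/bg.png)">x</td></tr>';
$row     = '<tr><td style="color:red">document</td></tr>';
$oneLine = $urlRow . str_repeat($row, 500);

// Same markup with a newline after every row (plain string replace, no regex).
$multiLine = str_replace('</tr>', "</tr>\n", $oneLine);

$broken = preg_replace($regex, $replacement, $oneLine);
$fixed  = preg_replace($regex, $replacement, $multiLine);

var_dump($broken === null);   // the single long line exhausts the limit
var_dump($fixed !== null);    // the line-broken version is processed
```

Emitting a newline after each row or list item costs a few bytes and leaves the rendered HTML unchanged, which is why it is a cheaper fix than asking every site owner to raise pcre.backtrack_limit.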

The other side of this story, which I've also tried to clean up in my output but don't have concrete data on yet, is that the more inline styles my output had, the longer the System SEF Plugin took to process the Background Image RegEx. Also, the first RegEx in the System SEF plugin, which seems to add the base URL for your site to any src, href, and poster attributes in your output, seems to take longer to process the more links are displayed on the page. I think I tried to minimize that impact by already including the base URL within my output, which I believe allowed the RegEx below to be processed more quickly. (I was doing that development/testing two weeks ago, so my memory is a little fuzzy on those two points, but I was noticing multiple seconds being added to the page load due to the time it took to process these two regular expressions.)
$regex = '#(src|href|poster)="(?!/|'.$protocols.'|\#|\')([^"]*)"#m';
$buffer = preg_replace($regex, "$1=\"$base\$2\"", $buffer);

I've gone ahead and attached my sample test files, but here is the output that was generated on my end (if I shortened the longest line in the 3rd test file slightly, then the buffer would remain intact):
Test File 1: C:\xampp\htdocs\tests\background_image_regex/docmanlist_get_subfolders_99_0_output.txt
Test File 2: C:\xampp\htdocs\tests\background_image_regex/docmanlist_get_subfolders_99_individual_output.txt
Test File 3: C:\xampp\htdocs\tests\background_image_regex/docmanlist_get_subfolders_99_0_output_without_inline_styles.txt

Buffer is now null (processed preg_replace() in 0.036 seconds)
Buffer is intact (processed preg_replace() in 0.072 seconds)
Buffer is now null (processed preg_replace() in 0.093 seconds)


Test File 1 had total of 402 lines.
Longest line found on line 208 with 284911 characters.


Test File 2 had total of 408 lines.
Longest line found on line 209 with 49425 characters.


Test File 3 had total of 406 lines.
Longest line found on line 211 with 104619 characters.
background_image_regex.zip
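
For anyone who wants to check their own output the same way, the line statistics above can be gathered with a few lines of PHP. This is a hypothetical sketch, not the script from the attached zip: describeLines() is an invented name, the sample text is synthetic, and comparing the longest line against pcre.backtrack_limit is only a rough heuristic based on the observation in this post.

```php
<?php
/**
 * Report the line count and the longest line of a text.
 * Hypothetical helper for illustration only.
 */
function describeLines($text)
{
    $lines         = explode("\n", $text);
    $longestLength = -1;
    $longestNumber = 0;

    foreach ($lines as $index => $line) {
        $length = strlen(rtrim($line, "\r")); // ignore Windows line endings
        if ($length > $longestLength) {
            $longestLength = $length;
            $longestNumber = $index + 1;      // report 1-based line numbers
        }
    }

    return array(
        'total'  => count($lines),
        'line'   => $longestNumber,
        'length' => $longestLength,
    );
}

// Synthetic sample; in practice this would be the captured page buffer.
$sample = "short\n" . str_repeat('x', 120000) . "\nmedium line";
$info   = describeLines($sample);
$limit  = (int) ini_get('pcre.backtrack_limit');

printf(
    "Total lines: %d. Longest line is line %d with %d characters.\n",
    $info['total'],
    $info['line'],
    $info['length']
);

if ($info['length'] > $limit) {
    echo "Longest line exceeds pcre.backtrack_limit ($limit).\n";
}
```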

Omar Ramos

Mar 24, 2013, 10:12:36 PM
to joomla-...@googlegroups.com
Also, I forgot to mention:

The PCRE Backtrack limitation seems to be less of an issue in PHP 5.3.7+ because the default limit has been raised to 1 million:

However, since there are probably still quite a few webhosts on PHP 5.2.x (and on my computer I'm running an older version of XAMPP which has PHP 5.3.1), there are probably still a number of people who would run into the issue under normal conditions.