Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Efficiency questions: combined ifs and looping 4 times

0 views
Skip to first unread message

Csaba Gabor

unread,
Apr 14, 2009, 6:51:28 AM4/14/09
to
I have a PHP Script which takes approximately 12 hours
to run on my Win XP Pro machine, spending the vast
majority of its time in the function below, to compute some
combinatorial properties, so I would like to squeeze
every bit of efficiency from it.

Respondents should probably know why ++$var (pre
increment and decrement) is more efficient than $var++
(post increment and decrement), though I've read that
the interpreter does the conversion automatically in
cases where the variable is isolated.

I have three specific questions:
(1) The outer loop's only purpose is to loop 4 times.
What is the most efficient way to write it (I mean the
for statement)?

(2) Is it more efficient to combine the two if statements
and omit the continue?

(3) Does it make a difference in calling utilize whether I do:
$aUtilRes = utilize($newNums, $aUtilCopy=$aUtil);
or
$aUtilCopy = $aUtil;
$aUtilRes = utilize($newNums, $aUtilCopy);


function utilize($strBoxes, &$aRes = array()) {
// $aRes is an array of arrays, with the following property assumed:
// If $aRes[$idx] = array(...), then $idx = array_sum($aRes[$idx])
// Ie. $idx is the sum of the entries of the array it indexes
// And each subarray contains from 1 to 4 entries.
// $strBoxes is a space separated list of integers
// This function augments $aRes by adding zero,one,two,three,
// and four of each number in $strBoxes to the subarrays of
// $aRes, subject to each subarray being limited to four entries.
// In addition, if there are two ways to form a given sum, then
// the way with fewer addends is used.

$pos = strrpos(" $strBoxes", " ");
$box = 0 + substr($strBoxes,$pos); // Last num in $strBoxes
// Everything but the last number:
$strRest = substr($strBoxes,0,--$pos<0 ? 0 : $pos);

if ($strRest) utilize($strRest, $aRes);
for ($i=1;$i<=4;++$i) {
foreach ($aRes as $amt => $aBox) {
if (($size=sizeof($aBox))>=4) continue;
if (!$aRes[$next=$amt+$box] ||
sizeof($aRes[$next])>$size) {
$aRes[$next] = $aRes[$amt];
$aRes[$next][] = $box; } } }

$aRes[$prod=$box] = array($box);
$aRes[$prod+=$box] = array($box, $box);
if (!$aRes[$prod+=$box] || sizeof($aRes[$prod])>3)
$aRes[$prod] = array($box, $box, $box);
if (!$aRes[$prod+=$box])
$aRes[$prod] = array($box, $box, $box, $box);

return $aRes;
}


Note: $strBoxes is an ordered list of integers starting with 1
The function computes all possible sums of the numbers in
$strBoxes using no more than four numbers (a number may
be repeated in the sum). The function bogs down when
$strBoxes holds 7 integers.

Cacheing is already in effect via $aUtilCopy being passed in,
but the speedup is not great because the problem does not
decompose nicely so far as I can see. utilize() is itself
wrapped within a few loops, so it gets called (I think) millions
of times.

Thanks,
Csaba Gabor from Vienna

Jerry Stuckle

unread,
Apr 14, 2009, 7:01:23 AM4/14/09
to
Csaba Gabor wrote:
> I have a PHP Script which takes approximately 12 hours
> to run on my Win XP Pro machine, spending the vast
> majority of its time in the function below, to compute some
> combinatorial properties, so I would like to squeeze
> every bit of efficiency from it.
>
> Respondents should probably know why ++$var (pre
> increment and decrement) is more efficient than $var++
> (post increment and decrement), though I've read that
> the interpreter does the conversion automatically in
> cases where the variable is isolated.
>
> I have three specific questions:
> (1) The outer loop's only purpose is to loop 4 times.
> What is the most efficient way to write it (I mean the
> for statement)?
>

If it's only going to loop 4 times, it doesn't really matter. The
difference will be in microseconds.

> (2) Is it more efficient to combine the two if statements
> and omit the continue?
>

Perhaps, perhaps not. All you can do is try it.

> (3) Does it make a difference in calling utilize whether I do:
> $aUtilRes = utilize($newNums, $aUtilCopy=$aUtil);
> or
> $aUtilCopy = $aUtil;
> $aUtilRes = utilize($newNums, $aUtilCopy);
>

Same as above (although I wouldn't expect it to make much difference).

Personally, I wouldn't even try something like this in PHP. I'd go to a
compiled language such as C.

As far as your speed issue - I suspect the slow speed is the string
parsing of $strBoxes. I would think this would be much faster as an
array. However, what you really need to do is profile your script and
see where the performance bottleneck is.


--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstu...@attglobal.net
==================

Iván Sánchez Ortega

unread,
Apr 14, 2009, 7:16:24 AM4/14/09
to
Csaba Gabor wrote:
> (1) The outer loop's only purpose is to loop 4 times.
> What is the most efficient way to write it (I mean the
> for statement)?

Efficiency does not depend on your coding style. It depends hugely on the
algorithm choosen.

> Note: $strBoxes is an ordered list of integers starting with 1
> The function computes all possible sums of the numbers in
> $strBoxes using no more than four numbers (a number may
> be repeated in the sum). The function bogs down when
> $strBoxes holds 7 integers.

Well, if you want to attack that problem from a brute-force perspective,
you'll have an algorithm of O(n^4) complexity, which is pretty bad.

> foreach ($aRes as $amt => $aBox) {
> if (($size=sizeof($aBox))>=4) continue;

And not only that, but you're calling sizeof(), which has a cost of O(n),
growing the complexity up to O(n^5). Ouch.

> if ($strRest) utilize($strRest, $aRes);

Oooohhh! Recursive calls! sizeof() and strpos() inside them! I won't bother
calculating the numbers, but I guess you're in the lines of O(n^10) now.

> Cacheing is already in effect via $aUtilCopy being passed in,

Stop worring about caching and saving memory by using one less variable.
That is premature optimization, and you shouldn't be doing it. Grab a pen
and a piece of paper and RETHINK YOUR ALGORITHM. See if you can use a tree
for storing the data (transversing it completely will have an horrendous
complexity, though).


Cheers,
--
----------------------------------
Iván Sánchez Ortega -ivan-algarroba-sanchezortega-punto-es-

You will pioneer the first Martian colony.

Gordon

unread,
Apr 14, 2009, 6:44:18 PM4/14/09
to

The kinds of optimizations you're talking about here are
microoptimizations, they'll buy you microseconds if anything at all.

Choice of algorithm always has a far bigger impact in performance than
any one implementation detail, unless you inadvertently pick an
implementation detail that happens to be very expensive. Other than
using sizeof () within a loop there's nothing that obviously jumps
out. Use sizeof () before the loop, store the result in a var and use
the var inside your loop instead.

The simple fact is, however, that the problem you've chosen to solve
is a computationally expensive one, the algorithm you've chosen to use
to solve said problem is brute-force, and you're doing it in an
interperated language which means you have the interperater's overhead
to take account of as well.

There's probably a better solution to this problem out there, try
googling for it. One thing I'd suggest right off is using a balanced
binary tree instead of an array. Trees tend to be much faster than
sequential data structures for tasks like this, but implementing the
algorithm will be harder and require some thought.

And PHP is quite simply the wrong language for this job. PHP was
intended to spit out web pages, not do heavy duty mathematics. A
compiled language like C is bound to be faster simply because there's
no interperater overhead, at the very least you should be using Java
which is also going to be faster in this application than PHP.

Csaba Gabor

unread,
Apr 15, 2009, 7:29:46 AM4/15/09
to
On Apr 15, 12:44 am, Gordon <gordon.mc...@ntlworld.com> wrote:
> On Apr 14, 11:51 am, Csaba Gabor <dans...@gmail.com> wrote:
>
> > I have a PHP Script which takes approximately 12 hours
> > to run on my Win XP Pro machine, spending the vast
> > majority of its time in the function below, to compute some
> > combinatorial properties, so I would like to squeeze
> > every bit of efficiency from it.
>
> > Respondents should probably know why ++$var (pre
> > increment and decrement) is more efficient than $var++
> > (post increment and decrement)
...

> The kinds of optimizations you're talking about here are
> microoptimizations, they'll buy you microseconds if anything at all.
>
> Choice of algorithm always has a far bigger impact in performance than
> any one implementation detail, unless you inadvertently pick an
> implementation detail that happens to be very expensive. Other than
> using sizeof () within a loop there's nothing that obviously jumps
> out. Use sizeof () before the loop, store the result in a var and use
> the var inside your loop instead.
>
> The simple fact is, however, that the problem you've chosen to solve
> is a computationally expensive one, the algorithm you've chosen to use
> to solve said problem is brute-force, and you're doing it in an
> interperated language which means you have the interperater's overhead
> to take account of as well.

Thanks for your comments Iván and Gordon.
I agree with your comments Gordon, in particular that
the algorithm effects dominate, and any variations
in specific coding have not led to a measurable difference.

A note about sizeof(). sizeof()=count() is not
an inefficient function - the reason not to include it
in a loop is because of the overhead that it incurs as
a function call. Nevertheless, I recoded everything
so that the subarrays always toted around their size
(as their 0th element) and did not find any significant
time difference.

> There's probably a better solution to this problem out
> there, try googling for it.

Not likely, but I did try (though no doubt there are
faster implementations). In fact there is a reference
to what I was working on here:
http://www.research.att.com/~njas/sequences/A001214
Note that only 10 terms of the sequence are listed:
4, 10, 26, 44, 70, 108, 162, 228, 310, 422
the final term being added in 2006. That PHP only
bogs down at the 8th term (228) for me I thinks
speaks well of PHP.

It's clear that a compiled language will be faster
than an interpreted one, but PHP does pretty well
for itself, and development overhead is generally
far less.

Iván, your order of complexity analysis is off.
utilize() computes all possible 4-sums of a given
set, S, of n numbers. The maximum sum
can be 4*max(S). The algorithm works by first
taking one number, call it j=min(S), and
generating all 4-sums with that number (which
would be j, 2j, 3j, 4j). At each subsequent
step, it plops in the next number, k, and tries to
generate all 4-sums (from all the previous 4-sums)
which it would do by adding k, 2k, and 3k to each
of the existing sums. On any pass, this means at
most (3/4)max(S) entries to analyze. Since there
are n passes, this means that the complexity is
O(n*max(S)). The recursion is just the means to
the end and does not, of itself, represent
additional complexity. In addition, the
sizeof(...) does not affect the order of
complexity; at most it would be a constant factor.

The reason this bogs down is that it is getting
called repeatedly. In essence it's being called
with all sorts of variations of 8 numbers.

> One thing I'd suggest right off is using a balanced
> binary tree instead of an array. Trees tend to be
> much faster than sequential data structures for
> tasks like this, but implementing the
> algorithm will be harder and require some thought.

I don't see that this problem is amenable to the
form that you are suggesting. In order for it to
be a workable idea, there has to be a way to combine
the subtree results, and these types of problems
(integer sums) are notorious for their intrasigence.

> And PHP is quite simply the wrong language for this
> job. PHP was intended to spit out web pages, not do
> heavy duty mathematics. A compiled language like C
> is bound to be faster simply because there's no
> interperater overhead, at the very least you should
> be using Java which is also going to be faster in
> this application than PHP.

Yes, if I wanted to go further with this, then I'd
be obliged to switch. As it was, it was a diversion.
However, I did find a 25% speedup by realizing that
I was looping once too oft in the outer (now inner)
loop. I recoded it to be:

function utilize($strBoxes, $aRes = array()) {
// See thread start for doc:
// http://groups.google.com/group/comp.lang.php/browse_frm/thread/25d55fcda8bbdd5c/

$pos = strrpos(" $strBoxes", " ");

// Last number in $strBoxes,
// 0+ to avoid repeated coersion
$box = 0+substr($strBoxes,$pos);


// Everything but the last number:
$strRest = substr($strBoxes,0,--$pos<0 ? 0 : $pos);

if ($strRest) $aRes = utilize($strRest, $aRes);
$aRes[$box] = array($box);


foreach ($aRes as $amt => $aBox)

for ($i=0,$size = sizeof($aBox);++$i<=4-$size;) {
if (!$aRes[$amt+=$aBox[]=$box] ||
sizeof($aRes[$amt])>$size+$i)
$aRes[$amt] = $aBox;
else break; }

return $aRes; }

Iván Sánchez Ortega

unread,
Apr 16, 2009, 12:49:47 PM4/16/09
to
Csaba Gabor wrote:

> A note about sizeof(). sizeof()=count() is not
> an inefficient function

It *is*. It has a complexity of O(n) which adds to the overall complexity,
either you want it or not.

> I recoded everything so that the subarrays always toted
> around their size (as their 0th element) and did not
> find any significant time difference.

Remember that access to a hash table also has a cost.

> utilize() computes all possible 4-sums of a given
> set, S, of n numbers.

I don't understand what a "4-sum" is.

Are you trying to solve the postage stamp problem? The coin problem? Any
problem with a known name?

Can you provide a description of what is needed, some sample input and some
sample output?

> The reason this bogs down is that it is getting
> called repeatedly. In essence it's being called
> with all sorts of variations of 8 numbers.

Then let us look at the problem *as a whole*.


--
----------------------------------
Iván Sánchez Ortega -ivan-algarroba-sanchezortega-punto-es-

Organización: nada funciona, pero todo el mundo sabe porque.

Csaba Gabor

unread,
Apr 18, 2009, 7:30:19 AM4/18/09
to
On Apr 16, 6:49 pm, Iván Sánchez Ortega <ivansanchez-...@rroba-

escomposlinux.-.punto.-.org> wrote:
> Csaba Gabor wrote:
> > A note about sizeof(). sizeof()=count() is not
> > an inefficient function
>
> It *is*. It has a complexity of O(n) which adds
> to the overall complexity, either you want it or not.

Can you substantiate that claim? I found the original
sizeof() remarks by you and Gordon interesting enough
that I did some testing to determine whether or not
sizeof must iterate through the entire array or not.

Following the tests outlined at http://php.net/count
I did a run through and noted the times. Then I
doubled the size of the array and ALL three of the
loops, including
for ($i=0;$i<sizeof($array);++$i)
took twice as long. That means that the time spent
in the sizeof() function is O(1) and not O(n). To
be specific, sizeof($array) seems to not have to
loop through the array to count the number of items
in it (If it were the case that PHP were cacheing
the size information, then one would also expect
more than a doubling in time).

Over all iterations, the time added from the entire
loop is O(n), but the same is also true if one
stores the size since a comparison must be done
in any case. Only the multiplicative constant is
different.

> > I recoded everything so that the subarrays always toted
> > around their size (as their 0th element) and did not
> > find any significant time difference.
>
> Remember that access to a hash table also has a cost.

Yes. And this cost is frequently significant.

> > utilize() computes all possible 4-sums of a given
> > set, S, of n numbers.
>
> I don't understand what a "4-sum" is.

By 4-sum I meant the sum of at most any 4 numbers
from S (with repeats allowed).

> Are you trying to solve the postage stamp problem? The coin problem? Any
> problem with a known name?

Yes, I was generating the values for the postage stamp
problem through size 8, with 4 stamps.

> Can you provide a description of what is needed, some sample input and some
> sample output?

My outer function takes one number, the size of
the postage stamp problem.
Ie.
$aBest = findBestStamps(8);
function findBestStamps($n) {
// finds a set, S, of $n positive integers
// such that the first hole for all
// 4-sums of S is maximal
// A hole is a number that has no 4-sum in S


> > The reason this bogs down is that it is getting
> > called repeatedly. In essence it's being called
> > with all sorts of variations of 8 numbers.
>
> Then let us look at the problem *as a whole*.

I had already solved the problem to my
satisfaction. But at the time I asked I was
interested to know whether I could gain any
significant speed advantage by more efficient
coding (staying within PHP).

0 new messages