New set class replacement

89 views
Skip to first unread message

mark_story

unread,
Feb 28, 2012, 5:45:32 PM2/28/12
to cakephp-core
I've spent sometime re-writing large parts of the Set class, and would
like to
get some feedback and ideas. The set class is an important part of
CakePHP and
it has some warts as its gained features over time. The re-write I'm
proposing
would live alongside the existing Set class until 3.0, or later. I've
temporarily named it Hash but it could be named anything really. Some
of the
problems I wanted to solve with the new methods are:

* Performance: Some of the methods in Set are not as performant as
they could
be. Removing recursive method calls improves the performance in
measurable
ways for certain methods.
* Remove the Xpath support. While I understand a number of people
like and
prefer the Xpath versions of extract(), they are problematic due to
a mismatch
between arrays and DOM structures. The xpath support is also
implemented
inconsistenly and not all methods support it. Lastly, the xpath
feature has a
number of existing open bugs that are simply not fixable without
breaking
other existing use cases.
* Unify the API. Some methods take the data array as the first
argument, others
as the second. Bringing some consistency to the methods makes usage
of the
class easier and requires less mental overhead.

To solve some of the performance issues, I made many of the methods
non-recursive. Methods like Set::countDim() are now up to 1.3x
faster. Also
more commonly used features like extract() are 1.6x faster.

Of course never believe any benchmarks you didn't fabricate yourself.
I've
pasted a bin of what I used to fabricate mine :)

https://gist.github.com/1935774

Downsides
---------------------------

In making extract() faster simpler and easier to manage a few features
were
removed. Hopefully they are not deal breakers, and I personally have
never used
any of them:

* Support for the `..` parent selector was removed.
* Support for the `[:first]` and `[:last]` attribute selectors was
removed.

I'm hoping that the few bad sides and the great number of upsides
makes this a
useful change for people.

The code is at https://github.com/markstory/cakephp/compare/2.1...set2

-Mark

Carlos Gant

unread,
Feb 28, 2012, 6:38:41 PM2/28/12
to cakeph...@googlegroups.com
I submitted a ticket sometime ago a class (Fset) for manage arrays by reference (like Set but by reference).

I think, the Set class can implement these methods because are small and fast (or simply adopt the concept or mix some parts, as you wish)

In my projects, i'm actually using much more times Fset class than Set. But I would prefer them in the same class (Set). And get rid from loading Fset file as Set is always loaded.

Here is the code (prepared for cakephp with tests): https://github.com/adael/fset

Apart, about the functionality you removed in the faster version of extact() function, perhaps if you split it into two functions, one, extract() that works as fast as possible, and another with full XPath support without care about speed, in example xpathExtract().

Hope it helps

Best Regards.
Excuse my english, is not my native language.


2012/2/28 mark_story <mark....@gmail.com>

Ceeram

unread,
Feb 29, 2012, 6:29:54 PM2/29/12
to cakeph...@googlegroups.com
Hi Mark,

Looks good to me, i noticed the testNumeric() tests Set::numeric() instead of Hash::numeric()

Greetings Ceeram

Op dinsdag 28 februari 2012 23:45:32 UTC+1 schreef mark_story het volgende:

mark_story

unread,
Mar 29, 2012, 9:21:32 PM3/29/12
to cakeph...@googlegroups.com
Hey Carlos,

First my apologies for taking so long to reply, your reply got lost in the deluge of mail I get.  As for Fset, what do you think are the key things missing in Hash, that are in Fset?  I think the basic operations like get, set, delete, are all present.  is_set() exists as check().  The only two omissions I see are is_empty() and count().  Both could easily be added to hash.

-Mark

mark_story

unread,
Mar 29, 2012, 9:33:33 PM3/29/12
to cakeph...@googlegroups.com
Just an update on this.  I think I'm done with the changes for Hash.  I've updated the core usage of Set to use Hash instead.


I've included all the recent updates to Set, and made the methods that are the same in both classes delegate to Hash.  Let me know about any comments or complaints with the new class, and I'll do my best to address them before merging into 2.2

-Mark

Thomas Ploch

unread,
Mar 30, 2012, 7:00:15 AM3/30/12
to cakeph...@googlegroups.com
I get a failing test for SetTest::testMerge().
http://travis-ci.org/#!/tPl0ch/cakephp/jobs/979557
<http://travis-ci.org/#%21/tPl0ch/cakephp/jobs/979557>

Greets,
Thomas

> https://gist.github.com/1935774 <https://gist.github.com/1935774>


>
> Downsides
> ---------------------------
>
> In making extract() faster simpler and easier to manage a few
> features
> were
> removed. Hopefully they are not deal breakers, and I
> personally have
> never used
> any of them:
>
> * Support for the `..` parent selector was removed.
> * Support for the `[:first]` and `[:last]` attribute selectors
> was
> removed.
>
> I'm hoping that the few bad sides and the great number of upsides
> makes this a
> useful change for people.
>
> The code is at
> https://github.com/markstory/cakephp/compare/2.1...set2

> <https://github.com/markstory/cakephp/compare/2.1...set2>
>
> -Mark
>


Carlos Gant

unread,
Mar 30, 2012, 2:19:13 PM3/30/12
to cakeph...@googlegroups.com
Hi Mark, if I understand well, you are telling me that functions in Fset are already in set class (ie: Fset::is_set() => Set::check), yes they are, but with one difference: Fset works with array by reference.

Take for example Fset::get():

static function get(array &$data, $path, $def = null);

Notice that first parameter $data is received byref. I've used Fset::get() in my medium-sized project without problem, and inside big loops worth it.

I made some test by myself and with big arrays, Fset::get is faster than Set::classicExtract()  (almost ten times faster).

But Fset::get has fewer features by far; for this, I suggest to integrate Fset into Set class in some way (or reuse code).

You tell in your first mail of this thread that one of your objectives is to gain performance, using arrays by reference can turn into a significant increase in performance.

It's difficult to me to explain in english (I'm spanish). Hope you get an idea of what I mean.
Regards.


2012/3/30 mark_story <mark....@gmail.com>

mark_story

unread,
Mar 31, 2012, 9:55:10 AM3/31/12
to cakeph...@googlegroups.com
I see that as well, I'll get it fixed.

-Mark

mark_story

unread,
Mar 31, 2012, 10:17:33 AM3/31/12
to cakeph...@googlegroups.com
Carlos,

While I haven't benchmarked Fset vs. Hash, I know that get(), and extract() in Hash are measurably faster than in Set. The pass by reference is an interesting idea that I'll look into.  I generally don't like pass by reference as it creates hidden side effects in the caller.  I can see how the pass by reference could save time spent on memory allocation and copying arrays.  For methods like remove()/insert() the pass by reference could introduce subtle issues that Set never had.

$one = array( ... );
$two = Hash::insert($one, 'Foo.bar', 'value');

Now both $one and $two are the same, and both contain the new value.  Unless you specifically knew that Hash used pass by ref, this could easily be thought of as a bug, and undesirable behaviour.  I don't think we should sacrifice usability and predictability of methods for performance unless its absolutely necessary.

I'll check to see what kind of benefit pass by ref has on methods like get/extract though.

-Mark

Carlos Gant

unread,
Mar 31, 2012, 12:14:41 PM3/31/12
to cakeph...@googlegroups.com
Mark, I completly agree with you, using values by reference can be dangerous. Anyway, notice that Fset::set() does not return value, and your previous sample with Fset would:

$one = array( ... );
$two = $one;
Fset::set($one, 'Foo.bar', 'value');

Now, $two is different than $one.

Apart of this, when I did the my suggestion, I mean to integrate Fset into Cakephp's Set class, not into Hash which I did not know (excuse me, in your before mails I misunderstood when you were referring to the class Hash).

The benchmarks I did, was with Set::classicExtract vs Fset::get(), not with Hash.

I have a look into your branch (here:  https://github.com/markstory/cakephp/blob/2.2-hash/lib/Cake/Utility/Hash.php), and I notice that indeed you are using $data by reference in Hash::get(), (on line 52) and I think is totally perfect.

In CakePHP 1.3, Hash class did not exists, for this I created the Fset class inspired by Cakephp Set class.

I'll use Hash class instead of Fset in advance (after upgrading my project) as is way better than mine. 

Great work Mark!.

Best Regards.


2012/3/31 mark_story <mark....@gmail.com>

mark_story

unread,
Mar 31, 2012, 9:33:29 PM3/31/12
to cakeph...@googlegroups.com
I still don't really like pass by ref, and functions without explicit returns.  They feel awkward, and unconventional to me.  I did some benchmarking using the same approach as in the original post in this thread.  Fset is a bit faster than Hash.  Originally, Fset has a bigger margin but I found a few places to optimize Hash.  Anyways, the results came out like:

Get
=========

Hash::get() took0.0331609249115 seconds
Fset::get() took 0.0254321098328 seconds

Set
=========
Hash::insert() took 0.0496759414673 seconds
Fset::set() took 0.0294997692108 seconds

Delete
=========
Hash::remove() took 0.038959980011 seconds
Fset::del() took 0.0281069278717 seconds

I think the additional feature set offered in Hash, and not using pass by ref make the speed difference acceptable :)

-Mark

Ceeram

unread,
Apr 4, 2012, 7:41:58 PM4/4/12
to cakeph...@googlegroups.com
@mark

SORT_NATURAL is php 5.4 only as i also commented on the commit

I also removed remaining Set::extract() calls from all core and test files:
https://github.com/ceeram/cakephp/compare/cakephp:2.2-hash...2.2-hash


Op dinsdag 28 februari 2012 23:45:32 UTC+1 schreef mark_story het volgende:
I've spent sometime re-writing large parts of the Set class, and would

Ceeram

unread,
Apr 4, 2012, 7:59:14 PM4/4/12
to cakeph...@googlegroups.com
Also i just noticed 2.2-hash branch has one fail on Postgres for find('neighbors')


Op dinsdag 28 februari 2012 23:45:32 UTC+1 schreef mark_story het volgende:
I've spent sometime re-writing large parts of the Set class, and would

mark_story

unread,
Apr 7, 2012, 2:24:12 PM4/7/12
to cakeph...@googlegroups.com
Thanks for the commits Ceeram, I'll merge them in.  I think calling current() without a variable issues an E_STRICT warning.  I updated the docs already about natural sorting only being available in PHP 5.4.  I'll make sure to check the postgres tests as well :)

-Mark

mark_story

unread,
Apr 7, 2012, 5:23:14 PM4/7/12
to cakeph...@googlegroups.com
Hey Marc,

I've fixed those issues, and things should be good now. :)

-Mark
Reply all
Reply to author
Forward
0 new messages