Compiling HTML Purifier on HipHop

Edward Z. Yang

Mar 8, 2010, 1:41:57 PM
to htmlpurifier
If you haven't heard, HipHop is Facebook's shiny PHP to C++
compiler. <http://github.com/facebook/hiphop-php>

I compiled it yesterday and played around with it a little today.
HTML Purifier doesn't quite seem to compile out of the box; the
error trace I got was:

[448, {
"BadPHPIncludeFile":
[{"c1":["HTMLPurifier/Bootstrap.php",0,0,40,50],"d":"require HTMLPURIFIER_PREFIX . '\/' . $file","s":0,}
,{"c1":["HTMLPurifier/LanguageFactory.php",0,0,156,30],"d":"include $filename","s":0,}
]
,"UseEvaluation":
[{"c1":["HTMLPurifier/VarParser/Native.php",0,0,17,40],"d":"eval('$var = ' . $expr . ';')","s":0,}
]
,"UnknownClass":
[{"c1":["HTMLPurifier/Generator.php",0,0,88,29],"d":"new tidy()","s":0,}
,{"c1":["HTMLPurifier/Lexer.php",0,0,121,57],"d":"new htmlpurifier_lexer_ph5p()","s":0,}
,{"c1":["HTMLPurifier/StringHash.php",0,0,20,40],"d":"arrayobject::offsetget($index)","s":0,}
]
,"UnknownBaseClass":
[{"c1":["HTMLPurifier/StringHash.php",0,0,11,29],"d":"arrayobject","s":0,}
]
,"UnknownFunction":
[{"c1":["HTMLPurifier/Bootstrap.php",0,0,65,47],"d":"spl_autoload_functions()","s":0,}
,{"c1":["HTMLPurifier/Bootstrap.php",0,0,66,44],"d":"spl_autoload_register($autoload)","s":0,}
,{"c1":["HTMLPurifier/PropertyListIterator.php",0,0,17,38],"d":"filteriterator::__construct($iterator)","s":0,}
]
}]

So it looks like HipHop doesn't support SPL (ArrayObject, FilterIterator
and the spl_autoload_* functions are all unresolved), which is kind of a
shame. It'll be interesting getting everything to compile, including
SimpleTest, which has the potential to make HipHop very sad due to its
reflection trickery.

Cheers,
Edward

Manuel Vacelet

Mar 8, 2010, 2:39:45 PM
to htmlpu...@googlegroups.com
On Mon, Mar 8, 2010 at 7:41 PM, Edward Z. Yang <ezy...@mit.edu> wrote:
> If you haven't heard, HipHop is Facebook's shiny PHP to C++
> compiler.  <http://github.com/facebook/hiphop-php>
>
> I compiled it yesterday and played around with it a little today.
> HTML Purifier doesn't quite seem to compile out of the box; the
> error trace I got was:
> ... snip ...

Hi Edward,

Do you expect a big performance gain?
AFAIR, HTMLPurifier is mainly CPU and memory intensive, so it should be
a good candidate for HipHop, shouldn't it?
Even for non-HipHop applications, calling HTMLPurifier/HipHop as a
web service might still be interesting for large purification
operations.
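
A minimal sketch of how a non-HipHop application might hand HTML to a compiled purifier binary, which is the piece a web service would wrap. It assumes the compiled stub reads HTML on stdin and writes the purified result to stdout, as the benchmark pipeline later in this thread suggests. The function name `purify_via_subprocess`, the default command line, and the timeout are illustrative assumptions, not part of HTML Purifier or HipHop.

```python
import subprocess
from typing import Sequence

# Assumption: command line borrowed from the benchmark posted in this thread;
# the paths are relative to the htmlpurifier checkout and may need adjusting.
DEFAULT_CMD = ("hphp-out/program", "-f", "library/HTMLPurifier.stub.php")


def purify_via_subprocess(
    html: str,
    cmd: Sequence[str] = DEFAULT_CMD,
    timeout: float = 10.0,
) -> str:
    """Pipe `html` to the command's stdin and return what it writes to stdout.

    Hypothetical helper: raises CalledProcessError on a non-zero exit code
    and TimeoutExpired if the process runs longer than `timeout` seconds.
    """
    result = subprocess.run(
        list(cmd),
        input=html.encode("utf-8"),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        timeout=timeout,
        check=True,
    )
    return result.stdout.decode("utf-8")
```

A web service would call this helper from its request handler; starting one process per request keeps the sketch simple but adds startup cost, so a long-running worker may suit heavy traffic better.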

Cheers,
Manuel

Edward Z. Yang

Mar 8, 2010, 3:04:19 PM
to htmlpurifier
Excerpts from Manuel Vacelet's message of Mon Mar 08 14:39:45 -0500 2010:

> Do you expect a big performance gain?

Yes! My dream for the longest time was to rewrite HTML Purifier in
C++, and now Facebook has gone and done most of the work for me. :-)

> Even for non-HipHop applications, calling HTMLPurifier/HipHop as a
> web service might still be interesting for large purification
> operations.

Possibly. I haven't thought that far ahead yet. ;-)

Cheers,
Edward

Edward Z. Yang

Mar 8, 2010, 6:18:51 PM
to htmlpurifier
Finished the build, took 16 minutes. Now time to see if I can run the
test suite...

Edward

dgm

Mar 8, 2010, 7:26:39 PM
to htmlpurifier

On Mar 8, 2:04 pm, "Edward Z. Yang" <ezy...@MIT.EDU> wrote:

> Yes!  My dream for the longest time was to rewrite HTML Purifier in
> C++, and now Facebook has gone and done most of the work for me. :-)


I think that would be awesome, since bindings could then be made for
other languages. I'd like to convert a project from PHP to Ruby on
Rails, but I can't do it without HTML Purifier!

Edward Z. Yang

Mar 9, 2010, 1:38:34 AM
to htmlpurifier
Bah, I get:

ezyang@ezyang:~/Dev/htmlpurifier/hphp-out$ ./program --file library/HTMLPurifier.stub.php
foo
bar
Destructor threw an exception: 1125137:11252a7:112fc2a:bf15f2:cb5c9e:d95536:d951bd:dc7394:dc4882:d94fd7:dbaf81:b3e97d:a27040:a2ac1d:b68b68:b336ea:b36372:b68b68:b29f5a:b4e319:bd6f8c:bd82ed:bd9822:7fe79b935abd:7949b9 t___destruct is not implemented yet.
Unhandled error: t___construct is not implemented yet.

If anyone wants to take a look at this, I've published a hiphop branch
with the changes needed to make this compile. I'm probably not going
to look into this much further.

http://repo.or.cz/w/htmlpurifier.git/shortlog/refs/heads/hiphop

Compile using compile.sh

Cheers,
Edward

Edward Z. Yang

Mar 9, 2010, 2:09:26 AM
to htmlpurifier
Looks like I lied; switching to DirectLex and changing argument parsing
made things work. Using the Wikipedia homepage as input:

ezyang@ezyang:~/Dev/htmlpurifier$ time cat Main_Page | hphp-out/program -f library/HTMLPurifier.stub.php > /dev/null
real 0m0.237s
user 0m0.210s
sys 0m0.020s

ezyang@ezyang:~/Dev/htmlpurifier$ time cat Main_Page | php library/HTMLPurifier.stub.php > /dev/null
real 0m0.550s
user 0m0.510s
sys 0m0.040s

That's more than a 50% reduction in runtime (0.550s down to 0.237s real);
not bad! I haven't checked the change in memory usage.
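
As a rough check on those figures, here is the arithmetic spelled out. It uses only the two `real` times from the runs above and assumes each is representative of a single run; the variable names are illustrative.

```python
# Wall-clock ("real") times copied from the two runs above, in seconds.
hiphop_real = 0.237
php_real = 0.550

# How many times faster the HipHop build finished.
speedup = php_real / hiphop_real

# Fraction of the original runtime that was saved.
reduction = 1 - hiphop_real / php_real

print(f"speedup: {speedup:.2f}x")
print(f"runtime reduction: {reduction:.1%}")
```

A single timing per configuration is noisy, so these numbers are best read as a ballpark rather than a measured benchmark.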

Cheers,
Edward

Sanjay

Oct 8, 2012, 7:35:39 AM
to htmlpu...@googlegroups.com
The last post was over a year ago and HipHop has been updated since then. Is HipHop still the best way to get a C++ port of HTMLPurifier? Does the current version of HipHop do a better job of porting HTMLPurifier to C++? I want to compile the resulting C++ code on Windows.

Edward Z. Yang

Oct 10, 2012, 2:14:37 PM
to Sanjay, htmlpurifier
Hey Sanjay,

The patches need to be updated for the latest version and tested, but I
don't think there should be any serious problems doing this.

Edward

lks...@googlemail.com

Jan 10, 2014, 8:42:33 AM
to htmlpu...@googlegroups.com, Sanjay
Hey Edward, Sanjay,

Did you guys get any further with porting HTML Purifier to C++?

We are currently working on an email app and need to sanitize incoming HTML emails. Since there is no good solution for Objective-C (or C++), we are thinking of writing our own sanitizer on top of the libxml2 parser.

There are two wrappers for Objective-C (TFhpple and Objective-C-HMTL-Parser), but they focus on extracting data from the XML rather than modifying it.

So if htmlpurifier and HipHop do not work, we'll have to write our own "little" sanitizer. If we had more time and muscle, we would love to port HTML Purifier to C++.

Best,

Luke

Manuel Vacelet

Jan 10, 2014, 2:01:57 PM
to htmlpu...@googlegroups.com, Sanjay
Hi,

FWIW, since then the HHVM project has changed to be a VM (with a JIT) instead of a compiler.

So it's no longer a candidate for a PHP-to-C++ translation
(unless you keep the idea of a web service...)

Manuel

--
Twitter: @vaceletm

Lukas Neumann

Jan 10, 2014, 2:34:14 PM
to htmlpu...@googlegroups.com
Hi Manuel,

OK, that's what I feared. We'll now try to implement something directly in Objective-C.

Luke

Sanjay

Jan 12, 2014, 11:58:56 PM
to htmlpu...@googlegroups.com, Sanjay
Hey Luke,

We didn't port HTML Purifier to C++. We were using it as part of another PHP-based engine, which we ported to C++. But then we switched to another engine altogether, one that didn't require HTML Purifier.

Thanks,
Sanjay

Lukas Neumann

Jan 13, 2014, 3:43:49 AM
to htmlpu...@googlegroups.com, Sanjay
Hi Sanjay,

Ah, OK. What engine was that? We are porting HTML Purifier to an Objective-C framework.
I will post the link to our GitHub project ASAP.

Luke

Sanjay Awatramani

Jan 13, 2014, 3:52:01 AM
to Lukas Neumann, htmlpu...@googlegroups.com
That was PHPIDS, for a security project. It depends on HTMLPurifier. We ported PHPIDS to C++, but couldn't port HTMLPurifier.

