Package proposal: PHP Html Parser


gilles paquette

Apr 6, 2016, 12:50:46 PM
to thephpleague
Hi everyone,

I wrote a package that parses HTML in PHP and lets you select HTML tags using a CSS/jQuery-like selector.


The package adheres to all 10 of the league quality points. The original package was developed by sunra, but it was not UTF-8 compliant and was not being updated, so I forked it from that point into what it is today. I originally forked this package 2 years ago and have been maintaining and improving it since.

- Converts the encoding of all text to the one reported in the HTTP headers; the encoding can also be set manually.
- Currently sitting at 96% coverage on Coveralls.
- Scrutinizer score is sitting at about 7.6.

I just released version 1.7.0 with a bunch of requested features, and I am looking to make version 2.0.0. A new major version would give me the freedom to change things around, fix some of the issues holding back the Scrutinizer score, and raise the minimum requirement from PHP 5.4 to 5.6. Would this group be interested in a package like this? Any feedback is appreciated.

Thanks!

Woody Gilk

Apr 6, 2016, 4:07:15 PM
to gilles paquette, thephpleague
What makes this different/better than the large number of other HTML parsers that already exist? https://packagist.org/search/?q=html+parser

What other popular packages exist in this category?


gilles paquette

Apr 6, 2016, 4:27:53 PM
to thephpleague, paqu...@gmail.com
Hey,

The other popular package is primarily the one I mentioned in the post, sunra/php-simple-html-dom-parser. The primary improvements over that package are:

- UTF-8 compatible (and other encodings).
- Tests.
- Actually uses autoloading; the project is not all in just one file.
- Documentation.
- The DOM is easier to manipulate and traverse, instead of everything being in an array or plain text.
- A bunch of things that are broken in the sunra version are fixed and stable in my version.
- I actively maintain it, which does not seem to be the case for the other packages, which have just a single version on Packagist.

Other packages in that list include:

https://packagist.org/packages/brjpeters/php-html-parser - An outdated fork of my project; it supports PHP 5.3, which mine does not.

https://packagist.org/packages/monashee/php-simple-html-dom-parser - A fork of sunra that contains all the problems sunra has, including a single file that contains all the code.

https://packagist.org/packages/olamedia/nokogiri - I don't know what is going on here, but it seems to have the same issues that sunra has, although it does not seem to be a fork of that code.

https://packagist.org/packages/bupt1987/html-parser - Same issue: a single file and no autoloading. This project has tests (a single test), so that is an improvement.

Thanks.

Colin O'Dell

Apr 6, 2016, 4:32:45 PM
to gilles paquette, thephpleague
How does this compare to Symfony's DomCrawler component? (https://symfony.com/doc/current/components/dom_crawler.html)
