lxml & windows - post.excerpts [blogofile_blog 0.8b1]

Péter Zsoldos

Oct 29, 2012, 6:44:04 PM

to blogofil...@googlegroups.com

Hi,

I finally got around to moving a site to blogofile, and I have run into some problems. Since I'm not sure how best to report the issues I find, I'm sending this email here, and I've also opened a ticket for another issue I ran into (https://github.com/EnigmaCurry/blogofile_blog/issues/18). Let me know which forum you prefer.

I wanted to use the excerpts functionality on Windows (7, Python 2.7.1 x86). Unfortunately it requires lxml, which is notoriously unsupported on Windows (http://lxml.de/FAQ.html#where-are-the-binary-builds). Grepping the 0.8b1 blogofile (and blogofile_blog) source code suggests lxml is only ever used in one place (the __excerpt method in blogofile_blog.site_src._controllers.blog), and the usage is rather simple: stripping HTML tags.

As an experiment, I've switched to BeautifulSoup, which installs without trouble on Windows via pip (pseudo-diff below), and it works just fine for this purpose.

I don't know what the reason was for choosing lxml, so if there is a reason to keep it as the default implementation, I don't have an issue with that, nor do I have any strong feelings for BS; but I would like to have an easily installable post excerpts dependency on Windows. Let me know what you think!

Thanks,

Peter

P.S.: and the pseudo diff (updating the error message not included):

# import lxml.html
import bs4

....
# post_text = lxml.html.fromstring(self.content).text_content()
soup = bs4.BeautifulSoup(self.content)
post_text = soup.text

Doug Latornell

Oct 29, 2012, 7:27:09 PM

to blogofil...@googlegroups.com

On Mon, Oct 29, 2012 at 3:44 PM, Péter Zsoldos <peter....@gmail.com> wrote:

> Hi,
>
> I finally got around to moving a site to blogofile, and I have run into some problems.

Thanks for trying blogofile, and especially thanks for taking the time to give constructive feedback on the issues you found!

> Since I'm not sure how best to report the issues I find, I'm sending this email here, and I've also opened a ticket for another issue I ran into (https://github.com/EnigmaCurry/blogofile_blog/issues/18). Let me know which forum you prefer.

Either forum works - I see them both in email. I think you've made the right choices on these. Issue 18 looks like a pretty clear bug, so the issue tracker is a great place for that. lxml vs BeautifulSoup is more question-ish, and the forum/mail-list is ideal for that.

> I wanted to use the excerpts functionality on Windows (7, Python 2.7.1 x86). Unfortunately it requires lxml, which is notoriously unsupported on Windows (http://lxml.de/FAQ.html#where-are-the-binary-builds). Grepping the 0.8b1 blogofile (and blogofile_blog) source code suggests lxml is only ever used in one place (the __excerpt method in blogofile_blog.site_src._controllers.blog), and the usage is rather simple: stripping HTML tags.
>
> As an experiment, I've switched to BeautifulSoup, which installs without trouble on Windows via pip (pseudo-diff below), and it works just fine for this purpose.
>
> I don't know what the reason was for choosing lxml, so if there is a reason to keep it as the default implementation, I don't have an issue with that, nor do I have any strong feelings for BS; but I would like to have an easily installable post excerpts dependency on Windows. Let me know what you think!

I know I've seen some discussion of lxml vs. BeautifulSoup in Blogofile, but I can't recall whether it was in the issue tracker, on this list, or in a commit message. Anyway, IIRC, the issue was that BeautifulSoup didn't work under Python 3. I believe that has been resolved, and I think BeautifulSoup is the way to go. Not only is lxml unsupported on Windows, the last time I tried to install it on OS X it was anything but easy. I think this is a case where the speed of lxml can be sacrificed for the cross-platform availability of BeautifulSoup. Of course, I'd also welcome a pull request that allowed either to be used.

And, of course, if anyone on the list can shed more light on the issue, I'd welcome that too!

Changing to BeautifulSoup for excerpts is on my mental todo list, but I hadn't realized that it was quite as simple as your diff below indicates. A pull request would be the fastest way for you to make this happen!

> Thanks,
>
> Peter
>
> P.S.: and the pseudo diff (updating the error message not included):
> # import lxml.html
> import bs4
>
> ....
> # post_text = lxml.html.fromstring(self.content).text_content()
> soup = bs4.BeautifulSoup(self.content)
> post_text = soup.text


Kevin Horn

Oct 30, 2012, 9:49:38 AM

to blogofil...@googlegroups.com

I find lxml to be much easier to use than BeautifulSoup, and more powerful, since it has a lot more than just the BeautifulSoup parser (I often need the ElementTree stuff as well). I use it just fine on Windows all the time.

Here are some options for installing it:

1) I usually have MinGW installed, so I usually use that to compile it from source. Unless something has changed very recently, this should work fine. Granted, many Windows users won't have that option, so:

2) You can use the unofficial builds referenced in the FAQ.

3) Or you can install lxml==2.3, which has Windows builds on PyPI, so you can use easy_install to install it (not pip, which doesn't support binary packages). I realize this seems pretty old, but it isn't really that far back, and I've never had problems with it.

I've had _far_ more difficulty installing lxml on OS X than I've ever had installing it on Windows.

I think the best option would be to allow either, as Doug suggested. I wouldn't expect this to be too difficult, as the APIs are (at least mostly) the same. You'd just need to do some kind of conditional import dance and import whichever one you find under a common alias ("import x as y").
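
A minimal sketch of that conditional import idea, in Python 3. This is not blogofile_blog code: the helper name strip_tags, the order in which the libraries are tried, and the standard-library fallback are all assumptions made for illustration. The lxml and bs4 branches mirror the two calls in Péter's pseudo-diff, except that the bs4 branch names a parser explicitly and uses get_text().

```python
# Hypothetical helper: strip_tags() and the fallback order are assumptions,
# not part of blogofile_blog.
try:
    import lxml.html

    def strip_tags(html):
        """Return the text of an HTML fragment, parsed with lxml."""
        return lxml.html.fromstring(html).text_content()

except ImportError:
    try:
        import bs4

        def strip_tags(html):
            """Return the text of an HTML fragment, parsed with BeautifulSoup 4."""
            # Naming the parser keeps bs4 from picking one based on what is installed.
            return bs4.BeautifulSoup(html, "html.parser").get_text()

    except ImportError:
        # Assumed last resort: the standard library, so the sketch runs anywhere.
        from html.parser import HTMLParser

        class _TextCollector(HTMLParser):
            """Collect text nodes and ignore the tags around them."""

            def __init__(self):
                super().__init__()
                self.chunks = []

            def handle_data(self, data):
                self.chunks.append(data)

        def strip_tags(html):
            """Return the text of an HTML fragment using only the standard library."""
            collector = _TextCollector()
            collector.feed(html)
            collector.close()  # flush any text still buffered by the parser
            return "".join(collector.chunks)


print(strip_tags("<p>Hello <b>world</b>!</p>"))  # Hello world!
```

With something like this in place, the excerpt code could call strip_tags(self.content) and would not need to know which library was found.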

--

Kevin Horn

Doug Latornell

Nov 22, 2012, 5:58:18 PM

to blogofil...@googlegroups.com

Added https://github.com/EnigmaCurry/blogofile_blog/issues/19 to track this.
