How can I exclude "code debris"?


Guido Benigni

Oct 28, 2016, 7:11:56 AM
to beautifulsoup
Hi all,
First of all, I'm NOT a programmer, alas…

This said, I need to scrape tomorrow's horoscope for every zodiac sign, to put it in a newsletter I work on daily.
I tried to approach the problem with Beautiful Soup.
This is what I have so far:

import requests
from bs4 import BeautifulSoup

url = "http://oroscopo.sky.it/oroscopo/domani/ariete.html"
r = requests.get(url)  # download the page
soup = BeautifulSoup(r.content, 'html.parser')  # parse the HTML
# every <p> tag whose class is "box_description_oroscopo"
hor = soup.find_all("p", {"class": "box_description_oroscopo"})
print(hor)


And this is what it returns:

[<p class="box_description_oroscopo">Oroscopo di domani <strong>04 ottobre 2016</strong><br><br>\n\t\t \t\t\t\t\t\tLuna inoperosa e Sole e Marte ostili, che alimentano l\u2019orgoglio, con impennate che dispiacciono a chi vi vuole bene. Obiettivi arditi ma fuori portata; da soli, senza il sostegno di soci o di sponsor, non potete assolutamente farcela. Il vostro partner non \xe8 al top della forma, ma con un minimo di impegno reciproco potete affrontare con soddisfazione tutti gli ostacoli.\n\t\t \t\t\t\t\t</br></br></p>]


Now, how can I get only the text, without the tags and, possibly, with the correct characters (e.g. an apostrophe instead of "\u2019")?
I guess it should be simple enough, but I can't figure it out myself…
Thx,
Guido
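The `\u2019` and `\xe8` in the output above are most likely escape sequences produced by printing a whole list under Python 2: each element is shown in an escaped form, so the right single quotation mark (U+2019) and "è" appear as codes. The text itself is not damaged, and printing each string on its own shows the real characters. The sketch below runs on Python 3 with only the standard library; `ascii()` reproduces the escaped form for comparison, and `to_ascii_quotes` is a hypothetical helper for the case where a plain ASCII apostrophe is really wanted.

```python
# Python 3 sketch: "\u2019" is only an escaped way of displaying the
# right single quotation mark; the character itself is intact.
text = "l\u2019orgoglio, non \u00e8 al top"  # shortened from the output above

print(ascii(text))  # 'l\u2019orgoglio, non \xe8 al top'  (escaped form)
print(text)         # l’orgoglio, non è al top  (the real characters)

# Hypothetical helper: swap typographic quotes for plain ASCII ones,
# only needed if the newsletter tool cannot handle them.
ASCII_QUOTES = str.maketrans({
    "\u2018": "'", "\u2019": "'",  # curly single quotes
    "\u201c": '"', "\u201d": '"',  # curly double quotes
})


def to_ascii_quotes(s):
    return s.translate(ASCII_QUOTES)


print(to_ascii_quotes(text))  # l'orgoglio, non è al top
```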
 

David Goldsmith

Oct 28, 2016, 10:39:44 AM
to beauti...@googlegroups.com
try hor.text.
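Why `hor.text` on its own may not work: `find_all()` returns a `ResultSet`, which is a list of tags, while `.text` belongs to a single tag. A minimal sketch, assuming bs4 is installed and using a shortened inline copy of the paragraph instead of a live request: `find()` returns only the first matching tag (or `None`), so `.text` can be used on it directly.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for r.content; the real page is not fetched here.
html = """<p class="box_description_oroscopo">Oroscopo di domani
<strong>04 ottobre 2016</strong><br><br>
Luna inoperosa e Sole e Marte ostili.</p>"""

soup = BeautifulSoup(html, "html.parser")

# find_all() returns a ResultSet (a list of Tag objects), which has no .text
hor = soup.find_all("p", {"class": "box_description_oroscopo"})
print(type(hor).__name__)  # ResultSet

# find() returns the first matching Tag (or None), which does have .text
p = soup.find("p", {"class": "box_description_oroscopo"})
if p is not None:
    print(p.text)  # text only, tags removed
```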
--
You received this message because you are subscribed to the Google Groups "beautifulsoup" group.
To unsubscribe from this group and stop receiving emails from it, send an email to beautifulsoup+unsubscribe@googlegroups.com.
To post to this group, send email to beauti...@googlegroups.com.
Visit this group at https://groups.google.com/group/beautifulsoup.
For more options, visit https://groups.google.com/d/optout.





David Goldsmith

Oct 28, 2016, 10:41:52 AM
to beauti...@googlegroups.com
Sorry:

hortext = [item.text for item in hor]

DLG
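Putting the two replies together, here is a minimal sketch, assuming bs4 is installed and that the page still uses the `box_description_oroscopo` paragraph shown above. `get_text(" ", strip=True)` strips the tabs and newlines around each piece of text and joins the pieces with single spaces. The list of sign slugs and the URL pattern are assumptions extrapolated from the single `ariete.html` URL in the question, and `horoscope_text` and `fetch_all` are hypothetical helper names. The parsing is demonstrated on an inline sample, so nothing is downloaded unless `fetch_all()` is called.

```python
from bs4 import BeautifulSoup

# Assumption: every sign follows the same URL pattern as ariete.html.
SIGNS = ["ariete", "toro", "gemelli", "cancro", "leone", "vergine",
         "bilancia", "scorpione", "sagittario", "capricorno",
         "acquario", "pesci"]
URL = "http://oroscopo.sky.it/oroscopo/domani/{}.html"


def horoscope_text(html):
    """Return the plain text of each horoscope paragraph, tags removed."""
    soup = BeautifulSoup(html, "html.parser")
    hor = soup.find_all("p", {"class": "box_description_oroscopo"})
    # strip=True trims whitespace around each text piece and drops empty
    # pieces; the " " separator joins what remains with single spaces
    return [item.get_text(" ", strip=True) for item in hor]


def fetch_all():
    """Download and parse tomorrow's horoscope for every sign (needs network)."""
    import requests  # imported here so horoscope_text() works without it
    results = {}
    for sign in SIGNS:
        r = requests.get(URL.format(sign), timeout=10)
        r.raise_for_status()  # stop on HTTP errors such as 404
        results[sign] = horoscope_text(r.content)
    return results


sample = """<p class="box_description_oroscopo">Oroscopo di domani
<strong>04 ottobre 2016</strong><br><br>
\t\t Luna inoperosa, che alimenta l\u2019orgoglio.\n\t\t</p>"""

for text in horoscope_text(sample):
    print(text)
```

Printing each string, rather than the whole list, also shows the apostrophe as ’ instead of `\u2019`.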


On Friday, October 28, 2016, Guido Benigni <guidob...@gmail.com> wrote: