Could I get a real simple example of a script extracting some data from MySpace?

Drew Tronvig

Nov 24, 2009, 11:03:56 PM

to beautifulsoup

I'm trying to get my head around BeautifulSoup and Python, and I
haven't been able to find any simple real-world example that refers to
a URL that currently exists. Something like the real-world example in
the BeautifulSoup documentation right above
http://www.crummy.com/software/BeautifulSoup/documentation.html#Parsing%20a%20Document
would be a good start, but that target URL is long gone.

I realize that MySpace isn't the easiest HTML out there, but since
that's what I'll be working on, a relatively simple script to extract
some MySpace data and send it to a file would be great. Just
extracting some simple stuff like name and location. The script could
refer to http://www.myspace.com/tom, and then it'll remain valid for
as long as MySpace is around. Barring a MySpace example, would there
be a relatively simple example script that extracts and saves data
from a URL that's likely to stay around and whose content will
probably stay fairly static?

Thanks,
Drew

sste...@gmail.com

Nov 25, 2009, 8:32:23 AM

to beauti...@googlegroups.com

On Nov 24, 2009, at 11:03 PM, Drew Tronvig wrote:

> I'm trying to get my head around BeautifulSoup and Python, and I
> haven't been able to find any simple real-world example that refers to
> a URL that currently exists. Something like the real-world example in
> the BeautifulSoup documentation right above
> http://www.crummy.com/software/BeautifulSoup/documentation.html#Parsing%20a%20Document
> would be a good start, but that target URL is long gone.

Could you try reading the documentation at:

http://www.crummy.com/software/BeautifulSoup/documentation.html

Then put up some sample code showing where you get stuck or what's not working the way you expect it to?

It's one thing to say "I tried this and that and can't get this to work"; it's another thing to say "could you please write my program for me?"

S

Drew Tronvig

Nov 25, 2009, 4:26:54 PM

to beautifulsoup

Thanks ssteinerX,

I have been working my way through the documentation, and at this
point it would really help me if I had an example of a script that
actually retrieves data from an existing URL. Any URL. Any data. The
documentation includes a real-world example script. That example was
probably included in order to help newcomers get a quick overall view
of a script and how it accesses data in a URL:

[Start Quote]

Here's a real-world example. It fetches the ICC Commercial Crime
Services weekly piracy report, parses it with Beautiful Soup, and
pulls out the piracy incidents:

import urllib2
from BeautifulSoup import BeautifulSoup

page = urllib2.urlopen("http://www.icc-ccs.org/prc/piracyreport.php")
soup = BeautifulSoup(page)
for incident in soup('td', width="90%"):
    where, linebreak, what = incident.contents[:3]
    print where.strip()
    print what.strip()
    print

[End Quote]

Sadly, http://www.icc-ccs.org/prc/piracyreport.php no longer exists.
If it did, I might start to understand where "incident" lies in the
URL, and what the "incident.contents" looks like. I'm not stuck; I'm
just not started, and a working sample script on that order would help
me get started, so the next question I have will be a little better
informed.

So, is there a real-world example script that would be similar in
concept to that example script in the documentation, but which works?

Thanks,
Drew

sste...@gmail.com

Nov 25, 2009, 9:01:18 PM

to beauti...@googlegroups.com

On Nov 25, 2009, at 4:26 PM, Drew Tronvig wrote:

> Thanks ssteinerX,
>
> I have been working my way through the documentation, and at this
> point it would really help me if I had an example of a script that
> actually retrieves data from an existing URL. Any URL. Any data.

How about this, which pulls out the <div> with the id "toolbar":

import urllib2
from BeautifulSoup import BeautifulSoup

page = urllib2.urlopen("http://www.integrateddevcorp.com/index.php")
soup = BeautifulSoup(page)
for tb in soup('div', id="toolbar"):
    print tb
    print
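For anyone who wants to see what "incident" and "incident.contents" in the documentation's example look like without depending on a live URL, here is a minimal offline sketch. It is not the code from this thread: it assumes Python 3 and the later `bs4` package (Beautiful Soup 4) with the built-in "html.parser" backend, and both the HTML string and the `extract_incidents` helper are made up for illustration, with placeholder text standing in for the old piracy report page.

```python
from bs4 import BeautifulSoup

# Made-up HTML that imitates the layout the documentation example expects:
# each wide <td> holds a location, a <br>, and then a description.
HTML = """
<table>
  <tr><td width="10%">1</td>
      <td width="90%">Place A<br>First incident description</td></tr>
  <tr><td width="10%">2</td>
      <td width="90%">Place B<br>Second incident description</td></tr>
</table>
"""


def extract_incidents(html):
    """Hypothetical helper: return (where, what) pairs from the wide cells."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    # find_all("td", width="90%") matches only <td> tags with that attribute.
    for incident in soup.find_all("td", width="90%"):
        # .contents is the list of the tag's direct children:
        # [text before <br>, the <br> tag itself, text after <br>]
        where, linebreak, what = incident.contents[:3]
        results.append((where.strip(), what.strip()))
    return results


incidents = extract_incidents(HTML)
for where, what in incidents:
    print(where)
    print(what)
    print()
```

Each "incident" is one matching <td> tag, and its `.contents` list is what gets unpacked into the three names. To run the same loop against a real page, the HTML string could be replaced with the text fetched from a URL, provided the page uses a similar layout.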

Does that help?

S

Drew Tronvig

Nov 25, 2009, 9:10:29 PM

to beautifulsoup

Looks good, thanks.
