> I've written code to open every page that I need to open using the soup and
> next I need to retrieve the information from it.
> This is an example of the page I would be examining:
>
> http://www.iucnredlist.org/details/39780/0
> I need two bits of data from the assessment box. Firstly I need its current
> status then I need its history.
>
> I can't work out how to do this because I need the actual text, I need it in
> the right order and the tags aren't unique to that section. Can anyone help
> me?

If a tag has no distinguishing features, you can find a nearby tag
that does have distinguishing features. Then you can use a method like
find_all() or find_next() to find the tag you're looking for, relative
to the tag that was easy to find.

For instance, you can find the assessment table relative to the <h2>
tag that comes before it.

>>> assessment_section = soup.find('h2', id='sectionAssessment')
>>> assessment_table = assessment_section.find_next('table')

Then, you can find the sections you're interested in based on their
human-readable labels.

>>> criteria_label = assessment_table.find(text="Red List Category & Criteria:")
>>> criteria_value = criteria_label.find_next('td')
>>> history_label = assessment_table.find(text="History:")
>>> history_value = history_label.find_next('td')

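
Putting those steps together, here is a minimal sketch you can run without fetching the page. The SAMPLE_HTML markup and its cell values are made up for illustration and the real page may be laid out differently; value_for_label() and extract_assessment() are hypothetical helper names, not part of Beautiful Soup. It uses string=, which is the name Beautiful Soup 4.4.0 and later give to the text= argument shown above, and it returns None instead of raising when a tag or label is missing.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page; the actual HTML
# and the values in it may differ.
SAMPLE_HTML = """
<h2 id="sectionTaxonomy">Taxonomy</h2>
<table><tr><td>Kingdom:</td><td>Animalia</td></tr></table>
<h2 id="sectionAssessment">Assessment Information</h2>
<table>
  <tr><td>Red List Category &amp; Criteria:</td><td>Vulnerable A2d</td></tr>
  <tr><td>History:</td><td>2000 - Lower Risk</td></tr>
</table>
"""


def value_for_label(table, label):
    """Return the text of the first <td> after the label, or None."""
    # Locate the label by its exact human-readable text.
    label_node = table.find(string=label)
    if label_node is None:
        return None
    # The value lives in the next <td> in document order.
    value_cell = label_node.find_next('td')
    if value_cell is None:
        return None
    return value_cell.get_text(strip=True)


def extract_assessment(html):
    """Return the status and history text, or None if the section is absent."""
    soup = BeautifulSoup(html, 'html.parser')
    # The <h2> has an id, so it is easy to find; the table does not.
    heading = soup.find('h2', id='sectionAssessment')
    if heading is None:
        return None
    table = heading.find_next('table')
    if table is None:
        return None
    return {
        'status': value_for_label(table, "Red List Category & Criteria:"),
        'history': value_for_label(table, "History:"),
    }


result = extract_assessment(SAMPLE_HTML)
print(result)
```

Calling get_text(strip=True) on the value cell gives you the actual text rather than the tag, which is what you asked for. If a value cell holds several entries, you would still need to split that text yourself.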
Leonard