> So the titles are in a blockquote tag, and usually in <p><strong>.
> But for example the last one doesn't have <strong> but <b>
> My idea was to go deeper and search for <a> tag with the correct
> title.
> For example:
>   link = soup.findAll(name='a', text='HET BABELVIRUS')[0].findParent()
> gives me
> <a href="img/stephenson_n_babelvirus_1995_1.jpg">HET BABELVIRUS</a>
>
> But how can I now search further based on this?
> I would like to find the rest of the paragraph:
> <p><strong><a href="img/stephenson_n_babelvirus_1995_1.jpg">HET
> BABELVIRUS</a></strong><br>
> 1995, Amsterdam: Luitingh-Sijthoff, 447pag., ISBN 90-245-1217-4<br>
> vert.van: <a href="img/stephenson_n_eng_snowcrash_1993.jpg">Snow
> Crash</a> (1992), vert.door: Alistair Schuchart</p>
>
> And if I find it, can BeautifulSoup help me parse the <br> tag? Or
> should I try that with a regex?
Since you found the <a> tag inside the paragraph, you can get the
enclosing paragraph with a_tag.findParent('p'), which returns:
<p><strong><a href="img/stephenson_n_babelvirus_1995_1.jpg">HET
BABELVIRUS</a></strong><br />
1995, Amsterdam: Luitingh-Sijthoff, 447pag., ISBN
90-245-1217-4<br />
vert.van: <a href="img/stephenson_n_eng_snowcrash_1993.jpg">Snow
Crash</a> (1992), vert.door: Alistair Schuchart</p>
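Here is a minimal sketch of that lookup from start to finish. It assumes
the newer bs4 package, where findParent is spelled find_parent and the
text= argument is spelled string=. The html string is a trimmed copy of
your snippet, with the title kept on one line so the exact-string match
works. The variable names are mine.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the page fragment from the question (assumed layout).
html = """<blockquote>
<p><strong><a href="img/stephenson_n_babelvirus_1995_1.jpg">HET BABELVIRUS</a></strong><br>
1995, Amsterdam: Luitingh-Sijthoff, 447pag., ISBN 90-245-1217-4<br>
vert.van: <a href="img/stephenson_n_eng_snowcrash_1993.jpg">Snow Crash</a> (1992), vert.door: Alistair Schuchart</p>
</blockquote>"""

soup = BeautifulSoup(html, "html.parser")

# Find the <a> tag whose text is exactly the title.
a_tag = soup.find("a", string="HET BABELVIRUS")

# Walk upwards until the first enclosing <p>; the <strong> (or <b>)
# in between does not matter.
paragraph = a_tag.find_parent("p")

print(paragraph.name)
print(paragraph.get_text())
```

Because find_parent('p') skips whatever sits between the <a> and the
<p>, the same call works for the entry that uses <b> instead of
<strong>.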
I'm not sure what you mean by parsing the <br> tag. BS has parsed the
<br> tags and knows that there's a <br> tag, then some text, then
another <br> tag, then more text. But it doesn't know that "1995,
Amsterdam: Luitingh-Sijthoff, 447pag., ISBN 90-245-1217-4" is five
pieces of information. To express the structure of that text you
should use a regex.
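As a rough sketch of such a regex, assuming every entry follows the
same "year, city: publisher, pages, ISBN" layout as the one you quoted
(I haven't checked the other entries), something like this would pull
the five pieces apart. The group names and the pattern are my own
guesses at the format, so expect to adjust them.

```python
import re

# The text that sits between the two <br> tags in the example paragraph.
line = "1995, Amsterdam: Luitingh-Sijthoff, 447pag., ISBN 90-245-1217-4"

# Assumed layout: year, city: publisher, NNNpag., ISBN number
PATTERN = re.compile(
    r"""
    (?P<year>\d{4}),\s*          # four-digit year
    (?P<city>[^:]+):\s*          # everything up to the colon
    (?P<publisher>[^,]+),\s*     # everything up to the next comma
    (?P<pages>\d+)\s*pag\.,\s*   # page count followed by "pag.,"
    ISBN\s+(?P<isbn>[\dXx-]+)    # digits, hyphens, optional X check digit
    """,
    re.VERBOSE,
)

match = PATTERN.match(line)
# None means the line did not fit the assumed layout.
record = match.groupdict() if match else None
print(record)
```

Lines that don't fit the pattern leave record as None, so you can
collect those and handle them by hand.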
Hope this helps,
Leonard