Title gets extracted from an embedded SVG


Tea Fan

May 8, 2025, 8:12:47 AM
to beautifulsoup
Hi,

I have a basic HTML parser built on top of the Beautiful Soup library. Recently I noticed that some of the output texts have odd titles that do not match the content.

For example this page:
https://www.cancer.gov/publications/dictionaries/cancer-terms/def/seizure

The page above doesn't have a title tag in the head section. However, it does have multiple SVG elements with title tags in them. It looks like Beautiful Soup extracts the title from one of those SVGs.

I put up a small reproducer:
```
from bs4 import BeautifulSoup

test_html = """<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8" />
    <meta property="og:title" content="NCI Dictionary of Cancer Terms" />
  </head>
  <body>
    <svg xmlns="http://www.w3.org/2000/svg">
      <title id="facebook-title">Facebook</title>
    </svg>
  </body>
</html>
"""

soup = BeautifulSoup(test_html, 'html.parser')
title = soup.title.string if soup.title and soup.title.string else "N/A"
print(f"Title: {title}")
```
The script prints "Title: Facebook" instead of "Title: N/A".

In my parser, I added some code that iterates over a node's parents and checks whether the tag came from the "head" or the "body" section. However, it seems odd to have to do that.
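
A minimal sketch of that kind of parent check, for illustration only. The helper name `find_head_title` is hypothetical and not part of Beautiful Soup; it relies on the library's `find_all()` and `find_parent()` methods and assumes a title only counts when it has a "head" ancestor.

```python
from bs4 import BeautifulSoup


def find_head_title(soup):
    """Return the first <title> tag that has a <head> ancestor, or None.

    Hypothetical helper: walks every <title> in the document and skips
    the ones that live elsewhere (for example inside an <svg>).
    """
    for candidate in soup.find_all("title"):
        if candidate.find_parent("head") is not None:
            return candidate
    return None


if __name__ == "__main__":
    sample = """<html>
      <head><meta charset="utf-8" /></head>
      <body><svg><title>Facebook</title></svg></body>
    </html>"""
    tag = find_head_title(BeautifulSoup(sample, "html.parser"))
    print("Title:", tag.string if tag is not None else "N/A")
```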

Is it a bug or a feature?

Isaac Muse

May 8, 2025, 8:40:13 AM
to beautifulsoup

It looks like it is working as expected. When you write something like soup.property, and property is not a defined method or attribute on the object, it is treated as if you had called soup.find('property'), which returns the first matching tag anywhere beneath the current tag, at any depth. In your case, it finds the first “title” tag under the document root, which happens to be the “title” tag inside the “svg” tag.
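
To make the equivalence concrete, here is a small sketch based on a trimmed copy of the reproducer above. The variable names are only illustrative; the point is that attribute access and `find()` should hand back the same tag, and that its parent is the “svg” element rather than “head”.

```python
from bs4 import BeautifulSoup

html = """<html>
  <head><meta charset="utf-8" /></head>
  <body>
    <svg xmlns="http://www.w3.org/2000/svg">
      <title id="facebook-title">Facebook</title>
    </svg>
  </body>
</html>"""

soup = BeautifulSoup(html, "html.parser")

via_attribute = soup.title       # attribute-style lookup
via_find = soup.find("title")    # explicit search of all descendants

# Both lookups are expected to return the very same tag object.
same_tag = via_attribute is via_find

# The parent shows where the match actually came from.
parent_name = via_attribute.parent.name

print(same_tag, parent_name)
```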

Tea Fan

May 8, 2025, 9:12:20 AM
to beautifulsoup
Thank you for clarifying this. It makes more sense now :)

I replaced `soup.title` with `soup.head.title` in the example above, and it started to work as I initially expected.
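
One caveat with this approach: `soup.head` is itself just a `find('head')` lookup, so it returns None when the document has no head tag (as far as I can tell, `html.parser` does not add a missing one), and `soup.head.title` would then raise an AttributeError. A guarded sketch, where `head_title` is a hypothetical helper name and "N/A" is the same fallback as in the reproducer:

```python
from bs4 import BeautifulSoup


def head_title(soup, default="N/A"):
    """Return the text of the <title> inside <head>, or `default`.

    Hypothetical helper: titles that live elsewhere (for example inside
    an <svg>) are ignored because the search starts at <head>.
    """
    head = soup.head
    if head is None:
        return default
    title = head.title
    if title is None or title.string is None:
        return default
    return title.string.strip()


if __name__ == "__main__":
    sample = "<html><head></head><body><svg><title>Facebook</title></svg></body></html>"
    print(f"Title: {head_title(BeautifulSoup(sample, 'html.parser'))}")
```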