Hi, I am trying to parse this URL:
When I fetch it and use BeautifulSoup(data, 'html.parser'), all of the content looks like this:
/Mw==','w6xpw6bCnMOmfsKr','wobCrcKu','w6nDrMKAZWU=','w4HDkhDDvsKeYcOj','ccOwUVbDkQ==','RVobC8OHPQw=','wpJGfMOn','w4HDuz
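The sample above contains no HTML tags at all. My guess, which is only an assumption, is that the server sent back a script payload instead of the page itself. Below is a minimal stdlib sketch I could use to check which kind of response I got. It uses html.parser, the same parser that BeautifulSoup's 'html.parser' backend is built on. `TitleFinder` and `looks_like_html_page` are hypothetical names of my own. Treating "has a non-empty <title>" as "is the real page" is just a heuristic.

```python
from html.parser import HTMLParser


class TitleFinder(HTMLParser):
    """Collects the text of the first <title> element, if there is one."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = None  # stays None if no <title> tag is ever seen

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names before calling this hook
        if tag == "title" and self.title is None:
            self._in_title = True
            self.title = ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def looks_like_html_page(markup: str) -> bool:
    """Heuristic (assumption): a real page has a non-empty <title>."""
    finder = TitleFinder()
    finder.feed(markup)
    finder.close()
    return bool(finder.title and finder.title.strip())
```

On the sample above this returns False, because it contains no tags. On the saved copy, whose <title> is "Announcement on Open Market Operations No.216 [2021]", it should return True.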
But when I downloaded the page to my local drive and ran the diagnose function on it, it parsed fine:
Diagnostic running on Beautiful Soup 4.10.0
Python version 3.8.3 (default, Jul 2 2020, 11:26:31)
[Clang 10.0.0 ]
Found lxml version 4.6.3.0
Found html5lib version 1.1
Trying to parse your markup with html.parser
Here's what html.parser did with the markup:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<!-- saved from
url=(0067)http://www.pbc.gov.cn/en/3688229/3688335/3730267/4380809/index.html
-->
<html>
<head>
<meta content="text/html; charset=utf-8"
http-equiv="Content-Type"/>
<link href="./Announcement on Open Market Operations No.216
[2021]_files/default.css" id="lhgdialoglink" rel="stylesheet"/>
<title>
Announcement on Open Market Operations No.216 [2021]
</title>
<meta content="2021-11-05 11:00:11" name="页面生成时间"/>
<meta content="2021-10-21 17:38:26" name="缓存清理时间"/>
<meta content="7.9.5" name="easysite版本"/>
<meta content="Announcement on Open Market Operations No.216 2021"
name="keywords"/>
<meta content="Announcement on Open Market Operations No.216 2021
初始化频
I'd like to understand why I can't parse the page when I read it directly from the URL, and whether there is a way to fix that.
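One thing I was going to try is fetching the page with browser-like request headers, in case the server answers plain scripted requests differently. This is only an assumption on my part, not something I've confirmed about this site. Here is a minimal stdlib sketch of that approach. `BROWSER_HEADERS`, `build_request` and `fetch_html` are hypothetical names, and the header values are illustrative.

```python
import urllib.request

# Illustrative browser-like headers (assumption: the server may vary its
# response based on these; this is not a documented requirement of the site).
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/95.0 Safari/537.36"
    ),
    "Accept-Language": "en-US,en;q=0.9",
}


def build_request(url: str) -> urllib.request.Request:
    """Build a GET request carrying the browser-like headers above."""
    return urllib.request.Request(url, headers=BROWSER_HEADERS)


def fetch_html(url: str, timeout: float = 30.0) -> str:
    """Download the page and decode it as UTF-8, the charset the saved copy declares."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

The result of fetch_html("http://www.pbc.gov.cn/en/3688229/3688335/3730267/4380809/index.html") could then go into BeautifulSoup(data, 'html.parser') as before. If it still comes back as the obfuscated script, the headers are probably not the issue. In that case the page may only render its content after running JavaScript, which a plain HTTP fetch would not do.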
Thank you.