No, this broken page isn't even valid gb18030
<head>
<meta http-equiv="Content-Type" content="text/html; charset=gb2312" />
<title>春意盎然踏青去 国内旅游web2.0服务推荐_好站推荐_中国站长站
CHINAZ.COM</title>
<link href="/images/style.css" type="text/css" rel="stylesheet" type="text/css" />
<meta name="keywords" content="春意盎然 踏青 旅游网站 web2.0,中国站长站">
<meta name="description" content="出游前,可以在游多多上查找目的地的攻略和问答,以及吃、
iconv: illegal input sequence at position 580
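You can't decode it the usual way. Once you have established that it is some GB encoding, the only option is to have decode ignore the errors: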
| decode(...)
| S.decode([encoding[,errors]]) -> string or unicode
|
| Decodes S using the codec registered for encoding. encoding defaults
| to the default encoding. errors may be given to set a different error
| handling scheme. Default is 'strict' meaning that encoding errors raise
| a UnicodeDecodeError. Other possible values are 'ignore' and 'replace'
| as well as any other name registerd with codecs.register_error that is
| able to handle UnicodeDecodeErrors.
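
A minimal sketch of what that looks like in practice, written for Python 3 (where decode is a method of bytes rather than str). The sample bytes and the helper names decode_lossy and first_bad_offset are made up for illustration; the real page content would come from wherever you fetched it. first_bad_offset plays roughly the role of the position that iconv reports.

```python
# Hypothetical sample: valid GB2312 bytes with one stray byte (0xFF) spliced in,
# standing in for the damaged page.
good = "春意盎然".encode("gb2312")
raw = good[:4] + b"\xff " + good[4:]


def decode_lossy(data, encoding="gb18030", errors="replace"):
    """Decode bytes, handling undecodable sequences according to `errors`."""
    return data.decode(encoding, errors)


def first_bad_offset(data, encoding="gb18030"):
    """Return the byte offset of the first undecodable sequence, or None."""
    try:
        data.decode(encoding)  # errors defaults to 'strict'
    except UnicodeDecodeError as exc:
        return exc.start
    return None


if __name__ == "__main__":
    print(first_bad_offset(raw))               # where strict decoding gives up
    print(decode_lossy(raw, errors="ignore"))   # bad bytes silently dropped
    print(decode_lossy(raw, errors="replace"))  # bad bytes become U+FFFD
```

'ignore' gives the cleanest-looking text, but 'replace' leaves a visible U+FFFD marker wherever data was lost, which can be easier to debug.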