[feedparser commit] r290 - in trunk/feedparser: . tests/wellformed/encoding

codesite...@google.com

Mar 15, 2008, 4:14:53 PM
to feedparse...@googlegroups.com
Author: pilgrim
Date: Thu Mar 13 21:56:33 2008
New Revision: 290

Added:
trunk/feedparser/tests/wellformed/encoding/http_application_atom_xml_gb2312_charset.xml
trunk/feedparser/tests/wellformed/encoding/http_application_atom_xml_gb2312_charset_overrides_encoding.xml
trunk/feedparser/tests/wellformed/encoding/http_application_atom_xml_gb2312_encoding.xml
Modified:
trunk/feedparser/feedparser.py

Log:
coerce gb2312 to gb18030 [closes issue 16/1573544]
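The log line can be illustrated with Python's standard codecs. GB18030 is a superset of GB2312, so text made only of GB2312 characters encodes to identical bytes under both codecs. Bytes for characters outside GB2312, however, are rejected by a strict gb2312 decoder. This minimal sketch assumes only the built-in `gb2312` and `gb18030` codecs; `try_decode` is a hypothetical helper. The emoji is used only because it is certainly outside GB2312. A real mislabelled feed would more likely contain extra hanzi from the GBK/GB18030 range.

```python
# Why a feed labelled gb2312 is safer to decode as gb18030 (illustrative sketch).


def try_decode(data: bytes, encoding: str):
    """Hypothetical helper: return decoded text, or None if the codec rejects the bytes."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# Text limited to GB2312 characters: both codecs produce the same bytes.
plain = "中文"
assert plain.encode("gb2312") == plain.encode("gb18030")

# Text containing a character GB2312 lacks: a gb18030 encoder handles it...
mixed = "中文 😀"
raw = mixed.encode("gb18030")

# ...but a strict gb2312 decoder rejects those bytes, while gb18030 round-trips them.
print(try_decode(raw, "gb2312"))            # None
print(try_decode(raw, "gb18030") == mixed)  # True
```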

Modified: trunk/feedparser/feedparser.py
==============================================================================
--- trunk/feedparser/feedparser.py (original)
+++ trunk/feedparser/feedparser.py Thu Mar 13 21:56:33 2008
@@ -3246,6 +3246,10 @@
         true_encoding = xml_encoding or 'iso-8859-1'
     else:
         true_encoding = xml_encoding or 'utf-8'
+    # some feeds claim to be gb2312 but are actually gb18030.
+    # apparently MSIE and Firefox both do the following switch:
+    if true_encoding.lower() == 'gb2312':
+        true_encoding = 'gb18030'
     return true_encoding, http_encoding, xml_encoding, sniffed_xml_encoding, acceptable_content_type
 
 def _toUTF8(data, encoding):
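The effect of this hunk on the returned encoding can be sketched in isolation. `coerce_encoding` is a hypothetical name, not a feedparser function. It reproduces only the case-insensitive comparison added here.

```python
def coerce_encoding(encoding: str) -> str:
    """Hypothetical helper mirroring the r290 change: treat a declared gb2312
    encoding as gb18030 (its superset) and leave any other name unchanged."""
    # Case-insensitive, like the .lower() comparison in the patch.
    if encoding.lower() == "gb2312":
        return "gb18030"
    return encoding


for declared in ("gb2312", "GB2312", "utf-8", "iso-8859-1"):
    print(declared, "->", coerce_encoding(declared))
```

Like the patch, this matches only the literal name `gb2312`, so an alias such as `EUC-CN` passes through unchanged. One possible extension, not part of this commit, is to normalize the name first with `codecs.lookup(encoding).name`. That would catch the aliases Python knows about, but it raises `LookupError` for unrecognized names.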

Added: trunk/feedparser/tests/wellformed/encoding/http_application_atom_xml_gb2312_charset.xml
==============================================================================
--- (empty file)
+++ trunk/feedparser/tests/wellformed/encoding/http_application_atom_xml_gb2312_charset.xml Thu Mar 13 21:56:33 2008
@@ -0,0 +1,8 @@
+<?xml version="1.0"?>
+<!--
+Header: Content-type: application/atom+xml;charset='gb2312'
+Description: application/atom+xml + explicit charset
+Expect: not bozo and encoding == 'gb18030'
+-->
+<feed xmlns="http://www.w3.org/2005/Atom">
+</feed>
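The `Header:` line in this test quotes the charset with single quotes (`charset='gb2312'`). HTTP's quoted-string syntax uses double quotes, so a lenient parser has to tolerate both forms. Below is a minimal sketch of extracting the parameter. It assumes a simple `type/subtype; name=value` layout with no semicolons inside values. `charset_from_content_type` is a hypothetical helper, not feedparser's code.

```python
def charset_from_content_type(header_value: str) -> str:
    """Hypothetical helper: return the charset parameter of a Content-Type value,
    lower-cased, with surrounding single or double quotes removed ('' if absent)."""
    # Split "type/subtype; param=value; ..." on semicolons; the media type is ignored.
    _media_type, *params = header_value.split(";")
    for param in params:
        name, sep, value = param.partition("=")
        if sep and name.strip().lower() == "charset":
            # Tolerate both 'gb2312' (as in these tests) and "gb2312".
            return value.strip().strip("'\"").lower()
    return ""


print(repr(charset_from_content_type("application/atom+xml;charset='gb2312'")))  # 'gb2312'
print(repr(charset_from_content_type("application/atom+xml")))                   # ''
```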

Added: trunk/feedparser/tests/wellformed/encoding/http_application_atom_xml_gb2312_charset_overrides_encoding.xml
==============================================================================
--- (empty file)
+++ trunk/feedparser/tests/wellformed/encoding/http_application_atom_xml_gb2312_charset_overrides_encoding.xml Thu Mar 13 21:56:33 2008
@@ -0,0 +1,8 @@
+<?xml version="1.0" encoding="utf-8"?>
+<!--
+Header: Content-type: application/atom+xml; charset='gb2312'
+Description: application/atom+xml + charset overrides encoding
+Expect: not bozo and encoding == 'gb18030'
+-->
+<feed xmlns="http://www.w3.org/2005/Atom">
+</feed>
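In this test the XML declaration names utf-8, the HTTP header names gb2312, and the HTTP value is expected to win. Before that comparison can happen, the declared encoding has to be read from the document itself. The rough sketch below assumes an ASCII-compatible byte stream; UTF-16 or UTF-32 input would need BOM sniffing first. The regex and `declared_xml_encoding` are hypothetical, not feedparser's implementation.

```python
import re
from typing import Optional

# Hypothetical pattern: the encoding pseudo-attribute of an XML declaration at the
# very start of the document, with either quote style. [^>]*? keeps the search
# inside the declaration itself.
_XML_DECL_ENCODING = re.compile(
    rb"""<\?xml[^>]*?\bencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def declared_xml_encoding(data: bytes) -> Optional[str]:
    """Return the lower-cased encoding named in the XML declaration,
    or None if there is no declaration or it names no encoding."""
    match = _XML_DECL_ENCODING.match(data)  # match() anchors at the first byte
    return match.group(1).decode("ascii").lower() if match else None


print(declared_xml_encoding(b'<?xml version="1.0" encoding="utf-8"?>\n<feed/>'))  # utf-8
print(declared_xml_encoding(b'<?xml version="1.0"?>\n<feed/>'))                   # None
```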

Added: trunk/feedparser/tests/wellformed/encoding/http_application_atom_xml_gb2312_encoding.xml
==============================================================================
--- (empty file)
+++ trunk/feedparser/tests/wellformed/encoding/http_application_atom_xml_gb2312_encoding.xml Thu Mar 13 21:56:33 2008
@@ -0,0 +1,8 @@
+<?xml version="1.0" encoding="gb2312"?>
+<!--
+Header: Content-type: application/atom+xml
+Description: application/atom+xml + explicit encoding
+Expect: not bozo and encoding == 'gb18030'
+-->
+<feed xmlns="http://www.w3.org/2005/Atom">
+</feed>
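Taken together, the three tests exercise one rule from different starting points. The sketch below assumes the precedence that the `Expect:` lines imply for `application/atom+xml`: an HTTP charset beats the XML declaration, which in turn beats a UTF-8 default. The gb2312 coercion is then applied to whichever name was chosen. `effective_encoding` is a hypothetical helper and does not model feedparser's other content-type branches.

```python
from typing import Optional


def effective_encoding(http_charset: Optional[str], xml_encoding: Optional[str]) -> str:
    """Hypothetical sketch for an application/atom+xml response:
    HTTP charset, else XML declaration, else utf-8; then gb2312 -> gb18030."""
    encoding = http_charset or xml_encoding or "utf-8"
    if encoding.lower() == "gb2312":
        encoding = "gb18030"
    return encoding


# The three new test cases, as (HTTP charset, XML declared encoding) pairs:
print(effective_encoding("gb2312", None))     # ..._gb2312_charset.xml -> gb18030
print(effective_encoding("gb2312", "utf-8"))  # ..._charset_overrides_encoding.xml -> gb18030
print(effective_encoding(None, "gb2312"))     # ..._gb2312_encoding.xml -> gb18030
```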
