I'd like you to do a code review. Please review the following patch:
----------------------------------------------------------------------
r1315: (no author) | 2010-03-04 17:04:45 +0800
Decode XML files that declare GB2312 encoding as GB18030, falling back to GBK, to prevent decoding errors.
----------------------------------------------------------------------
r1316: (no author) | 2010-03-04 17:05:39 +0800
----------------------------------------------------------------------
=== extensions/libxml2_xml_parser/libxml2_xml_parser.cc
==================================================================
--- extensions/libxml2_xml_parser/libxml2_xml_parser.cc (revision 1314)
+++ extensions/libxml2_xml_parser/libxml2_xml_parser.cc (revision 1316)
@@ -113,7 +113,19 @@
if (content.empty())
return true;
- xmlCharEncodingHandler *handler = xmlFindCharEncodingHandler(encoding);
+ xmlCharEncodingHandler *handler;
+ if (strcasecmp(encoding, "GB2312") == 0) {
+ // Many XML documents declared as GB2312 actually contain characters
+ // outside the GB2312 range. Use GB18030 or GBK to prevent decoding errors.
+ handler = xmlFindCharEncodingHandler("GB18030");
+ if (!handler) {
+ // Fall back to GBK if GB18030 is not supported.
+ handler = xmlFindCharEncodingHandler("GBK");
+ }
+ } else {
+ handler = xmlFindCharEncodingHandler(encoding);
+ }
+
if (!handler)
return false;
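
For reviewers who want to reason about the fallback order in isolation, the selection logic in this hunk can be sketched without libxml2. This is only an illustration under stated assumptions: `CandidateEncodings` and `PickEncoding` are hypothetical names that do not appear in the patch, and the `is_supported` callback stands in for "xmlFindCharEncodingHandler returned a non-null handler".

```cpp
#include <strings.h>  // strcasecmp (POSIX)

#include <functional>
#include <string>
#include <vector>

// Hypothetical helper, not part of the patch: lists the encoding names to
// try, in order, for a declared encoding. GB2312 is widened to its
// supersets, mirroring the order used in the patch.
std::vector<std::string> CandidateEncodings(const std::string &declared) {
  if (strcasecmp(declared.c_str(), "GB2312") == 0)
    return {"GB18030", "GBK"};
  return {declared};
}

// Returns the first candidate accepted by is_supported, or an empty string
// if none is accepted. is_supported is an assumption standing in for a
// non-null result from xmlFindCharEncodingHandler.
std::string PickEncoding(
    const std::string &declared,
    const std::function<bool(const std::string &)> &is_supported) {
  for (const std::string &name : CandidateEncodings(declared)) {
    if (is_supported(name))
      return name;
  }
  return "";
}
```

Written this way, one possible review question becomes visible: as in the patch, a document declared GB2312 gets no handler at all when neither GB18030 nor GBK is available, even if a plain GB2312 handler exists. Appending the declared name as a last candidate would cover that case, if it can actually occur on the supported platforms.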