Python 3.4 string decoding problem

Bobby Ho

Jun 27, 2015, 11:14:02 PM
to pyth...@googlegroups.com
I'm trying to print the page source of a Google Search result and also write it to a file:

#coding=utf-8
try:
    from urllib.request import Request, urlopen  # Python 3
except:
    from urllib2 import Request, urlopen  # Python 2

useragent = 'Mozilla/5.0 (Windows NT 6.3; rv:36.0) Gecko/20100101 Firefox/36.0'

#Generate URL
url = 'https://www.google.com.tw/search?q='
query = str(input('Google It! :'))
full_url = url+query

#Request Data
data = Request(full_url)
data.add_header('User-Agent', useragent)
dataRequested = urlopen(data).read()
dataRequested = str(dataRequested.decode('utf-8'))

print(dataRequested)

#Write Data Into File
file = open('Google - '+query+'.html', 'w')
file.write(dataRequested)


Printing works fine, but writing to the file raises an error:

    file.write(dataRequested)
UnicodeEncodeError: 'cp950' codec can't encode character '\u200e' in position 97658: illegal multibyte sequence

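And I'm clearly using UTF-8, so why is CP950 showing up? =_=
I tried replacing the \u200e character with a blank, but then even more characters that can't be encoded turn up....

Any help would be much appreciated!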

兩大類

Jun 28, 2015, 12:22:05 AM
to pyth...@googlegroups.com
If I remember correctly, open() opens files with the platform's default encoding, so passing encoding='utf-8' should be enough @@
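
A minimal sketch of that suggestion, assuming Python 3. The `#coding=utf-8` line only declares the encoding of the source file itself, and `.decode('utf-8')` only affects how the downloaded bytes become a `str`; neither changes the encoding that `open()` uses when writing. The helper names `can_encode` and `save_text`, the sample string, and the file name are made up for illustration and are not from the original post.

```python
import locale
import os
import tempfile

# Sample text containing U+200E (LEFT-TO-RIGHT MARK), the character named
# in the traceback. The string itself is made up for illustration.
text = 'Google \u200e result'

# When open() is called in text mode without encoding=, Python 3 falls back
# to a locale-dependent default. On a Traditional Chinese Windows setup this
# is typically cp950, which matches the codec named in the error.
print('default text encoding:', locale.getpreferredencoding(False))


def can_encode(s, encoding):
    """Return True if s can be encoded with the given codec (hypothetical helper)."""
    try:
        s.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def save_text(path, s):
    """Write s to path as UTF-8 regardless of the platform default (hypothetical helper)."""
    with open(path, 'w', encoding='utf-8') as f:
        f.write(s)


print('cp950 can encode it:', can_encode(text, 'cp950'))
print('utf-8 can encode it:', can_encode(text, 'utf-8'))

# Write to a temporary location and read the text back with the same encoding.
path = os.path.join(tempfile.gettempdir(), 'google_demo.html')
save_text(path, text)
with open(path, encoding='utf-8') as f:
    assert f.read() == text
```

Applied to the script in the question, this would mean changing the last step to `open('Google - '+query+'.html', 'w', encoding='utf-8')`. Replacing individual characters such as \u200e only hides one symptom, since any other character outside CP950 will fail in the same way.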
