[py26] Unicode encoding problem with glob

Chui-Wen Chiu

Feb 10, 2010, 4:32:07 AM

to python.tw

My environment is Windows Vista + Python 2.6.1.
While listing the contents of a directory, I ran into an encoding problem.
Directory: D:\result\dst\diamondTearz_>>_Blog_Archive_>>_Unity_3D_for_Flash_Developers_
Program:
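import glob
import locale
import os
import sys

print locale.getdefaultlocale()
print sys.getfilesystemencoding()

for fn in os.listdir('d:\\result\\dst\\'):
    print fn

# os.walk does not expand wildcards; no directory is literally named '*',
# so this loop prints nothing (nothing appears for it in the output below)
for Path, Folder, FileName in os.walk('d:\\result\\dst\\*'):
    print FileName

for f in glob.glob('d:\\result\\dst\\*'):
    f = f + '\\index.html'
    print f
    print f.decode('utf-8')
    print f.decode('cp950')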

Output:
('zh_TW', 'cp950')
mbcs
diamondTearz_?_Blog_Archive_?_Unity_3D_for_Flash_Developers_
d:\result\dst\diamondTearz_?_Blog_Archive_?_Unity_3D_for_Flash_Developers_\index.html
d:\result\dst\diamondTearz_?_Blog_Archive_?_Unity_3D_for_Flash_Developers_\index.html
d:\result\dst\diamondTearz_?_Blog_Archive_?_Unity_3D_for_Flash_Developers_\index.html

I found that whenever a name contains the >> character, Python cannot handle it correctly and only shows ?.
Is there any way to fix this?
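The output is consistent with a lossy conversion to the ANSI code page (cp950, reported as mbcs): a character with no cp950 mapping gets replaced by ?. The Python 3 sketch below imitates that replace-mode conversion with the cp950 codec. It assumes the >> in the folder name is actually » (U+00BB), which the '\xbb' in the later Python 3 traceback suggests.

```python
# Python 3 sketch. Assumption: ">>" in the folder name is really U+00BB (»).
name = "diamondTearz_\u00bb_Blog_Archive_\u00bb_Unity_3D_for_Flash_Developers_"

# cp950 has no mapping for U+00BB, so a replace-mode encode substitutes b"?",
# imitating the conversion to the ANSI code page.
ansi_bytes = name.encode("cp950", errors="replace")
print(ansi_bytes)

# The ? is now an ordinary ASCII byte, so decoding with utf-8 or cp950 gives the
# same text, which matches the three identical lines in the output above.
print(ansi_bytes.decode("utf-8") == ansi_bytes.decode("cp950"))
```

Once the name has gone through this conversion, no later decode can bring the » back.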

Yung-Yu Chen

Feb 10, 2010, 7:44:38 AM

to pyth...@googlegroups.com

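Try writing the output to a file first. Sometimes the problem is not in reading/decoding but in writing/encoding to the terminal, and the Windows terminal is weak at this. Don't send the glob results straight to the terminal; write them to a file instead and see whether that gives you some clues.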

yyc
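A minimal Python 3 sketch of that suggestion: write the glob results to a UTF-8 text file instead of printing them, so the console's code page never has to encode them. The helper name dump_names and the temporary demo folder are hypothetical.

```python
import glob
import io
import os
import tempfile


def dump_names(pattern, out_path):
    """Hypothetical helper: write each glob match to out_path as UTF-8 text."""
    names = glob.glob(pattern)  # collect matches before creating the report file
    with io.open(out_path, "w", encoding="utf-8") as out:
        for name in names:
            out.write(name + "\n")
    return names


# Demo: a throwaway folder containing one entry whose name includes » (U+00BB).
root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, "Blog_\u00bb_Archive"))
report = os.path.join(root, "names.txt")
found = dump_names(os.path.join(root, "*"), report)

# Read the report back; an editor that understands UTF-8 will show the » intact.
with io.open(report, encoding="utf-8") as fh:
    written = fh.read()
```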

2010/2/10 Chui-Wen Chiu <sisimi...@gmail.com>

--
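You received this message because you are subscribed to the "python.tw" group on Google Groups.
To post to this group, send email to pyth...@googlegroups.com.
To unsubscribe from this group, send email to pythontw+u...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/pythontw?hl=zh-TW.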

Eric Huang

Feb 10, 2010, 7:54:25 AM

to pyth...@googlegroups.com

You can also try:
repr(problem_str)
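repr shows the code points actually stored in the string, whatever the terminal can display: a real » appears as \xbb, while a name that was already mangled shows a plain ?. Python 2's repr() escapes non-ASCII characters this way; Python 3's repr() leaves printable characters such as » unescaped, so this Python 3 sketch uses ascii(), which does the escaping. The sample strings are made up.

```python
# Python 3 sketch with made-up sample names.
intact = "Blog_\u00bb_Archive"   # the name really contains » (U+00BB)
mangled = "Blog_?_Archive"       # » was already replaced by an ASCII "?"

# ascii() escapes non-ASCII code points, so the two cases are easy to tell
# apart even on a console that cannot display ».
print(ascii(intact))   # 'Blog_\xbb_Archive'
print(ascii(mangled))  # 'Blog_?_Archive'
```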

2010/2/10 Yung-Yu Chen <yun...@gmail.com>

--
founder of Jmap.
facebook: http://www.facebook.com/erichuang2009
twitter: http://www.twitter.com/erichuang623
web: http://www.frienzplay.com

Chui-Wen Chiu

Feb 10, 2010, 10:43:53 PM

to python.tw

To be precise, my original program iterates over the entries and opens each file, like this:

import codecs
import glob

for f in glob.glob('d:\\result\\dst\\*'):
    content = codecs.open(f, 'r', 'cp950').read()

As soon as a name contains that particular character, open is game over.
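The encoding argument of codecs.open describes the file's contents, not its name, so it cannot repair a name that glob has already turned into ?. In Python 2 the usual approach is to give glob a unicode pattern such as u'd:\\result\\dst\\*', which makes it return unicode names. The sketch below shows the same loop in Python 3, where str is already Unicode. The folder layout (one index.html per subfolder, as in the first post) is recreated in a temporary directory, and cp950 for the page contents is assumed from the original code.

```python
import glob
import io
import os
import tempfile

# Recreate the layout from the first post in a temporary directory
# (assumption: each subfolder holds an index.html saved as cp950).
root = tempfile.mkdtemp()
folder = os.path.join(root, "diamondTearz_\u00bb_Blog_Archive")
os.mkdir(folder)
with io.open(os.path.join(folder, "index.html"), "w", encoding="cp950") as fh:
    fh.write("<p>測試</p>")

pages = {}
# A text (Unicode) pattern makes glob return text names, so » survives.
for f in glob.glob(os.path.join(root, "*")):
    path = os.path.join(f, "index.html")
    # encoding= applies to the file's contents only, never to its name.
    with io.open(path, "r", encoding="cp950") as fh:
        pages[path] = fh.read()
```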

If I switch to Python 3, I get:

UnicodeEncodeError: 'cp950' codec can't encode character '\xbb' in position 15: illegal multibyte sequence

Judging from the Py3 error message, it seems the string contains two different encodings....
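A single encode call reproduces the same message, which points to one unmappable character rather than a mix of encodings: the string is Unicode, and the cp950 codec (used, for instance, when printing to a cp950 console) has no byte sequence for U+00BB. A Python 3 sketch with a made-up name:

```python
# Python 3 sketch; the name is made up for illustration.
name = "diamondTearz_\u00bb_Blog_Archive"

try:
    name.encode("cp950")  # strict mode, as when writing to a cp950 console
except UnicodeEncodeError as exc:
    error = exc

print(error)  # 'cp950' codec can't encode character '\xbb' in position 13: illegal multibyte sequence

# Choosing an error handler, or an encoding that covers », avoids the exception.
print(name.encode("cp950", "backslashreplace"))  # b'diamondTearz_\\xbb_Blog_Archive'
print(name.encode("utf-8"))
```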

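Additional notes:
1. Looking at the glob source, it is implemented with os.listdir, and os.listdir in turn calls the platform-dependent global listdir function....
2. I tried adding # -*- coding: utf-8 -*- and # -*- coding: cp950 -*- to the program separately; neither helped.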
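Following point 1: because glob is built on os.listdir, the type of the returned names follows the type of the argument. In Python 2 a str path yields byte-string names (on Windows, converted to the ANSI code page) and a unicode path such as os.listdir(u'd:\\result\\dst\\') yields unicode names. A coding declaration only controls how the source file's own literals are decoded, so it does not change this. The Python 3 sketch below shows the same rule, with str playing the role of Python 2's unicode and bytes the role of Python 2's str; the temporary folder is for illustration only.

```python
import os
import tempfile

# Throwaway folder containing one entry whose name includes » (U+00BB).
root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, "Blog_\u00bb_Archive"))

text_names = os.listdir(root)               # str argument -> str names
byte_names = os.listdir(os.fsencode(root))  # bytes argument -> bytes names

print(type(text_names[0]).__name__, text_names)
print(type(byte_names[0]).__name__, byte_names)
```

On Windows, Python 3.6 and later decode bytes paths as UTF-8 (PEP 529), so this sketch does not reproduce the ? substitution there; it only shows that the result type follows the argument type.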

Yung-Yu Chen

Feb 10, 2010, 10:48:29 PM

to pyth...@googlegroups.com

Check http://blog.seety.org/everydaywork/2008/11/23/1064/

yyc

2010/2/10 Chui-Wen Chiu <sisimi...@gmail.com>
