How can I determine a file's encoding?

Henotii

Jan 14, 2006, 11:33:48 AM

to pyth...@googlegroups.com

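Many editors can show which encoding a file uses.
How can I do the same thing in Python?

I plan to move everything over to UTF-8 from now on,
but on Simplified Chinese Windows XP, newly created text files seem to default to GB2312.

It would be great if I could add automatic encoding detection and conversion to UTF-8 to my favorite editor.
(I wonder whether NewEdit has a feature like this :)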

limodou

Jan 15, 2006, 4:55:22 AM

to pyth...@googlegroups.com

On 06-1-15, Henotii <hen...@gmail.com> wrote:

It already does. I have already documented this feature in NewEdit.

--
I like python!
My Blog: http://www.donews.net/limodou
NewEdit Maillist: http://groups.google.com/group/NewEdit

HuangJiahua

Jan 15, 2006, 11:11:16 AM

to python.cn

I use zh2utf8.py:

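#!/usr/bin/python
# -*- coding: UTF-8 -*-
# Author: Huang Jiahua <jhuang...@gmail.com>
"""Automatically convert text in various encodings to UTF-8.

Tries utf-8, gbk, big5, euc_jp, euc_kr, utf16 and utf32 in turn."""
# Name of the codec tried last (the one that succeeded, or 'unk')
encc = ''

def zh2utf8(stri):
    """Automatically convert a byte string to UTF-8.

    Tries each candidate codec in turn; returns the first successful result."""
    global encc
    # 'jp' is not a registered Python codec name; 'euc_jp' is the assumed intent
    for c in ('utf-8', 'gbk', 'big5', 'euc_jp',
              'euc_kr', 'utf16', 'utf32'):
        encc = c
        try:
            return stri.decode(c).encode('utf8')
        except (UnicodeError, LookupError):
            # Wrong encoding, or codec not available in this Python version
            pass
    encc = 'unk'
    return stri

if __name__ == "__main__":
    # Command-line test
    import sys
    ## sys.setappdefaultencoding('unicode')
    if len(sys.argv) > 1:
        stri = sys.argv[1]
    else:
        stri = sys.stdin.read()
    print zh2utf8(stri)
    print 'encc:', encc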
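
For readers on Python 3, where `bytes` and `str` are separate types, the same "try each codec in turn" idea might look roughly like the sketch below. It is an illustrative variant, not the script above: the names `CANDIDATES`, `guess_and_decode` and `to_utf8` are hypothetical, and the candidate list leaves out utf16/utf32 on the assumption that they accept too many arbitrary byte strings to be useful as a guess.

```python
# Python 3 sketch of the "try each codec in turn" approach.
# CANDIDATES, guess_and_decode and to_utf8 are hypothetical names.
CANDIDATES = ("utf-8", "gbk", "big5", "euc_jp", "euc_kr")


def guess_and_decode(data: bytes, candidates=CANDIDATES):
    """Return (text, codec) for the first codec that decodes data.

    Returns (None, None) when no candidate fits.
    """
    for codec in candidates:
        try:
            return data.decode(codec), codec
        except UnicodeDecodeError:
            continue  # not this encoding, try the next one
    return None, None


def to_utf8(data: bytes) -> bytes:
    """Re-encode data as UTF-8, or return it unchanged if nothing fits."""
    text, _codec = guess_and_decode(data)
    return data if text is None else text.encode("utf-8")


if __name__ == "__main__":
    import sys

    # Read raw bytes from stdin and write UTF-8 bytes to stdout.
    sys.stdout.buffer.write(to_utf8(sys.stdin.buffer.read()))
```

The order of the candidates matters: the first codec that decodes without error wins, even if a later one would have produced the intended text.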

Henotii

unread,

Jan 15, 2006, 11:40:38 AM1/15/06

to pyth...@googlegroups.com

This approach is nice :)
Simple and practical.
2006/1/16, HuangJiahua <jhuang...@gmail.com>:

张骏

Jan 15, 2006, 8:23:17 PM

to pyth...@googlegroups.com

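This approach has a problem. The GBK encoding of some Chinese characters overlaps with UTF-8: the same bytes can be parsed as UTF-8 (producing garbage) and are also perfectly normal Chinese characters in GBK.

I forget exactly which characters they are, but I think they are fairly common ones.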
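
The overlap can be checked directly. A well-formed two-byte UTF-8 sequence has a lead byte in 0xC2..0xDF and a trail byte in 0x80..0xBF, while the GB2312-range hanzi in GBK use lead bytes 0xB0..0xF7 and trail bytes 0xA1..0xFE, so the two ranges intersect. The Python 3 sketch below (the function name `ambiguous_gbk_hanzi` is hypothetical) enumerates the two-byte sequences that decode under both codecs.

```python
def ambiguous_gbk_hanzi():
    """Collect two-byte GB2312-range hanzi whose bytes are also valid UTF-8."""
    found = []
    for lead in range(0xB0, 0xF8):
        for trail in range(0xA1, 0xFF):
            raw = bytes([lead, trail])
            try:
                as_gbk = raw.decode("gbk")
                as_utf8 = raw.decode("utf-8")
            except UnicodeDecodeError:
                continue  # unassigned in GBK, or not well-formed UTF-8
            found.append((raw, as_gbk, as_utf8))
    return found


if __name__ == "__main__":
    hits = ambiguous_gbk_hanzi()
    print(len(hits), "ambiguous two-byte sequences")
    for raw, as_gbk, as_utf8 in hits[:5]:
        print(raw.hex(), "GBK:", as_gbk, "UTF-8:", as_utf8)
```

A whole text is misdetected only when every byte sequence in it happens to parse as UTF-8, so the risk is mainly with short strings; a detector that tries UTF-8 first will silently pick the wrong reading in those cases.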

--
张骏 <zha...@foreseen-info.com>

Agility comes from Python
Simplicity comes from us
丰元信信息技术有限公司
