
question about perlunicode "Unicode Character Properties"


silent

Jan 12, 2012, 2:10:27 AM
to perl-u...@perl.org
In perldoc perlunicode, under "Unicode Character Properties" : "Scripts",

I see Han, which can be used as $string =~ /\p{Han}/;

My question is: how can I find out what exactly "Han" is?
I know \p{Han} can match a Chinese character, and I
also tested that it matches each character in perl-src/ext/Encode/t/gb2312.utf,

but I do not know the exact range of code points that \p{Han} covers.

Thanks!

Lars Dɪᴇᴄᴋᴏᴡ 迪拉斯

Jan 12, 2012, 3:47:22 AM
to perl-u...@perl.org
Install <http://p3rl.org/unichars>.

$ unichars -au '\p{Han}' | wc -l
75960

$ unichars -au '\p{Han}' | perl -lne'print unless $. % 1000'
㚏 U+0368F CJK UNIFIED IDEOGRAPH-368F
㩷 U+03A77 CJK UNIFIED IDEOGRAPH-3A77
㹟 U+03E5F CJK UNIFIED IDEOGRAPH-3E5F
䉇 U+04247 CJK UNIFIED IDEOGRAPH-4247
䘯 U+0462F CJK UNIFIED IDEOGRAPH-462F
䨗 U+04A17 CJK UNIFIED IDEOGRAPH-4A17
义 U+04E49 CJK UNIFIED IDEOGRAPH-4E49
刱 U+05231 CJK UNIFIED IDEOGRAPH-5231
嘙 U+05619 CJK UNIFIED IDEOGRAPH-5619
威 U+05A01 CJK UNIFIED IDEOGRAPH-5A01
巩 U+05DE9 CJK UNIFIED IDEOGRAPH-5DE9
懑 U+061D1 CJK UNIFIED IDEOGRAPH-61D1
方 U+065B9 CJK UNIFIED IDEOGRAPH-65B9
榡 U+069A1 CJK UNIFIED IDEOGRAPH-69A1
涉 U+06D89 CJK UNIFIED IDEOGRAPH-6D89
煱 U+07171 CJK UNIFIED IDEOGRAPH-7171
留 U+07559 CJK UNIFIED IDEOGRAPH-7559
祁 U+07941 CJK UNIFIED IDEOGRAPH-7941
紩 U+07D29 CJK UNIFIED IDEOGRAPH-7D29
脑 U+08111 CJK UNIFIED IDEOGRAPH-8111
蓹 U+084F9 CJK UNIFIED IDEOGRAPH-84F9
裡 U+088E1 CJK UNIFIED IDEOGRAPH-88E1
賉 U+08CC9 CJK UNIFIED IDEOGRAPH-8CC9
邱 U+090B1 CJK UNIFIED IDEOGRAPH-90B1
钙 U+09499 CJK UNIFIED IDEOGRAPH-9499
颁 U+09881 CJK UNIFIED IDEOGRAPH-9881
鱩 U+09C69 CJK UNIFIED IDEOGRAPH-9C69
礪 U+0F985 CJK COMPATIBILITY IDEOGRAPH-F985
𠊗 U+20297 CJK UNIFIED IDEOGRAPH-20297
𠙿 U+2067F CJK UNIFIED IDEOGRAPH-2067F
𠩧 U+20A67 CJK UNIFIED IDEOGRAPH-20A67
𠹏 U+20E4F CJK UNIFIED IDEOGRAPH-20E4F
𡈷 U+21237 CJK UNIFIED IDEOGRAPH-21237
𡘟 U+2161F CJK UNIFIED IDEOGRAPH-2161F
𡨇 U+21A07 CJK UNIFIED IDEOGRAPH-21A07
𡷯 U+21DEF CJK UNIFIED IDEOGRAPH-21DEF
𢇗 U+221D7 CJK UNIFIED IDEOGRAPH-221D7
𢖿 U+225BF CJK UNIFIED IDEOGRAPH-225BF
𢦧 U+229A7 CJK UNIFIED IDEOGRAPH-229A7
𢶏 U+22D8F CJK UNIFIED IDEOGRAPH-22D8F
𣅷 U+23177 CJK UNIFIED IDEOGRAPH-23177
𣕟 U+2355F CJK UNIFIED IDEOGRAPH-2355F
𣥇 U+23947 CJK UNIFIED IDEOGRAPH-23947
𣴯 U+23D2F CJK UNIFIED IDEOGRAPH-23D2F
𤄗 U+24117 CJK UNIFIED IDEOGRAPH-24117
𤓿 U+244FF CJK UNIFIED IDEOGRAPH-244FF
𤣧 U+248E7 CJK UNIFIED IDEOGRAPH-248E7
𤳏 U+24CCF CJK UNIFIED IDEOGRAPH-24CCF
𥂷 U+250B7 CJK UNIFIED IDEOGRAPH-250B7
𥒟 U+2549F CJK UNIFIED IDEOGRAPH-2549F
𥢇 U+25887 CJK UNIFIED IDEOGRAPH-25887
𥱯 U+25C6F CJK UNIFIED IDEOGRAPH-25C6F
𦁗 U+26057 CJK UNIFIED IDEOGRAPH-26057
𦐿 U+2643F CJK UNIFIED IDEOGRAPH-2643F
𦠧 U+26827 CJK UNIFIED IDEOGRAPH-26827
𦰏 U+26C0F CJK UNIFIED IDEOGRAPH-26C0F
𦿷 U+26FF7 CJK UNIFIED IDEOGRAPH-26FF7
𧏟 U+273DF CJK UNIFIED IDEOGRAPH-273DF
𧟇 U+277C7 CJK UNIFIED IDEOGRAPH-277C7
𧮯 U+27BAF CJK UNIFIED IDEOGRAPH-27BAF
𧾗 U+27F97 CJK UNIFIED IDEOGRAPH-27F97
𨍿 U+2837F CJK UNIFIED IDEOGRAPH-2837F
𨝧 U+28767 CJK UNIFIED IDEOGRAPH-28767
𨭏 U+28B4F CJK UNIFIED IDEOGRAPH-28B4F
𨼷 U+28F37 CJK UNIFIED IDEOGRAPH-28F37
𩌟 U+2931F CJK UNIFIED IDEOGRAPH-2931F
𩜇 U+29707 CJK UNIFIED IDEOGRAPH-29707
𩫯 U+29AEF CJK UNIFIED IDEOGRAPH-29AEF
𩻗 U+29ED7 CJK UNIFIED IDEOGRAPH-29ED7
𪊿 U+2A2BF CJK UNIFIED IDEOGRAPH-2A2BF
𪚧 U+2A6A7 CJK UNIFIED IDEOGRAPH-2A6A7
𪪸 U+2AAB8 CJK UNIFIED IDEOGRAPH-2AAB8
𪺠 U+2AEA0 CJK UNIFIED IDEOGRAPH-2AEA0
𫊈 U+2B288 CJK UNIFIED IDEOGRAPH-2B288
𫙰 U+2B670 CJK UNIFIED IDEOGRAPH-2B670
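
If installing unichars is not an option, the exact ranges can also be derived with plain Perl. The following is a minimal sketch, not part of unichars: it tests every code point against \p{Han} and collapses the matches into contiguous ranges. The sub name han_ranges is made up for this illustration, and the ranges and total depend on the Unicode version bundled with the perl that runs it, so the count may differ from the 75960 shown above.

```perl
use strict;
use warnings;

# Hypothetical helper (name chosen for this sketch): returns a list of
# [first, last] code point pairs whose characters match \p{Han}.
sub han_ranges {
    my @ranges;
    my $start;
    for my $cp (0 .. 0x10FFFF) {
        # Surrogates (U+D800..U+DFFF) are not characters, so treat them
        # as non-matching without running the regex on them.
        my $is_han = ($cp < 0xD800 || $cp > 0xDFFF)
                  && chr($cp) =~ /\p{Han}/;
        if ($is_han) {
            # Open a new range if none is in progress.
            $start = $cp unless defined $start;
        }
        elsif (defined $start) {
            # First non-match after a run of matches closes the range.
            push @ranges, [$start, $cp - 1];
            undef $start;
        }
    }
    # Close a range that runs to the very last code point.
    push @ranges, [$start, 0x10FFFF] if defined $start;
    return @ranges;
}

my @ranges = han_ranges();
my $total  = 0;
for my $r (@ranges) {
    printf "U+%04X..U+%04X\n", $r->[0], $r->[1];
    $total += $r->[1] - $r->[0] + 1;
}
print "total: $total\n";
```

Only ASCII is printed, so no output encoding layer is needed. A single-character range appears with the same code point on both sides.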


To get good coverage for display, install the following font families:

文泉驿正黑 <http://wenq.org/?ZenHei>
Han Nom <http://vietunicode.sf.net/fonts/fonts_hannom.html>
Code200x <http://web.archive.org/web/2010/http://code2000.net/>