Issue #87: Use unicode.east_asian_width() to compute character widths? (blais/beancount)


Zhuoyun Wei

Dec 5, 2015, 10:13:26 AM
to bean...@googlegroups.com
New issue 87: Use unicode.east_asian_width() to compute character widths?
https://bitbucket.org/blais/beancount/issues/87/use-unicodeeast_asian_width-to-compute

Zhuoyun Wei:

Hi Martin,

Would you please consider using `unicodedata.east_asian_width()` to compute character widths before printing them to the terminal?

I know most Beancount users (including me) write their ledgers in English, but in some cases there are full-width characters in the "Payee" or "Narration" fields. I live in China, so all my bank statements are in Chinese. After importing them into Beancount, I find that the columns do not align well in `bean-query` (see attachments). Other shells, such as the MySQL shell, render and align Chinese characters correctly.

After some research, I found out that Beancount uses `len()` and `str.format()` to align columns. Both count code points rather than terminal columns, so they cannot handle full-width characters such as Chinese, Japanese and Korean (CJK, or more broadly, "East Asian" characters; see http://unicode.org/reports/tr11/).
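To illustrate the mismatch, here is a minimal sketch (not Beancount's actual code; the payee strings and the column width of 8 are just assumptions for the demo). It pads three strings with `str.format()` and then counts how many extra terminal columns each one would occupy:

```python
from unicodedata import east_asian_width

# Sample payees, chosen only for illustration.
payees = ["Foo", "汉字", "汉字foo"]

# str.format() pads to a number of code points, not terminal columns.
padded = ["{:<8}|".format(p) for p in payees]

# Every padded string has the same len() ...
lengths = [len(s) for s in padded]

# ... but each wide ('W') or full-width ('F') character takes one extra column.
extra_columns = [
    sum(east_asian_width(c) in ("F", "W") for c in p)
    for p in payees
]

for line, extra in zip(padded, extra_columns):
    print(line, "extra columns:", extra)
```

All three padded strings have the same `len()`, yet in a terminal the `|` after the two strings containing Chinese characters would sit two columns further right.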

Therefore, I wrote a small function to compute the correct string length, which takes full-width characters into account:


```
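from unicodedata import east_asian_width as eaw

def str_width(string):
    # Full-width ('F') and wide ('W') characters take two terminal columns;
    # everything else is counted as one.
    width = sum([
        2 if eaw(c) in ['F', 'W'] else 1
        for c in string
    ])
    return width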
```

It computes the correct width of a string for terminal display. After rewriting some of Beancount's code, I managed to make `bean-report` output aligned text when processing mixed half-width / full-width characters (again, see attachments).
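As a rough sketch of the kind of change involved (this is not the actual patch; `pad_right` is a hypothetical helper name, and the ` | 100 CNY` suffix is only there to make the alignment visible), padding can be based on `str_width()` instead of `len()`:

```python
from unicodedata import east_asian_width


def str_width(string):
    """Display width: 2 columns for full-width/wide characters, else 1."""
    return sum(2 if east_asian_width(c) in ("F", "W") else 1 for c in string)


def pad_right(string, width):
    """Hypothetical helper: left-align `string` in `width` terminal columns."""
    return string + " " * max(0, width - str_width(string))


payees = ["Foo", "汉字", "汉字foo"]

# Size the column by display width rather than by len().
column_width = max(str_width(p) for p in payees)

for payee in payees:
    print(pad_right(payee, column_width) + " | 100 CNY")
```

This sketch still counts combining and zero-width characters as one column and treats ambiguous-width ('A') characters as narrow, so it is only an approximation for text outside plain ASCII and CJK.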

Here is the Beancount file I used for testing:

```
; vim: ft=beancount nofoldenable:

1970-01-01 open Income:Salary
1970-01-01 open Assets:Cash

2003-01-01 * "Foo"
  Income:Salary -100 CNY
  Assets:Cash +100 CNY

2005-01-01 * "汉字"
  Income:Salary -200 CNY
  Assets:Cash +200 CNY

2008-01-01 * "汉字foo"
  Income:Salary -200 CNY
  Assets:Cash +200 CNY
```

My modifications are too messy (I am a Python newcomer) and I am not familiar with Mercurial, so I'm not opening a pull request. If you have time, please consider supporting this. (Maybe put my tiny function in `textutils`?)

Thanks for your consideration.

Responsible: blais