Emoji in Venmo CSV input to v2 ingest fails

Liam Hupfer

Jan 6, 2022, 12:43:00 PM
to bean...@googlegroups.com

Hi Beancounters,

I’m attempting to write a Venmo importer based on Red Street’s importer framework, and I’ve run into an issue with Venmo CSVs that contain emoji characters. bean-identify results in a decoding error raised from cache.py in v2’s ingest. I checked GitHub and it appears there was a PR in April (https://github.com/beancount/beancount/pull/646) that may fix this. Unfortunately, there hasn’t been a Beancount 2 release since March. I attempted pip uninstall beancount; pip install 'git+https://github.com/beancount/beancount.git@v2', as well as pip uninstall beancount; pip install 'https://github.com/beancount/beancount/archive/refs/heads/v2.zip', but I get the same error. Am I doing something wrong with the pip installation of the v2 branch? Or is this still an issue?
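
As a sanity check on the pip install, something like the following should
confirm which beancount build is actually on the import path (a minimal
sketch; v2 exposes __version__ at the package level):

    import beancount
    from beancount.ingest import cache

    print(beancount.__version__)  # version string of the installed package
    print(cache.__file__)         # path of the cache.py actually imported
    # open the printed cache.py to check whether the PR's change is present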

ERROR:root:Importer importers.venmo.Importer.identify() raised an unexpected error: 'charmap' codec can't decode byte 0x8f in position 1260: character maps to <undefined>
Traceback (most recent call last):
  File "/path/.venv/lib/python3.9/site-packages/beancount/ingest/cache.py", line 51, in convert
    result = self._cache[converter_func]
KeyError: <function head.<locals>.head_reader at 0x7f3c3a851b80>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/path/.venv/lib/python3.9/site-packages/beancount/ingest/identify.py", line 63, in find_imports
    matched = importer.identify(file)
  File "/path/.venv/lib/python3.9/site-packages/beancount_reds_importers/libreader/reader.py", line 17, in identify
    self.initialize_reader(file)
  File "/path/.venv/lib/python3.9/site-packages/beancount_reds_importers/libreader/csvreader.py", line 65, in initialize_reader
    self.reader_ready = re.match(self.header_identifier, file.head())
  File "/path/.venv/lib/python3.9/site-packages/beancount/ingest/cache.py", line 64, in head
    return self.convert(head(num_bytes, encoding=encoding))
  File "/path/.venv/lib/python3.9/site-packages/beancount/ingest/cache.py", line 55, in convert
    result = self._cache[converter_func] = converter_func(self.name)
  File "/path/.venv/lib/python3.9/site-packages/beancount/ingest/cache.py", line 101, in head_reader
    return next(decoder)
  File "/nix/store/rppr9s436950i1dlzknbmz40m2xqqnxc-python3-3.9.9/lib/python3.9/codecs.py", line 1054, in iterdecode
    output = decoder.decode(input)
  File "/nix/store/rppr9s436950i1dlzknbmz40m2xqqnxc-python3-3.9.9/lib/python3.9/encodings/cp1254.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 1260: character maps to <undefined>

—Liam

Daniele Nicolodi

Jan 13, 2022, 4:19:15 PM
to bean...@googlegroups.com
On 06/01/2022 18:18, 'Liam Hupfer' via Beancount wrote:
> Hi Beancounters,
>
> I’m attempting to write a Venmo importer based on Red Street’s importer
> framework, and I’ve run into an issue with Venmo CSV’s that contain
> emoji characters. |bean-identify| results in a decoding error raised
> from |cache.py| in v2’s |ingest|. I checked GitHub and it appears there
> was a PR in April <https://github.com/beancount/beancount/pull/646> that
> may fix this. Unfortunately, there hasn’t been a Beancount 2 release
> since March. I attempted |pip uninstall beancount; pip install
> 'git+https://github.com/beancount/beancount.git@v2'|, as well as |pip
> uninstall beancount; pip install
> 'https://github.com/beancount/beancount/archive/refs/heads/v2.zip'| but
> I get the same issue. Am I doing something wrong with the pip
> installation of the v2 branch? Or is this still an issue?


It seems that your importer is trying to read the CSV in some charmap
encoding (I suspect cp1252, though the traceback points at cp1254) and
that fails. Are you sure you are using the right encoding? Most likely
the CSV is UTF-8 (or some other Unicode-capable encoding that can
represent emoji; AFAIK no charmap encoding can do that).
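
For illustration, here is the failure mode in miniature (a minimal
sketch; the emoji, the cp1252 choice, and the file name are just
examples, not your importer's code):

    data = "Pizza 🍕".encode("utf-8")

    print(data.decode("utf-8"))   # round-trips fine as UTF-8
    try:
        data.decode("cp1252")     # 0x8d is undefined in cp1252
    except UnicodeDecodeError as exc:
        print(exc)                # the same kind of 'charmap' error

    # Reading the file explicitly as UTF-8 avoids the charmap default:
    # with open("venmo.csv", encoding="utf-8") as f:
    #     header = f.readline()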

This does not seem related to the bug solved by PR #646.

Cheers,
Dan