Encoding Error for certain replays

Robert Wang

Dec 12, 2014, 5:36:58 PM
to sc2r...@googlegroups.com

Hi Guys,

Thanks for the great work on sc2reader! As the developer of SC2Geeks, I'm using this awesome library to parse replays. It's been doing a great job, and it's very much appreciated! This post duplicates the issue at https://github.com/GraylinKim/sc2reader/issues/182.

Lately, while processing the WCS 2014 S3 replay pack, I encountered errors that prevented some of the replays (download here) from being parsed. The error message is:

ERROR:root:Traceback (most recent call last):
  File "sc2parser.py", line 247, in parse_replay_dict
    replay = sc2reader.load_replay(replayFile, load_level=2 if is_summary else 4, load_map=load_map)
  File "/opt/python/python2.7/local/lib/python2.7/site-packages/sc2reader/factories/sc2factory.py", line 85, in load_replay
    return self.load(Replay, source, options, **new_options)
  File "/opt/python/python2.7/local/lib/python2.7/site-packages/sc2reader/factories/sc2factory.py", line 137, in load
    return self._load(cls, resource, filename=filename, options=options)
  File "/opt/python/python2.7/local/lib/python2.7/site-packages/sc2reader/factories/sc2factory.py", line 146, in _load
    obj = cls(resource, filename=filename, factory=self, **options)
  File "/opt/python/python2.7/local/lib/python2.7/site-packages/sc2reader/resources.py", line 262, in __init__
    self._read_data(data_file, self._get_reader(data_file))
  File "/opt/python/python2.7/local/lib/python2.7/site-packages/sc2reader/resources.py", line 592, in _read_data
    self.raw_data[data_file] = reader(data, self)
  File "/opt/python/python2.7/local/lib/python2.7/site-packages/sc2reader/readers.py", line 102, in __call__
    ) for i in range(data.read_bits(5))],
  File "/opt/python/python2.7/local/lib/python2.7/site-packages/sc2reader/decoders.py", line 252, in read_aligned_string
    return self._buffer.read_string(count, encoding)
  File "/opt/python/python2.7/local/lib/python2.7/site-packages/sc2reader/decoders.py", line 108, in read_string
    return self.read_bytes(count).decode(encoding)
  File "/opt/python/python2.7/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x81 in position 23: invalid start byte

The byte and position vary for different replays. The calling script is essentially invoking sc2reader to parse a given replay file.
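
For readers who hit the same traceback, here is a minimal sketch of what the last frames are doing: a strict `bytes.decode("utf8")` on data that contains a byte which cannot start a UTF-8 character. The payload and the helper names `try_strict` and `decode_lenient` are made up for illustration; they are not part of sc2reader, and the lenient variant is only a way to inspect the bytes, not the resolution from the GitHub issue.

```python
# Hypothetical payload: 23 ASCII bytes followed by 0x81. In UTF-8, 0x81 is a
# continuation byte, so it can never be the first byte of a character.
payload = b"A" * 23 + b"\x81"


def try_strict(data):
    """Decode strictly, as in the traceback; report where decoding fails."""
    try:
        return data.decode("utf8")
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the offending byte, exc.reason the cause.
        return (exc.start, exc.reason)


def decode_lenient(data):
    """Decode with the 'replace' handler: bad bytes become U+FFFD."""
    return data.decode("utf8", errors="replace")


strict_result = try_strict(payload)
lenient_result = decode_lenient(payload)

print(strict_result)
print(repr(lenient_result))
```

The strict call fails on the data itself, so it behaves the same way wherever the same bytes are passed to the same decode call.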

This error can be reproduced on both CentOS 6.5 and Ubuntu 14.04, with Python 2.7 and Python 3.4. Since it's encoding related, I double-checked the locale settings. Below is the output of locale:

LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

What's confusing is that there is no error on a Mac, where the parse simply runs through. Below is the output of locale on the Mac:

LANG="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL=

As GGTracker and Spawning Tools can both parse the replays that fail for me, I'm not sure whether this is a bug in sc2reader or something that can be resolved by tweaking the Python environment. With my limited knowledge of Python, I tried one recommended approach, setting the default encoding to something other than UTF-8 for Python 2, but it did not help:

PYTHONIOENCODING="ascii" python sc2parser.py /tmp/G1.SC2Replay
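
A note on why that attempt could not change the outcome: `PYTHONIOENCODING` overrides the encoding used for stdin, stdout, and stderr, while the traceback shows an explicit `.decode(encoding)` call on the replay bytes. The sketch below checks this by running the same failing decode in a child interpreter under two settings. The names `child_code` and `run_child` are hypothetical, and the child program is a stand-in for the real script, not sc2parser.py itself.

```python
import os
import subprocess
import sys

# Stand-in child program: attempt a strict UTF-8 decode of the byte 0x81 and
# write the outcome to stdout using ASCII-only text.
child_code = (
    "import sys\n"
    "try:\n"
    "    b'\\x81'.decode('utf8')\n"
    "    sys.stdout.write('decoded')\n"
    "except UnicodeDecodeError:\n"
    "    sys.stdout.write('UnicodeDecodeError')\n"
)


def run_child(io_encoding):
    """Run the child with PYTHONIOENCODING set and return what it printed."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    out = subprocess.check_output([sys.executable, "-c", child_code], env=env)
    return out.decode("ascii")


results = {enc: run_child(enc) for enc in ("ascii", "utf-8")}
print(results)
```

If both entries report the same outcome, the environment variable is not what governs this decode call.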

I appreciate you taking the time to look into this. Thanks!

Regards,
Robert

Robert Wang

Dec 12, 2014, 8:33:38 PM
to sc2r...@googlegroups.com
This case is closed. Please see the resolution at https://github.com/GraylinKim/sc2reader/issues/182

Thanks!
Robert

Xavi Ramirez

Dec 19, 2014, 11:48:04 PM
to sc2r...@googlegroups.com
oooi
