Hi Guys,
Thanks for the great work on sc2reader! As the developer of SC2Geeks, I use this awesome library to parse replays, and it has been doing a great job. Very much appreciated! This is a duplicate of the issue at https://github.com/GraylinKim/sc2reader/issues/182.
Lately, while processing the WCS 2014 S3 replay pack, I ran into errors that prevented some of the replays (download here) from being parsed. The error message is:
ERROR:root:Traceback (most recent call last):
  File "sc2parser.py", line 247, in parse_replay_dict
    replay = sc2reader.load_replay(replayFile, load_level=2 if is_summary else 4, load_map=load_map)
  File "/opt/python/python2.7/local/lib/python2.7/site-packages/sc2reader/factories/sc2factory.py", line 85, in load_replay
    return self.load(Replay, source, options, **new_options)
  File "/opt/python/python2.7/local/lib/python2.7/site-packages/sc2reader/factories/sc2factory.py", line 137, in load
    return self._load(cls, resource, filename=filename, options=options)
  File "/opt/python/python2.7/local/lib/python2.7/site-packages/sc2reader/factories/sc2factory.py", line 146, in _load
    obj = cls(resource, filename=filename, factory=self, **options)
  File "/opt/python/python2.7/local/lib/python2.7/site-packages/sc2reader/resources.py", line 262, in __init__
    self._read_data(data_file, self._get_reader(data_file))
  File "/opt/python/python2.7/local/lib/python2.7/site-packages/sc2reader/resources.py", line 592, in _read_data
    self.raw_data[data_file] = reader(data, self)
  File "/opt/python/python2.7/local/lib/python2.7/site-packages/sc2reader/readers.py", line 102, in __call__
    ) for i in range(data.read_bits(5))],
  File "/opt/python/python2.7/local/lib/python2.7/site-packages/sc2reader/decoders.py", line 252, in read_aligned_string
    return self._buffer.read_string(count, encoding)
  File "/opt/python/python2.7/local/lib/python2.7/site-packages/sc2reader/decoders.py", line 108, in read_string
    return self.read_bytes(count).decode(encoding)
  File "/opt/python/python2.7/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x81 in position 23: invalid start byte
The byte and position vary for different replays. The calling script is essentially invoking sc2reader to parse a given replay file.
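For reference, the same kind of failure can be reproduced without sc2reader. The sketch below is only an illustration, not the library's code: the sample bytes are made up (only the 0x81 byte and its position come from the error above), and `decode_name` is a hypothetical helper. It shows that a strict UTF-8 decode rejects 0x81 as a start byte, while `errors="replace"` substitutes the U+FFFD replacement character instead of raising.

```python
def decode_name(raw, errors="strict"):
    """Decode raw bytes as UTF-8; `errors` is passed straight to bytes.decode."""
    return raw.decode("utf-8", errors)


# Made-up sample: 23 ASCII bytes, then the 0x81 byte seen in the traceback.
sample = b"A" * 23 + b"\x81" + b"tail"

# Strict decoding (the default) raises on the stray 0x81 byte.
try:
    decode_name(sample)
    strict_error = None
except UnicodeDecodeError as exc:
    strict_error = exc

# Lenient decoding keeps going and marks the bad byte with U+FFFD.
lenient = decode_name(sample, errors="replace")

print(strict_error)
print(repr(lenient))
```

If the replays really do contain bytes like this in a string field, the strict decode in `read_string` would be expected to fail on any platform, which is part of why the Mac result puzzles me.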
The error can be reproduced on both CentOS 6.5 and Ubuntu 14.04, with both Python 2.7 and Python 3.4. Since it is encoding-related, I double-checked the locale settings; below is the output of locale:
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
What's confusing is that there is no error on a Mac, where the script just runs through. Below is the output of locale on the Mac:
LANG="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL=
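Since the shell locale looks the same on both systems, it may be more telling to compare what Python itself reports. The following is a minimal sketch using only the standard library; `python_encoding_report` is a name made up for this post, and the values will of course differ per machine.

```python
import locale
import sys


def python_encoding_report():
    """Collect the encodings the interpreter reports, independent of `locale` output."""
    return {
        "python_version": sys.version.split()[0],
        "default_encoding": sys.getdefaultencoding(),
        "filesystem_encoding": sys.getfilesystemencoding(),
        "preferred_encoding": locale.getpreferredencoding(False),
        # stdout may be replaced by an object without an `encoding` attribute.
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
    }


report = python_encoding_report()
for key in sorted(report):
    print("%s: %s" % (key, report[key]))
```

Running this on the Linux servers and on the Mac would show whether the two interpreters actually agree on their encodings.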
Since GGTracker and Spawning Tools can both parse the replays that fail for me, I'm not sure whether this is a bug in sc2reader or something that can be resolved by tweaking the Python environment. With my limited knowledge of Python, I tried one recommended approach, setting the default encoding to something other than UTF-8 for Python 2, but it did not help:
PYTHONIOENCODING="ascii" python sc2parser.py /tmp/G1.SC2Replay
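As far as I understand it, PYTHONIOENCODING only changes the encoding used for stdin, stdout and stderr, whereas the failing call in the traceback passes the encoding explicitly to `decode`. The sketch below is a small experiment along those lines, with a made-up byte string and a hypothetical `run_child` helper: it runs the same explicit UTF-8 decode in a child interpreter under two different PYTHONIOENCODING values and records what happens.

```python
import os
import subprocess
import sys

# Child program: an explicit UTF-8 decode of a made-up byte string containing 0x81,
# followed by the encoding the child sees on its stdout.
CHILD = r'''
import sys
try:
    b"abc\x81".decode("utf-8")
    print("decoded")
except UnicodeDecodeError:
    print("UnicodeDecodeError")
print(sys.stdout.encoding)
'''


def run_child(io_encoding):
    """Run CHILD with PYTHONIOENCODING set; return [decode outcome, stdout encoding]."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    out = subprocess.check_output([sys.executable, "-c", CHILD], env=env)
    return out.decode("ascii").split()


result_ascii = run_child("ascii")
result_utf8 = run_child("utf-8")
print(result_ascii)
print(result_utf8)
```

In both runs the explicit decode should still raise, while only the reported stdout encoding changes, which would explain why the command above made no difference.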
Thank you in advance for taking the time to look into this!
Regards,
Robert