Thanks for the great work on sc2reader! As the developer of SC2Geeks, I'm using this awesome library to parse replays. It's been doing a great job, very much appreciated! This is a duplicate of the issue at https://github.com/GraylinKim/sc2reader/issues/182.
Lately, while processing the WCS 2014 S3 replay pack, I encountered errors that prevented some of the replays (download here) from being parsed. The error message is:
ERROR:root:Traceback (most recent call last):
  File "sc2parser.py", line 247, in parse_replay_dict
    replay = sc2reader.load_replay(replayFile, load_level=2 if is_summary else 4, load_map=load_map)
  File "/opt/python/python2.7/local/lib/python2.7/site-packages/sc2reader/factories/sc2factory.py", line 85, in load_replay
    return self.load(Replay, source, options, **new_options)
  File "/opt/python/python2.7/local/lib/python2.7/site-packages/sc2reader/factories/sc2factory.py", line 137, in load
    return self._load(cls, resource, filename=filename, options=options)
  File "/opt/python/python2.7/local/lib/python2.7/site-packages/sc2reader/factories/sc2factory.py", line 146, in _load
    obj = cls(resource, filename=filename, factory=self, **options)
  File "/opt/python/python2.7/local/lib/python2.7/site-packages/sc2reader/resources.py", line 262, in __init__
    self._read_data(data_file, self._get_reader(data_file))
  File "/opt/python/python2.7/local/lib/python2.7/site-packages/sc2reader/resources.py", line 592, in _read_data
    self.raw_data[data_file] = reader(data, self)
  File "/opt/python/python2.7/local/lib/python2.7/site-packages/sc2reader/readers.py", line 102, in __call__
    ) for i in range(data.read_bits(5))],
  File "/opt/python/python2.7/local/lib/python2.7/site-packages/sc2reader/decoders.py", line 252, in read_aligned_string
    return self._buffer.read_string(count, encoding)
  File "/opt/python/python2.7/local/lib/python2.7/site-packages/sc2reader/decoders.py", line 108, in read_string
    return self.read_bytes(count).decode(encoding)
  File "/opt/python/python2.7/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x81 in position 23: invalid start byte
The byte and position vary for different replays. The calling script is essentially invoking sc2reader to parse a given replay file.
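For reference, the last frames of the traceback boil down to a strict UTF-8 decode of bytes read from the replay. Below is a minimal sketch of that failure mode. The byte string is made up for illustration and is not taken from any of the failing replays, and the lenient variant only shows what Python's standard `errors` argument does. It is not something sc2reader is known to expose.

```python
# Hypothetical payload: 0x81 is a UTF-8 continuation byte, so it cannot
# start a character. This mimics the "invalid start byte" error above.
raw = b"player\x81name"

try:
    raw.decode("utf8")  # strict decoding, as in decoders.py read_string
    strict_error = None
except UnicodeDecodeError as exc:
    strict_error = exc

# With errors="replace" the bad byte becomes U+FFFD instead of raising.
lenient = raw.decode("utf8", errors="replace")

print(strict_error)
print(repr(lenient))
```

The strict decode raises regardless of locale, which is why I suspect the bytes in the replay rather than the environment, though I have not confirmed this.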
This error can be reproduced on both CentOS 6.5 and Ubuntu 14.04, with Python 2.7 and Python 3.4. Since it's encoding-related, I double-checked the locale settings; below is the output of locale:
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
What's confusing is that there is no error on a Mac, where parsing simply runs through. Below is the output of locale on the Mac:
LANG="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL=
As GGTracker and Spawning Tools can both parse the replays that fail for me, I'm not sure whether this is a bug in sc2reader or something that can be resolved by tweaking the Python environment. My knowledge of Python is limited, so I tried one recommended approach, setting the I/O encoding to something other than UTF-8 for Python 2, but it did not help:
PYTHONIOENCODING="ascii" python sc2parser.py /tmp/G1.SC2Replay
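As far as I understand from the Python documentation, PYTHONIOENCODING only overrides the encoding used for stdin, stdout and stderr, so it would not be expected to change an explicit `.decode("utf8")` call like the one in the traceback. The sketch below is my own check of that assumption, written for Python 3.5 or later. It runs a throwaway child interpreter instead of sc2reader, and the one-byte payload is made up.

```python
import os
import subprocess
import sys

# Child program: report the stdout encoding, then attempt the same kind of
# explicit UTF-8 decode that fails in the traceback (hypothetical payload).
child = r'''
import sys
sys.stdout.write(sys.stdout.encoding + "\n")
b"\x81".decode("utf8")
'''

# Same environment as the parent, plus the override tried above.
env = dict(os.environ, PYTHONIOENCODING="ascii")

proc = subprocess.run(
    [sys.executable, "-c", child],
    env=env,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)

print(proc.stdout.decode("ascii", errors="replace"))  # encoding seen by the child
print(proc.returncode)  # non-zero: the decode still raised
```

If this reading is right, the variable changes how output is printed but not how replay bytes are decoded, which would explain why the command above made no difference.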
Thank you in advance for taking the time to look into this!