Python 3 & decompressing gzipped Darwin Push Port messages

Konrad Komorowski

Mar 19, 2019, 5:05:03 PM
to A gathering place for the Open Rail Data community
This is the string I received from the Darwin Push Port:

>>> first_message
"\x1f�\x08\x00\x00\x00\x00\x00\x00\x00��Qo�0\x10ǿ����8NH@!UרS�N����/�\x05V�D0;�����\x06´\x17ʃuw���g����SI�\nL����G��2�\x17�!�o��dE�AY�ԕ��\x1fe��f��5 q��D�X��5M3ţ,�9���4�'\x06�\x05�[sܻ|v�>mEaef#�IvT�u\t�,z��\x1aa�\x13�#\x08�\x1aN\x12�3\r\x11�q\x08�I��k�_,���嘗�+��2F\x1eԐ�\x1a�IA\x16�c�\x00��`,�\x15r\x05\x03\x04��0�[�r��#\U0010f944��\n���\x7f�q��:��)�a�5�trt�0�x0�Ą\x07)���y���\x0f\x02�>�ݠ�n�6k\x1b\x13[�\x12�+\x14��şv\t%�~Ye0�\x1627r\x12��\x05wۈz�ǹX\n\x07J\x13\x02E�\x1e�\t\x1e�#g�\x07��K���1&����\\��/:�v\x04�������{�NI��չ\x08�\x08�wq�֝SR߶�v\x17oq\x12�(�����:�@\x16ѭ���(k�sU�g�'�.%n��uvkުvn����n���6\x7f\x01\x0c]q��\x04\x00\x00"


Encoded into bytes, assuming `utf-8` encoding:

>>> first_message_bytes = bytes(first_message, 'utf-8')
>>> first_message_bytes
b"\x1f\xef\xbf\xbd\x08\x00\x00\x00\x00\x00\x00\x00\xef\xbf\xbd\xef\xbf\xbdQo\xef\xbf\xbd0\x10\xc7\xbf\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd8NH@!U\xd7\xa8S\xef\xbf\xbdN\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd/\xef\xbf\xbd\x05V\xef\xbf\xbdD0;\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd\x06\xc2\xb4\x17\xca\x83uw\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbdg\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbdSI\xef\xbf\xbd\nL\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbdG\xef\xbf\xbd\xef\xbf\xbd2\xef\xbf\xbd\x17\xef\xbf\xbd!\xef\xbf\xbdo\xef\xbf\xbd\xef\xbf\xbddE\xef\xbf\xbdAY\xef\xbf\xbd\xd4\x95\xef\xbf\xbd\xef\xbf\xbd\x1fe\xef\xbf\xbd\xef\xbf\xbdf\xef\xbf\xbd\xef\xbf\xbd5 q\xef\xbf\xbd\xef\xbf\xbdD\xef\xbf\xbdX\xef\xbf\xbd\xef\xbf\xbd5M3\xc5\xa3,\xef\xbf\xbd9\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd4\xef\xbf\xbd'\x06\xef\xbf\xbd\x05\xef\xbf\xbd[s\xdc\xbb|v\xef\xbf\xbd>mEaef#\xef\xbf\xbdIvT\xef\xbf\xbdu\t\xef\xbf\xbd,z\xef\xbf\xbd\xef\xbf\xbd\x1aa\xef\xbf\xbd\x13\xef\xbf\xbd#\x08\xef\xbf\xbd\x1aN\x12\xef\xbf\xbd3\r\x11\xef\xbf\xbdq\x08\xef\xbf\xbdI\xef\xbf\xbd\xef\xbf\xbdk\xef\xbf\xbd_,\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd\xe5\x98\x97\xef\xbf\xbd+\xef\xbf\xbd\xef\xbf\xbd2F\x1e\xd4\x90\xef\xbf\xbd\x1a\xef\xbf\xbdIA\x16\xef\xbf\xbdc\xef\xbf\xbd\x00\xef\xbf\xbd\xef\xbf\xbd`,\xef\xbf\xbd\x15r\x05\x03\x04\xef\xbf\xbd\xef\xbf\xbd0\xef\xbf\xbd[\xef\xbf\xbdr\xef\xbf\xbd\xef\xbf\xbd#\xf4\x8f\xa5\x84\xef\xbf\xbd\xef\xbf\xbd\n\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd\x7f\xef\xbf\xbdq\xef\xbf\xbd\xef\xbf\xbd:\xef\xbf\xbd\xef\xbf\xbd)\xef\xbf\xbda\xef\xbf\xbd5\xef\xbf\xbdtrt\xef\xbf\xbd0\xef\xbf\xbdx0\xef\xbf\xbd\xc4\x84\x07)\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbdy\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd\x0f\x02\xef\xbf\xbd>\xef\xbf\xbd\xdd\xa0\xef\xbf\xbdn\xef\xbf\xbd6k\x1b\x13[\xef\xbf\xbd\x12\xef\xbf\xbd+\x14\xef\xbf\xbd\xef\xbf\xbd\xc5\x9fv\t%\xef\xbf\xbd~Ye0\xef\xbf\xbd\x1627r\x12\xef\xbf\xbd\xef\xbf\xbd\x05w\xdb\x88z\xef\xbf\xbd\xc7\xb9X\n\x07J\x13\x02E\xef\xbf\xbd\x1e\xef\xbf\xbd\t\x1e\xef\xbf\xbd#g\xef\xbf\xbd\x07\xef\xbf\xbd\xef\xbf\xbdK\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd1&\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd\\\xef\xbf\xbd\xef\xbf\xbd/:\xef\xbf\xbdv\x04\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd{\xef\xbf\xbdNI\xef\xbf\xbd\xef\xbf\xbd\xd5\xb9\x08\xef\xbf\xbd\x08\xef\xbf\xbdwq\xef\xbf\xbd\xd6\x9dSR\xdf\xb6\xef\xbf\xbdv\x17oq\x12\xef\xbf\xbd(\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd:\xef\xbf\xbd@\x16\xd1\xad\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd(k\xef\xbf\xbdsU\xef\xbf\xbdg\xef\xbf\xbd'\xef\xbf\xbd.%n\xef\xbf\xbd\xef\xbf\xbduvk\xde\xaavn\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbdn\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd6\x7f\x01\x0c]q\xef\xbf\xbd\xef\xbf\xbd\x04\x00\x00"

Now I'm following this pretty comprehensive Stack Overflow answer for decoding gzipped objects (https://stackoverflow.com/a/22310760).

This should trigger header detection and work for gzip and zlib, but doesn't:

>>> import zlib
>>> zlib.decompress(first_message_bytes, zlib.MAX_WBITS|32)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
zlib.error: Error -3 while decompressing data: incorrect header check

This should work strictly for gzip, but doesn't:

>>> zlib.decompress(first_message_bytes, zlib.MAX_WBITS|16)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
zlib.error: Error -3 while decompressing data: incorrect header check
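For reference, both wbits settings tried above behave as the Stack Overflow answer describes when they are given genuine gzip bytes, which points at the input rather than at the decompression call. A minimal sketch, using a made-up stand-in payload rather than a real Darwin message:

```python
import gzip
import zlib

# Made-up stand-in payload; real Darwin messages are larger XML documents.
payload = b"<Pport>example</Pport>"
gz_bytes = gzip.compress(payload)    # gzip container, starts with b'\x1f\x8b'
zlib_bytes = zlib.compress(payload)  # zlib container, different header

# MAX_WBITS | 16: accept a gzip header only.
print(zlib.decompress(gz_bytes, zlib.MAX_WBITS | 16))    # b'<Pport>example</Pport>'

# MAX_WBITS | 32: auto-detect a gzip or zlib header.
print(zlib.decompress(gz_bytes, zlib.MAX_WBITS | 32))    # b'<Pport>example</Pport>'
print(zlib.decompress(zlib_bytes, zlib.MAX_WBITS | 32))  # b'<Pport>example</Pport>'

# The default wbits expects a zlib header, so gzip input is rejected.
try:
    zlib.decompress(gz_bytes)
except zlib.error as exc:
    print("default wbits on gzip data:", exc)
```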

I also attempted to do it in a way similar to the Python 2 example (https://github.com/openraildata/stomp-client-python/blame/f0807b6c9ad83a728c37605eb8ba86876c1ba2f2/README.md#L22), but this doesn't work either:

>>> import gzip, io
>>> gzip.GzipFile(fileobj=io.StringIO(first_message)).readlines()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.7/gzip.py", line 374, in readline
    return self._buffer.readline(size)
  File "/usr/local/lib/python3.7/_compression.py", line 68, in readinto
    data = self.read(len(byte_view))
  File "/usr/local/lib/python3.7/gzip.py", line 463, in read
    if not self._read_gzip_header():
  File "/usr/local/lib/python3.7/gzip.py", line 406, in _read_gzip_header
    magic = self._fp.read(2)
  File "/usr/local/lib/python3.7/gzip.py", line 91, in read
    self.file.read(size-self._length+read)
TypeError: can't concat str to bytes

or:

>>> gzip.GzipFile(fileobj=io.BytesIO(first_message_bytes)).readlines()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.7/gzip.py", line 374, in readline
    return self._buffer.readline(size)
  File "/usr/local/lib/python3.7/_compression.py", line 68, in readinto
    data = self.read(len(byte_view))
  File "/usr/local/lib/python3.7/gzip.py", line 463, in read
    if not self._read_gzip_header():
  File "/usr/local/lib/python3.7/gzip.py", line 411, in _read_gzip_header
    raise OSError('Not a gzipped file (%r)' % magic)
OSError: Not a gzipped file (b'\x1f\xef')
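The magic bytes in that last error are a useful clue: every gzip stream starts with 0x1f 0x8b, but here the second byte is 0xef. For comparison, a minimal sketch of the Python 2 README approach ported to Python 3, run on a made-up stand-in payload that was never decoded as text. GzipFile needs a binary file object, so the bytes go into io.BytesIO; gzip.decompress() is the one-call equivalent for in-memory data.

```python
import gzip
import io

payload = b"<Pport>example</Pport>"  # made-up stand-in payload
raw = gzip.compress(payload)          # raw gzip bytes, never decoded as text
print(raw[:2])  # b'\x1f\x8b'

# GzipFile reads from a binary file object, so wrap the bytes in io.BytesIO
# (io.StringIO holds str and triggers the "can't concat str to bytes" error).
with gzip.GzipFile(fileobj=io.BytesIO(raw)) as f:
    print(f.read())  # b'<Pport>example</Pport>'

# One-call equivalent for in-memory data.
print(gzip.decompress(raw))  # b'<Pport>example</Pport>'
```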

Has anyone successfully decompressed a Darwin Push Port message in Python 3? Any help would be greatly appreciated!

Cheers,
Konrad

------------------------------------------

Hack Partners Limited is a company registered in England and Wales. Registered Address: Hack Partners, WeWork Old St, 41 Corsham Street, London, N1 6DR. Registered Number: 9274301.

Peter Hicks

Mar 19, 2019, 5:08:12 PM
to Konrad Komorowski, A gathering place for the Open Rail Data community
Hi Konrad

Try adding auto_decode=False to your stomp.Connection object (so it doesn't try to decode the data received), then:

   zlib.decompress(message, zlib.MAX_WBITS|32)

It appears to work here with Python 3.6.7.
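A minimal sketch of how that could look inside a listener. It assumes the stomp.py 4.x callback signature on_message(self, headers, message) and a connection created with auto_decode=False, so message arrives as raw bytes. DarwinListener is a hypothetical name; the class is not wired to stomp.py here (in a real client it would subclass stomp.ConnectionListener), and newer stomp.py releases pass a single frame object to on_message instead, so check the docs for your version. Decoding the XML as UTF-8 is also an assumption.

```python
import gzip
import zlib


class DarwinListener:
    """Hypothetical listener; in a real client this would subclass stomp.ConnectionListener."""

    def __init__(self):
        self.received = []  # decompressed XML documents, kept here for illustration

    def on_message(self, headers, message):
        # With auto_decode=False, `message` is the raw gzip bytes from the broker.
        # MAX_WBITS | 32 lets zlib auto-detect the gzip header.
        xml = zlib.decompress(message, zlib.MAX_WBITS | 32).decode("utf-8")
        self.received.append(xml)


# Simulated delivery with a made-up payload (a real client gets this from the broker).
listener = DarwinListener()
listener.on_message({}, gzip.compress(b"<Pport>example</Pport>"))
print(listener.received)  # ['<Pport>example</Pport>']
```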

If I get some time tomorrow, I'll upload the demonstrator Stomp client that I've written.


Peter


 

--
You received this message because you are subscribed to the Google Groups "A gathering place for the Open Rail Data community" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openraildata-t...@googlegroups.com.
To post to this group, send email to openrail...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


OpenTrainTimes Ltd. registered in England and Wales, company no. 09504022.
Registered office: 13a Davenant Road, Upper Holloway, London N19 3NW

Thomas Wood

Mar 19, 2019, 7:13:29 PM
to Konrad Komorowski, A gathering place for the Open Rail Data community
The following assumes you're using Python 3.

On Tue, 19 Mar 2019 at 21:05, Konrad Komorowski <kon...@hackpartners.com> wrote:
> Encoded into bytes, assuming `utf-8` encoding:
>
> >>> first_message_bytes = bytes(first_message, 'utf-8')

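It is here that you are likely to be going wrong: the initial message is unlikely to be UTF-8 text at all, given the number of non-printable bytes shown; it is binary gzip data. The repeated \xef\xbf\xbd sequences are the UTF-8 encoding of U+FFFD, the replacement character, which means every byte that was not valid UTF-8 was already thrown away when the message was turned into a str, so the original data cannot be recovered from first_message. The code that produced the first_message value should be configured to return a bytes value rather than a str, which holds already-decoded Unicode text.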
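To illustrate, a minimal sketch of how gzip bytes can end up with the b'\x1f\xef' prefix reported in the "Not a gzipped file" error. It assumes the upstream code decoded the payload as UTF-8 with errors="replace", which is consistent with the U+FFFD characters in first_message; the payload itself is a made-up stand-in.

```python
import gzip

raw = gzip.compress(b"<Pport>example</Pport>")  # made-up stand-in for a real message
print(raw[:2])  # b'\x1f\x8b', the gzip magic number

# Assumed upstream step: decoding binary data as UTF-8 with errors="replace".
# 0x8b is not valid UTF-8 on its own, so it becomes U+FFFD (the replacement character).
as_text = raw.decode("utf-8", errors="replace")

# Re-encoding cannot bring the original byte back: U+FFFD encodes to b'\xef\xbf\xbd'.
reencoded = as_text.encode("utf-8")
print(reencoded[:2])     # b'\x1f\xef'
print(reencoded == raw)  # False
```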

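If it is possible to configure the encoding used by the library, you may find the 'zlib_codec' encoding useful, as it will (should?) decompress for you automatically: https://docs.python.org/3/library/codecs.html#binary-transforms. One caveat: zlib_codec calls zlib.decompress with its default settings, which expect a zlib header rather than a gzip one, so gzip-wrapped data may still need gzip.decompress, or zlib.decompress with wbits=zlib.MAX_WBITS|32.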
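A quick check of what 'zlib_codec' accepts, using a made-up stand-in payload: it round-trips zlib-format data, but because it calls zlib.decompress with default settings it rejects gzip-wrapped data like the Push Port messages, for which gzip.decompress works directly.

```python
import codecs
import gzip
import zlib

payload = b"<Pport>example</Pport>"  # made-up stand-in payload

# zlib_codec is a bytes-to-bytes transform that handles zlib-format data.
print(codecs.decode(zlib.compress(payload), "zlib_codec"))  # b'<Pport>example</Pport>'

# It uses zlib.decompress's default settings, so a gzip header is rejected.
try:
    codecs.decode(gzip.compress(payload), "zlib_codec")
except zlib.error as exc:
    print("zlib_codec on gzip data:", exc)

# For gzip-wrapped data, gzip.decompress works directly.
print(gzip.decompress(gzip.compress(payload)))  # b'<Pport>example</Pport>'
```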

Konrad Komorowski

Mar 19, 2019, 11:10:22 PM
to A gathering place for the Open Rail Data community
Thank you so much Peter and Thomas!

Configuring stomp.py with auto_decode=False did the trick!

Peter, you're right in assuming that I was using stomp.py! Should have added that, as the error happened upstream of the code snippets I posted!

Thomas, I can't find an option to set the encoding used by stomp.py (here are the docs: https://jasonrbriggs.github.io/stomp.py/api.html#establishing-a-connection), but processing the raw bytes output is just one line of code in my listener, and is very explicit (and makes way more sense than the 'utf-8' encoding guess I made earlier)!

Thanks both for your quick help!

Best wishes,
Konrad

Peter Hicks

Mar 19, 2019, 11:24:32 PM
to Konrad Komorowski, A gathering place for the Open Rail Data community
Hi Konrad

I'm not sure there is actually another Stomp client available for Python that's still being maintained.

Can anyone find any alternatives to stomp.py?


Peter


Konrad Komorowski

Mar 20, 2019, 9:04:39 AM
to A gathering place for the Open Rail Data community
I haven't found any alternatives. :) To be fair, I wasn't looking for them anyway, as stomp.py satisfies our needs right now.

Will update you if this changes!

Best,
Konrad