FIX protocol encoder/decoder

zoka

Dec 22, 2010, 11:45:40 AM
to Aleph
Hi Zach.

I am trying to work out how to use gloss to encode/decode FIX protocol
messages. Could you please have a quick look and give me some initial
pointers?

Regards Zoka

A FIX protocol (http://en.wikipedia.org/wiki/Financial_Information_eXchange)
message consists of several fields that have this form:

<tag>=<value><SOH>

where tag is an integer in string (ASCII) form, value is a string (either
numeric or not, depending on the tag), and <SOH> is a single byte equal to
0x01. Some field tags are common to all messages and may have special
meaning.

Here is a sample message (| stands for SOH):

8=FIX.4.2|9=65|35=A|49=SERVER|56=CLIENT|34=177|52=20090107-18:15:16|98=0|108=30|10=062|

Header fields with tags 8 (BeginString), 9 (BodyLength), 35 (MsgType),
49 (SenderCompID) and 56 (TargetCompID) are mandatory. The last field (tag
10) is a checksum (always 3 digits, modulo 256) of all bytes up to
this field.
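
To make the checksum rule concrete, here is a minimal sketch in plain Clojure (no gloss involved). It assumes the sum covers every byte up to and including the SOH that precedes the 10= field, and that the message is pure ASCII. The names soh, sample and checksum are made up for this illustration.

```clojure
(require '[clojure.string :as str])

;; SOH delimiter (byte 0x01) as a one-character string.
(def soh (str (char 1)))

;; The sample message, with the readable | replaced by the real SOH byte.
(def sample
  (str/replace
    "8=FIX.4.2|9=65|35=A|49=SERVER|56=CLIENT|34=177|52=20090107-18:15:16|98=0|108=30|10=062|"
    "|" soh))

(defn checksum
  "Sums all bytes before the trailing 10= field, modulo 256, as 3 digits."
  [^String msg]
  (let [end (inc (.lastIndexOf msg (str soh "10=")))  ; keep the SOH before tag 10
        bs  (.getBytes (subs msg 0 end) "US-ASCII")]
    (format "%03d" (mod (reduce + bs) 256))))

(checksum sample) ; => "062"
```

For the sample message the byte sum is 4158, and 4158 mod 256 is 62, which matches the 10=062 field.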

The field with tag 9 (BodyLength) is the length in bytes from tag 35
(included) up to tag 10 (excluded). The sample message above should be
decoded into {8 "FIX.4.2", 9 "65", 35 "A", 49 "SERVER", ... , 10
"062"}

Zach Tellman

Dec 22, 2010, 2:34:12 PM
to alep...@googlegroups.com
Is the initial order of the tags always 8, 9, ...?

Zach Tellman

Dec 22, 2010, 2:35:40 PM
to alep...@googlegroups.com
Also, is the character set ASCII for both the tags and values?

zoka

Dec 22, 2010, 11:01:54 PM
to Aleph
(slightly revised)

The message always starts with tags 8, 9 and 35 (the header). The body of
the message that follows will have various (tag, value) pairs, some
mandatory, some optional (depending on the message type, tag 35), but it
will eventually end with tag 10 (checksum).

The tag is always an integer represented as an ASCII string
(string-integer :ascii). The value field is just an ASCII string in most
cases. If a field is to contain raw binary data, it has to be
preceded by a field containing the length. For example, the field with
tag 93 (SignatureLength) contains the length of the raw binary data
that must follow in the field with tag 89 (Signature). Note that the
trailing SOH is still there.

93=153|89=<153 bytes of raw data>|

There is a finite number (around 20 or so) of such length-tag/value-tag
pairs that have to be observed, so they can easily be looked up.

Now comes the hairy part:

The FIX message logical structure is flat (no fields with repeating
tags), unless it contains repeating groups. A repeating group starts
with a field with a specific tag that contains the number of groups that
follow. The fields that a group can contain are determined by a finite set
of tags, of which only the first one is mandatory. For example, a group
definition map:

{1000 '(444, 666, 555), ...}
; the tag for the repeat count is 1000; the group's first, mandatory
; field tag is 444, followed by possibly optional fields with tags
; 666 and 555 in any order.

will allow this message fragment to be valid

1000=3|444=a|555=b|666=c|444=d|555=e|444=f|666=g|123=not-in-group|

The decoded message fragment could look like this:

{..., 1000 [ {444 "a", 555 "b", 666 "c"}, ... ], 123 "not-in-group", ...}

Here we have 3 repeating groups. Tag 444 is mandatory and always first in a
group. The field with tag 123, which is not a member of the group, indicates
group termination.
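
The grouping rule can be sketched in plain Clojure, independent of gloss. This is only an illustration under some assumptions: it uses | in place of SOH for readability, handles a single level of groups (no nesting), and does not validate the count carried in the 1000 field. The names group-defs, parse-fields, read-group and decode-fields are hypothetical.

```clojure
(require '[clojure.string :as str])

;; Count tag -> member tags; the first member tag starts each group entry.
(def group-defs {1000 [444 666 555]})

(defn parse-fields
  "Splits a |-delimited fragment into [tag value] pairs."
  [^String s]
  (for [field (str/split s #"\|")
        :let [[tag value] (str/split field #"=" 2)]]
    [(Long/parseLong tag) value]))

(defn read-group
  "Consumes the fields that belong to a repeating group.
  Returns [entries remaining-fields]."
  [[first-tag :as member-tags] fields]
  (let [members (set member-tags)]
    (loop [entries [] fields fields]
      (let [[[tag value] & more] fields]
        (cond
          ;; the mandatory first tag opens a new entry
          (= tag first-tag)
          (recur (conj entries {tag value}) more)

          ;; any other member tag is added to the current entry
          (and (contains? members tag) (seq entries))
          (recur (conj (pop entries) (assoc (peek entries) tag value)) more)

          ;; a non-member tag (or end of input) terminates the group
          :else [entries fields])))))

(defn decode-fields
  "Folds [tag value] pairs into a map, nesting repeating groups as vectors."
  [group-defs fields]
  (loop [msg {} fields (seq fields)]
    (if-let [[[tag value] & more] fields]
      (if-let [member-tags (group-defs tag)]
        (let [[entries remaining] (read-group member-tags more)]
          (recur (assoc msg tag entries) (seq remaining)))
        (recur (assoc msg tag value) more))
      msg)))

(decode-fields
  group-defs
  (parse-fields "1000=3|444=a|555=b|666=c|444=d|555=e|444=f|666=g|123=not-in-group|"))
;; => {1000 [{444 "a", 555 "b", 666 "c"} {444 "d", 555 "e"} {444 "f", 666 "g"}],
;;     123 "not-in-group"}
```

Nested groups would need read-group to consult group-defs again for each member tag, which this sketch leaves out.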

Groups can be nested as well.

Is this level of complexity suitable for gloss? Maybe it is better to
"hand code" the codec function that is passed to netty when the pipeline
is created?

Regards
Zoka

Zach Tellman

Dec 23, 2010, 3:27:59 PM
to alep...@googlegroups.com
Here's an untested rough draft of the protocol, as you've described
it: https://gist.github.com/753472. The one missing piece is a
checksum frame, but that should be pretty straightforward to add in.

This is a pretty interesting protocol, in that it doesn't fit some
assumptions I made when writing Gloss. All in all, though, I think
it's still easier to use Gloss than to write a custom parser.

Again, the above code hasn't been tested, so there may be some issues
with it. If you have any questions about how it works (or should
work), please feel free to ask.

Zach

zoka

Dec 23, 2010, 7:43:40 PM
to Aleph

Zach, thank you very much!

Regards Zoka

zoka

Dec 24, 2010, 6:04:51 AM
to Aleph
Hi Zach,

I tried to test the codec for a simple form of FIX message field
tuple, based on your sample code at https://gist.github.com/753472

gloss REPL transcript
======================
user=> (use 'gloss.core 'gloss.io)
nil
;
; codec for simple FIX message field
;
; <numeric-string>=<string><SOH>

(defcodec simple-tuple
  (header (string-integer :ascii :delimiters ["="])
          (fn [key]
            (string :ascii :delimiters [0x01]))
          (fn [[key value]]
            key)))

#'user/simple-tuple

user=> (encode simple-tuple [8 "FIX.4.2"])
java.lang.Exception: Expected a CharSequence, but got [8 "FIX.4.2"]
class clojure.lang.PersistentVector (NO_SOURCE_FILE:0)
user=>

It seems that there is a problem with the body->header function.
In the above example the encoded header is 8= and the body is "FIX.4.2". The Wiki
documentation states that the last header parameter is a function that
takes the value of the body and returns the value of the header. The
implementation of simple-tuple seems to assume that the body includes the
header. That is logical, because the value of the header cannot be inferred
from the value of the body, but on the other hand it clashes with the header
definition.

Zach Tellman

Dec 24, 2010, 11:51:21 AM
to alep...@googlegroups.com
Sorry, the code I posted yesterday had a silly mistake lurking in it.
Your simple-tuple codec should look like this:

(defcodec simple-tuple
  (header (string-integer :ascii :delimiters ["="])
          (fn [key]
            (compile-frame
              [key (string :ascii :delimiters [0x01])]))
          (fn [[key value]]
            key)))

As you correctly realized, there was no way to get the key from the
decoded value. Since the bytes have already been consumed, we need to
bake it into the header->body codec. Doing compile-frame for every
frame shouldn't be too bad, but you can always memoize the function to
get around that.
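
The memoized variant might look like the sketch below. Like the codec above, it is untested and needs gloss on the classpath. It only moves the header->body function into a memoized var, so compile-frame would run once per distinct tag rather than once per field. The name tuple-body is made up here, and the cache is unbounded, which should be acceptable only because the set of FIX tags is finite.

```clojure
(use 'gloss.core)

;; header->body: builds (and caches) one body frame per tag.
(def tuple-body
  (memoize
    (fn [key]
      (compile-frame
        [key (string :ascii :delimiters [0x01])]))))

(defcodec simple-tuple
  (header (string-integer :ascii :delimiters ["="])
          tuple-body
          ;; body->header: the tag is the first element of the decoded pair
          (fn [[key value]]
            key)))
```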

I've fixed the mistake in the gist I posted yesterday. Obviously,
there may still be others.

Zach

Dimitry Gashinsky

Dec 24, 2010, 12:45:02 PM
to alep...@googlegroups.com
I've been following this thread. Just wanted to say that this is very
interesting to me and I would try to use this code in a month. Right
now I am using quickfix/j which is based on Mina.

zoka

Dec 26, 2010, 8:50:40 AM
to Aleph
Hi Zach,

I made a hand-coded codec for a simple FIX protocol message field and then
compared its speed with the gloss one. See https://gist.github.com/755420

I got the following benchmark result:

user=> (tuple-bench 100000)
Doing encode/decode 100000 times
encode(gloss): "Elapsed time: 506.785 msecs"
encode-tuple: "Elapsed time: 100.594 msecs"
decode(gloss): "Elapsed time: 5191.819 msecs"
decode-tuple: "Elapsed time: 3.486 msecs"

Gloss encode is 5x slower, while decode probably has some
*warn-on-reflection* issue, since being more than 1000x slower does not
make much sense.

Regards
Zoka

Zach Tellman

Dec 26, 2010, 3:15:08 PM
to alep...@googlegroups.com
Thanks for taking the time to create this benchmark. I've been putting
off looking into performance because I didn't want to create the
hand-coded equivalent of a codec.

Your decoding is so much faster because the first decode (outside the
timed loop) consumes all the bytes in the byte buffer. All subsequent
decodes are passed an empty buffer and return nil. To prevent this from
happening, you need to duplicate the buffer before decoding it. I've added
in the necessary .duplicate calls here: https://gist.github.com/755574.
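
The effect is easy to reproduce with a bare java.nio.ByteBuffer. The sketch below is not the benchmark code from the gist; read-all and the vars around it are illustrative names. It shows that a read moves the buffer's position to its limit, while .duplicate gives a view over the same bytes with its own position.

```clojure
(import 'java.nio.ByteBuffer)

(defn read-all
  "Reads every remaining byte as ASCII, moving buf's position to its limit."
  [^ByteBuffer buf]
  (let [^bytes out (byte-array (.remaining buf))]
    (.get buf out)
    (String. out "US-ASCII")))

(def ^ByteBuffer buf
  (ByteBuffer/wrap (.getBytes "8=FIX.4.2\u0001" "US-ASCII")))

(def first-read  (read-all (.duplicate buf))) ; the duplicate is consumed, not buf
(def untouched   (.remaining buf))            ; buf still holds all 10 bytes
(def second-read (read-all buf))              ; this read consumes buf itself
(def third-read  (read-all buf))              ; nothing left, so an empty string
```

A timed loop that decodes the same buffer repeatedly without duplicating it is in the third-read situation on every iteration after the first, which is why it looks so fast.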

With those changes and a few type-hints added in, your solution is
still 5x faster at encoding and 20x faster at decoding. If we make
the buffer that Gloss is decoding contiguous, the difference goes down
to 16x. Obviously this is still pretty slow, so I'm going to keep
looking at it.

Zach

zoka

Dec 26, 2010, 5:55:42 PM
to Aleph
Yes, that was a silly mistake - a simple .rewind of the existing buffer is
probably a bit faster. I think that gloss, as a generic tool, is still
doing fine performance-wise.



zoka

Dec 26, 2010, 11:14:27 PM
to Aleph
Update in https://gist.github.com/755420
Decoding speed for ascii string -> Java string can be increased by
using StringBuilder

user=> (tuple-bench 100000)
Doing encode/decode 100000 times
encode(gloss): "Elapsed time: 484.869 msecs"
encode-tuple: "Elapsed time: 82.125 msecs"
decode(gloss): "Elapsed time: 5167.422 msecs"
decode-tuple: "Elapsed time: 28.815 msecs"
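
For readers without access to the gist, the StringBuilder idea mentioned above could look roughly like this. It is a sketch, not the gist's code: read-ascii-until and this decode-tuple are written from scratch for illustration, and they assume ASCII input and a buffer that holds one complete field.

```clojure
(import 'java.nio.ByteBuffer)

(defn read-ascii-until
  "Reads bytes from buf up to the delimiter byte (which is consumed)
  and returns them as a String, appending one char at a time."
  [^ByteBuffer buf delimiter]
  (let [delim (long delimiter)
        sb    (StringBuilder.)]
    (loop []
      (when (.hasRemaining buf)
        (let [b (long (.get buf))]
          (when (not= b delim)
            (.append sb (char b))
            (recur)))))
    (str sb)))

(defn decode-tuple
  "Decodes one <tag>=<value><SOH> field into [tag value]."
  [^ByteBuffer buf]
  [(Long/parseLong (read-ascii-until buf (int \=)))
   (read-ascii-until buf 1)])

(decode-tuple (ByteBuffer/wrap (.getBytes "8=FIX.4.2\u0001" "US-ASCII")))
;; => [8 "FIX.4.2"]
```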

Zach Tellman

Dec 27, 2010, 3:25:05 PM
to alep...@googlegroups.com
Did you sync with the latest commit that added in type-hints?

zoka

Dec 27, 2010, 7:37:29 PM
to Aleph
Yes I did. The benchmark result was taken after several runs, so the JIT
was warmed up.

My environment: (Mac OS X 10.5)
Java 1.6.0_22 Java HotSpot(TM) 64-Bit Server VM


Zach Tellman

Jan 1, 2011, 11:34:43 PM
to alep...@googlegroups.com
I've committed changes to the 'performance' branch on github that
double decode speed when (contiguous ...) is called on
'enc-bufs-gloss'. More improvements to come.

Zach

Zach Tellman

Jan 4, 2011, 1:58:35 PM
to alep...@googlegroups.com
I've made some more improvements to the delimiter matching code, and
have merged the changes into master. All told, there should be about
an 8x improvement in performance with a contiguous input buffer.

Zach
