mFAST performance


Chris Aseltine

Dec 20, 2013, 12:43:01 PM
to quickfa...@googlegroups.com
I recently downloaded mFAST from GitHub to test the performance after seeing the claim in the announcement message that it was close to an order of magnitude faster on average.

As my test case, I have a particular FIX/FAST message that I am interested in decoding quickly.  It is 475 bytes in length and consists of about nine fields upfront and then 18 MDEntries sequence entries.

In QuickFAST this packet takes about 20 micros to decode in a tight loop; mFAST does it in about 13 micros. Obviously faster, but not that much faster. This is on an i7-3770K @ 4.4 GHz using VS 2010, 32-bit.

Has anyone seen much greater performance than this?

Oleg

Dec 21, 2013, 5:28:29 AM
to quickfa...@googlegroups.com
I'm using MFast on http://moex.com/

A 50–60 byte message:

QuickFAST: 5–7 microseconds per message
mFAST: 1.5–2 microseconds per message

A 100–150 byte message:

QuickFAST: 12–15 microseconds per message
mFAST: 4–5 microseconds per message

So in my case mFAST is several times faster!

Hardware: 2 × Xeon E5-2640

Chris Aseltine

Dec 21, 2013, 1:29:09 PM
to quickfa...@googlegroups.com
Hi Oleg,

Interesting; thanks for passing that anecdote along, I appreciate it.

-Chris

Oleg

Dec 22, 2013, 6:03:09 AM
to quickfa...@googlegroups.com
Hi Chris

What do you mean by "anecdote"?
Do you think my results are not realistic? Do you think QuickFAST must be faster or slower, or mFAST must be faster or slower?

I just posted what I measured; there's no reason for me to post fake results :) Of course, it's always possible that I have a bug somewhere or measured something the wrong way. But I know another guy who got the same results with mFAST on the same feed, so it's pretty likely that my numbers are real.

Thanks,
 Oleg

Alex B

Dec 22, 2013, 8:45:55 AM
to quickfa...@googlegroups.com
Guys, a question: is mFAST a generic decoder, or is it developed for a certain set of FAST messages?
--
You received this message because you are subscribed to the Google Groups "quickfast_users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to quickfast_use...@googlegroups.com.
To post to this group, send email to quickfa...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/quickfast_users/ae6a342f-4f4d-41a8-978b-03124a8156d2%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Chris Aseltine

Dec 22, 2013, 11:01:11 AM
to quickfa...@googlegroups.com
Hi Oleg,

I don't "mean" anything by it :)  Perhaps it is a "lost in translation" issue, or "anecdote" carries a loaded meaning in Russian; I was simply thanking you for sharing your data point, with no hidden or subtle implication.

Like I said, I personally saw a speed increase too, but not the order of magnitude that others have reported. Maybe my particular test case (a larger message with many repeated MDEntries) can't take as much advantage of whatever speed improvements mFAST delivers.

-Chris


Chris Aseltine

Dec 22, 2013, 11:02:23 AM
to quickfa...@googlegroups.com
As far as I can see, it is a generic decoder. You feed the templates file to the code generator, which emits a series of C++ classes encapsulating each template ID type, and then you decode into those structures.

Do you see it another way?

Alex B

Dec 22, 2013, 1:10:57 PM
to quickfa...@googlegroups.com
I asked because I have seen enough spots in the QuickFAST C++ core implementation that could be optimized (especially if you think about what the STL does behind the scenes), and I believe a proper optimization/redesign could yield a noticeable performance improvement. So it is quite possible that someone (mFAST?) came up with a well-thought-through implementation that turned out to be clearly superior to QuickFAST.
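To make the "what the STL does behind the scenes" point concrete, here is a small self-contained sketch (generic C++, not QuickFAST code): it replaces the global operator new to count heap allocations, and contrasts constructing a fresh std::string per decoded field against reusing one string whose capacity is retained. The function names and field value are hypothetical.

```cpp
#include <cstdlib>
#include <new>
#include <string>

// Count every heap allocation by replacing the global operator new.
static int g_allocs = 0;

void* operator new(std::size_t n) {
    ++g_allocs;                          // every heap allocation passes through here
    if (void* p = std::malloc(n)) return p;
    throw std::bad_alloc();
}
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

// Naive: construct a fresh std::string per "decoded" field value.
int allocs_fresh_string_per_field(int fields) {
    const char* value = "a-field-value-too-long-for-the-SSO-buffer";
    g_allocs = 0;
    for (int i = 0; i < fields; ++i) {
        std::string s(value);            // one hidden heap allocation per field
    }
    return g_allocs;
}

// Reuse: one string whose capacity is kept across fields.
int allocs_reused_string(int fields) {
    const char* value = "a-field-value-too-long-for-the-SSO-buffer";
    g_allocs = 0;
    std::string s;
    s.reserve(64);                       // a single allocation, reused thereafter
    for (int i = 0; i < fields; ++i)
        s.assign(value);                 // capacity suffices: no further allocation
    return g_allocs;
}
```

Over 1000 fields the naive version performs at least 1000 hidden allocations while the reusing version performs one, which is the kind of redesign win being suggested here.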

Alex

Huang-Ming Huang

Dec 23, 2013, 2:35:23 PM
to quickfa...@googlegroups.com
Hi Chris,

From your statement, it seems to me that you were decoding the same message over and over in a loop. If so, the performance you got from QuickFAST was inflated by a data-caching effect. FAST is a stateful protocol: it needs to remember data from the previous message to properly encode/decode the next one. If you decode the same message repeatedly, the decoder just reuses the same data in memory, and there are no more dynamic memory allocations from the 2nd iteration onward. The QuickFAST number you measured was therefore significantly boosted by that effect. However, I don't believe that is a typical usage scenario.
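The statefulness can be illustrated with a toy model of FAST's "copy" operator (this is an illustrative sketch, not mFAST or QuickFAST code): the dictionary remembers the previously decoded value, and a field whose presence-map bit is clear simply reuses it, so the cost of decoding a message depends on what came before it.

```cpp
#include <cstdint>

// Toy model of one FAST field with the "copy" operator: the dictionary keeps
// the previously decoded value, and an absent field reuses it verbatim.
struct CopyField {
    int64_t previous = 0;
    int     slow_decodes = 0;            // how often we actually parsed wire bytes

    int64_t decode(bool present_in_pmap, int64_t wire_value) {
        if (present_in_pmap) {
            ++slow_decodes;              // real parsing work happens only here
            previous = wire_value;
        }
        return previous;                 // absent => value copied from last message
    }
};
```

Decoding a message that carries 42 and then two messages where the field is absent yields 42 each time, with only one "slow" decode; replaying identical input keeps hitting that fast path.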

The test program and data set I mentioned in the mFAST article are all in the mFAST source tree, so you can conduct the exact same experiments in your environment. I have tried some other templates and data sets and got results similar to those presented in my article. Therefore I stand by my statement that mFAST is far faster than QuickFAST.

If my guess is wrong, please post your test program and data so I can analyze what's happening in your case.

Chris Aseltine

Dec 23, 2013, 3:29:56 PM
to quickfa...@googlegroups.com
Hi Huang-Ming,

No, you are correct that I was decoding a single message (though not in a tight loop; more like five to ten times, just to see the 'cold cache' versus 'hot cache' performance). However, I don't think QuickFAST allocates memory at runtime if you use ValueMessageBuilder, even the first time you use it (unless you hit a string longer than 32 characters, etc.). And like I said, I did see mFAST as faster, but more like 100% faster (i.e. half the decode time).

If you want to check out my test case, here is the data and template that I used.  I decoded it with code like:

#include <intrin.h>  // for __rdtsc (MSVC)

::mfast::fast_decoder fd;
fd.include(templates_description);  // description generated from the XML template below
uint64_t rdtsc1 = __rdtsc();
fd.decode(beg, end);                // beg/end delimit the packet[] bytes below
uint64_t rdtsc2 = __rdtsc();
// elapsed cycles = rdtsc2 - rdtsc1 (divide by 4400 for microseconds at 4.4 GHz)

Template:

<template name="template142" id="142" dictionary="142" xmlns="http://www.fixprotocol.org/ns/fast/td/1.1">
 <string name="ApplVerID" id="1128">
  <constant value="6" />
 </string>
 <string name="MessageType" id="35">
  <constant value="X" />
 </string>
 <string name="SenderCompID" id="49">
  <constant value="PCT" />
 </string>
 <uInt32 name="MsgSeqNum" id="34"></uInt32>
 <uInt64 name="SendingTime" id="52"></uInt64>
 <string name="PosDupFlag" id="43" presence="optional">
 </string>
 <uInt32 name="TradeDate" id="75"></uInt32>
 <sequence name="MDEntries">
  <length name="NoMDEntries" id="268"></length>
  <uInt32 name="MDUpdateAction" id="279">
  </uInt32>
  <string name="MDEntryType" id="269">
  </string>
  <uInt32 name="SecurityIDSource" id="22">
   <constant value="8" />
  </uInt32>
  <uInt32 name="SecurityID" id="48">
  </uInt32>
  <uInt32 name="RptSeq" id="83">
  </uInt32>
  <decimal name="MDEntryPx" id="270">
   <exponent></exponent>
   <mantissa></mantissa>
  </decimal>
  <uInt32 name="MDEntryTime" id="273">
  </uInt32>
  <int32 name="MDEntrySize" id="271" presence="optional">
  </int32>
  <decimal name="NetChgPrevDay" id="451" presence="optional">
   <exponent></exponent>
   <mantissa></mantissa>
  </decimal>
  <uInt32 name="TradeVolume" id="1020" presence="optional">
  </uInt32>
  <string name="TradeCondition" id="277" presence="optional">
  </string>
  <string name="TickDirection" id="274" presence="optional">
  </string>
  <uInt32 name="AggressorSide" id="5797" presence="optional">
  </uInt32>
  <string name="MatchEventIndicator" id="5799" presence="optional">
  </string>
  <uInt32 name="TradeID" id="1003" presence="optional" />
  <uInt32 name="NumberOfOrders" id="346" presence="optional" />
 </sequence>
</template>

Data:

   unsigned char packet[] =
   {
      0xc0, 0x01, 0x8e, 0x4c,
      0x39, 0xe7, 0x23, 0x61, 0x27, 0x4b, 0x43, 0x5d, 0x0b,
      0xfb, 0x80, 0x09, 0x4c, 0x5b, 0x93, 0x92, 0x80, 0xb2,
      0x28, 0x53, 0xa6, 0x02, 0xa7, 0x80, 0x00, 0x42, 0xb4,
      0x58, 0x4d, 0x08, 0xa0, 0x82, 0x81, 0xf5, 0xb4, 0x80,
      0x80, 0x83, 0xb1, 0xb4, 0x80, 0x80, 0xb2, 0x28, 0x53,
      0xa6, 0x02, 0xa8, 0x80, 0x00, 0x42, 0xb4, 0x58, 0x4d,
      0x08, 0xa0, 0x82, 0x81, 0xf5, 0xb5, 0x80, 0x80, 0x83,
      0x80, 0xb5, 0x80, 0x80, 0xb2, 0x28, 0x53, 0xa6, 0x02,
      0xa9, 0x80, 0x00, 0x42, 0xb4, 0x58, 0x4d, 0x08, 0xa0,
      0x82, 0x81, 0xf5, 0xb6, 0x80, 0x80, 0x83, 0x80, 0xb6,
      0x80, 0x80, 0xb2, 0x28, 0x53, 0xa6, 0x02, 0xaa, 0x80,
      0x00, 0x42, 0xb4, 0x58, 0x4d, 0x08, 0xa0, 0x82, 0x81,
      0xf5, 0xb7, 0x80, 0x80, 0x83, 0x80, 0xb7, 0x80, 0x80,
      0xb2, 0x28, 0x53, 0xa6, 0x02, 0xab, 0x80, 0x00, 0x42,
      0xb4, 0x58, 0x4d, 0x08, 0xa0, 0x82, 0x81, 0xf5, 0xb8,
      0x80, 0x80, 0x83, 0x80, 0xb8, 0x80, 0x80, 0xb2, 0x28,
      0x53, 0xa6, 0x02, 0xac, 0x80, 0x00, 0x42, 0xb4, 0x58,
      0x4d, 0x08, 0xa0, 0x82, 0x81, 0xf5, 0xb9, 0x80, 0x80,
      0x83, 0x80, 0xb9, 0x80, 0x80, 0xb2, 0x28, 0x53, 0xa6,
      0x02, 0xad, 0x80, 0x00, 0x42, 0xb4, 0x58, 0x4d, 0x08,
      0xa0, 0x82, 0x81, 0xf5, 0xba, 0x80, 0x80, 0x83, 0x80,
      0xba, 0x80, 0x80, 0xb2, 0x28, 0x53, 0xa6, 0x02, 0xae,
      0x80, 0x00, 0x42, 0xb4, 0x58, 0x4d, 0x08, 0xa0, 0x82,
      0x81, 0xf5, 0xbb, 0x80, 0x80, 0x83, 0x80, 0xbb, 0x80,
      0x80, 0xb2, 0x28, 0x53, 0xa6, 0x02, 0xaf, 0x80, 0x00,
      0x42, 0xb4, 0x58, 0x4d, 0x08, 0xa0, 0x82, 0x81, 0xf5,
      0xbc, 0x80, 0x80, 0x83, 0x80, 0xbc, 0x80, 0x80, 0xb2,
      0x28, 0x53, 0xa6, 0x02, 0xb0, 0x80, 0x00, 0x42, 0xb4,
      0x58, 0x4d, 0x08, 0xa0, 0x82, 0x81, 0xf5, 0xbd, 0x80,
      0x80, 0x83, 0x80, 0xbd, 0x80, 0x80, 0xb2, 0x28, 0x53,
      0xa6, 0x02, 0xb1, 0x80, 0x00, 0x42, 0xb4, 0x58, 0x4d,
      0x08, 0xa0, 0x82, 0x81, 0xf5, 0xbe, 0x80, 0x80, 0x83,
      0x80, 0xbe, 0x80, 0x80, 0xb2, 0x28, 0x53, 0xa6, 0x02,
      0xb2, 0x80, 0x00, 0x42, 0xb4, 0x58, 0x4d, 0x08, 0xa0,
      0x82, 0x81, 0xf5, 0xbf, 0x80, 0x80, 0x83, 0x80, 0xbf,
      0x80, 0x80, 0xb2, 0x28, 0x53, 0xa6, 0x02, 0xb3, 0x80,
      0x00, 0x42, 0xb4, 0x58, 0x4d, 0x08, 0xa0, 0x82, 0x81,
      0xf5, 0xc0, 0x80, 0x80, 0x83, 0x80, 0xc0, 0x80, 0x80,
      0xb2, 0x28, 0x53, 0xa6, 0x02, 0xb4, 0x80, 0x00, 0x42,
      0xb4, 0x58, 0x4d, 0x08, 0xa0, 0x82, 0x81, 0xf5, 0xc1,
      0x80, 0x80, 0x83, 0x80, 0xc1, 0x80, 0x80, 0xb2, 0x28,
      0x53, 0xa6, 0x02, 0xb5, 0x80, 0x00, 0x42, 0xb4, 0x58,
      0x4d, 0x08, 0xa0, 0x82, 0x81, 0xf5, 0xc2, 0x80, 0x80,
      0x83, 0x80, 0xc2, 0x80, 0x80, 0xb2, 0x28, 0x53, 0xa6,
      0x02, 0xb6, 0x80, 0x00, 0x42, 0xb4, 0x58, 0x4d, 0x08,
      0xa0, 0x82, 0x81, 0xf5, 0xc3, 0x80, 0x80, 0x83, 0x80,
      0xc3, 0x80, 0x80, 0xb2, 0x28, 0x53, 0xa6, 0x02, 0xb7,
      0x80, 0x00, 0x42, 0xb4, 0x58, 0x4d, 0x08, 0xa0, 0x82,
      0x81, 0xf5, 0xc4, 0x80, 0x80, 0x83, 0x80, 0xc4, 0x80,
      0x80, 0xb2, 0x28, 0x53, 0xa6, 0x02, 0xb8, 0x80, 0x00,
      0x42, 0xb4, 0x58, 0x4d, 0x08, 0xa0, 0x82, 0x81, 0xf5,
      0xc5, 0x80, 0x80, 0x83, 0x80, 0xc5, 0x80
   };

-Chris

Huang-Ming Huang

Dec 24, 2013, 11:27:58 PM
to quickfa...@googlegroups.com
Now I understand where the problem is. You are using ValueMessageBuilder instead of GenericMessageBuilder in QuickFAST. ValueMessageBuilder is only an abstract interface that lets the framework inform the application which field is currently being decoded and what its decoded value is. It does not assemble the entire message for you the way mFAST or GenericMessageBuilder does. My article did note that mFAST was only 23% faster than the QuickFAST PerformanceTest (which uses ValueMessageBuilder only) in this case.

For a case with only one template, where the template has only one sequence, ValueMessageBuilder may be good enough for you. However, if you have several templates, each with different nested sequences or groups, ValueMessageBuilder becomes harder to manage. Most of the time it requires you to manually store some information on the heap so you can keep track of the context you are in.

mFAST always assembles the message for you, so you don't have to worry about manually storing the required context. Furthermore, it even performs better than QuickFAST with ValueMessageBuilder, albeit marginally.

In summary, I claimed mFAST was several times faster than QuickFAST using GenericMessageBuilder, not ValueMessageBuilder. It's fairer to compare against GenericMessageBuilder because that's the case where both mFAST and QuickFAST actually assemble entire messages. My article did show that mFAST was only 23% faster than QuickFAST using ValueMessageBuilder, which is in line with your observation.
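The distinction between the two builder styles can be sketched with hypothetical interfaces (illustrative types only, not the actual QuickFAST or mFAST APIs): a callback-style builder pushes one field event at a time, leaving nesting context to the application, while an assembling decoder returns a complete message structure.

```cpp
#include <cstdint>
#include <vector>

// Callback style (ValueMessageBuilder-like, hypothetical): the decoder pushes
// events; the application must track which sequence entry it is inside.
struct EntryVisitor {
    virtual ~EntryVisitor() = default;
    virtual void on_entry_begin() = 0;
    virtual void on_price(int64_t mantissa) = 0;
};

// A concrete visitor that accumulates state across callbacks.
struct SumVisitor : EntryVisitor {
    int64_t sum = 0;
    int     entries = 0;
    void on_entry_begin() override { ++entries; }
    void on_price(int64_t m) override { sum += m; }
};

// Assembled style (GenericMessageBuilder/mFAST-like, hypothetical): the
// decoder hands back the whole message at once.
struct Entry   { int64_t price_mantissa; };
struct Message { std::vector<Entry> entries; };

// Fake "decoders" over pre-parsed values, driving each style.
void decode_with_callbacks(const std::vector<int64_t>& prices, EntryVisitor& v) {
    for (int64_t p : prices) { v.on_entry_begin(); v.on_price(p); }
}
Message decode_assembled(const std::vector<int64_t>& prices) {
    Message m;
    for (int64_t p : prices) m.entries.push_back(Entry{p});
    return m;
}
```

With nested sequences, the visitor's accumulated state is exactly the hand-managed context described above; the assembled Message carries that structure for free.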


Oleg

Dec 25, 2013, 1:48:22 AM
to quickfa...@googlegroups.com
Let me add to my post above:

The numbers I posted at the beginning of the thread are for the ValueMessageBuilder case.
So with my implementation, on my feed, using my measurements, mFAST is at least 2 times faster than QuickFAST with ValueMessageBuilder.
But perhaps my QuickFAST ValueMessageBuilder implementation is not good enough, or my measurements are not good enough. I guess it's easy to measure something the wrong way when you need microsecond precision, especially under Windows, which I'm using.
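On the precision point, one generic approach (a sketch independent of either library) is to amortize a coarse clock over a large batch of calls with std::chrono::steady_clock rather than timing a single decode; the function name here is hypothetical.

```cpp
#include <chrono>

// Run fn() `iterations` times and return the average nanoseconds per call.
// Batching amortizes clock resolution and call overhead, which matters when
// the timer tick is coarser than the operation being measured.
template <typename Fn>
double nanos_per_call(Fn fn, int iterations) {
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i)
        fn();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::nano>(stop - start).count()
           / iterations;
}
```

Note the earlier caveat in this thread still applies: replaying one message in a hot loop flatters a stateful decoder, so the batch should ideally replay a realistic captured stream.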

Also, I was not able to use mFAST to decode the "Instruments Definition" feed. That feed is very complex, and I'm still using QuickFAST to decode it. I'm using mFAST to decode "Order Incremental" and "Orders Snapshot".

In general I like mFAST because it has fewer dependencies, is much easier to use, and is definitely not slower than QuickFAST.