Hi,
Coming back to the question of transferring data from C++ to kdb in real time.
The process I currently have: the feed handler receives the data, parses it, and publishes it to a port using zmq.
My C++ application listens to that port, and for each message update it creates a K dictionary of the variables I want (approx. 13 variables), then sends this dictionary across to kdb as follows:
K result = k(handle, "myfunc", dict, (K) 0);
where handle connects me to a kdb session on localhost at a specific port. The function myfunc takes the dictionary, extracts a couple of longs and converts them to datetimes, then inserts all the values at the end of a pre-defined table.
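For concreteness, the per-message code looks roughly like this (simplified to 3 of my 13 fields, with made-up names and values):

#define KXVER 3
#include "k.h"
// handle opened earlier via khpu((S)"localhost", port, (S)"")
J recvTime = 0;                                 // placeholder: long taken from the decoded message
K keys = ktn(KS, 3);                            // symbol vector of column names
kS(keys)[0] = ss((S)"time");
kS(keys)[1] = ss((S)"sym");
kS(keys)[2] = ss((S)"price");
K vals = knk(3, kj(recvTime), ks((S)"EURUSD"), kf(1.2345));
K dict = xD(keys, vals);                        // xD takes ownership of keys and vals
K result = k(handle, (S)"myfunc", dict, (K)0);  // positive handle: synchronous call,
if (result) r0(result);                         // blocks until kdb sends back a reply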
The problem is that this process is too slow to consume the data being published to the port by zmq, so my application is missing lots of packets; in fact it is missing around 70% of them. In a 5-minute run earlier, zmq published approx. 500,000 messages and my kdb application only recorded roughly 150,000, which equates to only 500 messages per second on average. When I remove the above line from the C++ application and just print a count of messages received to screen, there is no packet loss.
Therefore the bottleneck must be in the line above, i.e. sending the dictionary of values from C++ to kdb across the handle.
I would like to know: is sending a (small) dictionary across to kdb like this slow, and is it therefore a pretty inefficient way of transferring the data in real time?
An alternative way of doing it may be to append the dictionaries together in C++ (i.e. batching the data) and then send the batch across to kdb every 5 seconds (say), vastly reducing the number of times I call the above line.
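I picture the batching looking something like this sketch (same made-up fields as above, 3 of the 13 columns; ja/js append to simple lists):

// recvTime/price come from the decoded message
J recvTime = 0; F price = 0;

// start a batch: empty per-column vectors instead of one dict per message
K times  = ktn(KJ, 0);
K syms   = ktn(KS, 0);
K prices = ktn(KF, 0);

// on each zmq message: append to the vectors (no IPC here)
ja(&times, &recvTime);                        // join a long onto the KJ list
js(&syms, ss((S)"EURUSD"));                   // join an interned symbol onto the KS list
ja(&prices, &price);                          // join a double onto the KF list

// every 5 seconds: ship the whole batch in one call
K keys = ktn(KS, 3);
kS(keys)[0] = ss((S)"time"); kS(keys)[1] = ss((S)"sym"); kS(keys)[2] = ss((S)"price");
K dict = xD(keys, knk(3, times, syms, prices));
K res = k(handle, (S)"myfunc", dict, (K)0);   // one IPC round trip per batch
if (res) r0(res);
// (k(-handle, ...) instead would send asynchronously, without waiting for a reply)
// then allocate fresh empty vectors for the next batch

On the kdb side, myfunc could then flip the dictionary into a table and insert the whole batch in one go, which I would expect to be much cheaper than hundreds of row-at-a-time calls per second.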
Another problem is that if kdb crashes, I will want to replay all the data into the table when I restart it. So the current method is actually: a separate application receives the data from the port and writes it to a binary md file, and my application reads and decodes that md file, then sends to kdb as described above. That way, if kdb crashes, I can restart the application and all the data is replayed from the md file and written back into kdb. The problem is that reading and decoding the binary adds an extra layer of slowness.
Maybe my method here is generally bad. Another solution I thought of was to have a separate application write the 13 variables of each message update to a csv file, and every so often have kdb read the csv in with read0 or something. But I'm not keen on that, because it seems more restrictive, plus I don't know how a real-time version of it would work, i.e. I can't keep re-reading the csv every so often, as it will obviously grow very large by the end of the day.
The goal is: listen to the port, write the data to a file using C++, have C++ read that file constantly and send any updates to kdb, then at the end of the day save the table to disk and repeat for the next day. I don't care about a few seconds or more of latency in doing this.
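A minimal sketch of the file-tailing part, assuming fixed-size binary records (RECORD_SIZE and the file name are placeholders):

#include <chrono>
#include <cstddef>
#include <fstream>
#include <thread>

const std::size_t RECORD_SIZE = 64;    // placeholder: whatever one encoded update occupies

void tailLoop() {
    std::ifstream in("today.md", std::ios::binary);
    std::streampos pos = 0;
    char rec[RECORD_SIZE];
    for (;;) {
        in.clear();                    // clear the EOF flag set on the previous pass
        in.seekg(pos);
        while (in.read(rec, sizeof rec)) {
            pos = in.tellg();          // remember how far we have consumed
            // decode rec, add it to the current batch, send to kdb as above
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(100));
    }
}

On a restart after a kdb crash I could just reset pos to 0 and the whole file replays through the same path.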
Someone I spoke to who's familiar with MongoDB said threading would help here; I guess that means multiple threads handling the transfer of data to kdb. Any thoughts?
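If it helps, what I have in mind is one thread receiving from zmq and queuing decoded messages, and a second thread draining the queue and doing the kdb sends, so the receive loop never blocks on the IPC call (the struct fields are made up again):

#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <vector>

struct Msg { long long time; std::string sym; double price; };   // 3 of the 13 fields

std::queue<Msg> pending;
std::mutex mtx;
std::condition_variable cv;

// called from the zmq receive loop; never touches the kdb handle
void enqueue(const Msg& m) {
    { std::lock_guard<std::mutex> lk(mtx); pending.push(m); }
    cv.notify_one();
}

// dedicated sender thread: drains the queue and ships one batch per wakeup
void senderLoop() {
    for (;;) {
        std::unique_lock<std::mutex> lk(mtx);
        cv.wait(lk, [] { return !pending.empty(); });
        std::vector<Msg> batch;
        while (!pending.empty()) { batch.push_back(pending.front()); pending.pop(); }
        lk.unlock();
        // build column vectors from batch and make a single k(handle, ...) call
    }
}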
I wouldn't have thought that capturing 1,000 messages per second on average and sending them to kdb would be a problem?
best,
John.