Hi Dave
Welcome! kdb+ is a good place to be! (you should also sign up to the personal kdb+ developers group if you haven't already)
What you are trying to achieve sounds like a pretty common use case, and can be done with TorQ. The best place to start is to download the TorQ base package along with the TorQ Finance Starter Pack. The starter pack is a fairly solid initial stab at a market data capture system. You should then be able to modify the schema and switch in your feed (if you need help converting the output of your feed to something kdb+ understands then give us a shout). We put together a short video on setting up TorQ:
http://www.aquaq.co.uk/q/torq-kdb-data-capture-in-two-minutes/. (we know it's a bit cheesy :-) )
In terms of minimising disk and memory usage there are a few options.
kdb+ will use memory to store data and also to process queries (intermediate result sets). Queries can sometimes be restructured to reduce memory, usually at some cost (e.g. execution time or code complexity). In a standard data capture system there is usually an RDB (real-time database) and an HDB (historic database). The RDB usually holds the current day's data in memory; the HDB usually holds everything prior to today on disk (you can change all this, though). The main user of memory is usually the RDB. TorQ allows you to easily specify (in a config file) which tables (and, if required, which instruments within those tables) are stored in it. Because the RDB in TorQ isn't responsible for writing data to disk at end-of-day, data that is required historically for analysis but is not required intraday can still be captured intraday using the same set-up (one example we have seen is high-volume order data, which quants analyse historically but which isn't used intraday).
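To make the RDB/HDB split concrete, here's a rough sketch of how queries differ between the two (the paths, schema and timespan `time` column are illustrative assumptions, not TorQ defaults):

```q
/ HDB: a date-partitioned database on disk, loaded into a q session;
/ queries filter on the virtual date column (path is illustrative)
\l /data/hdb
select tickcount:count i by date from trade where date within 2015.01.01 2015.01.05

/ RDB: today's data held in memory as plain tables; there is no date
/ column, so you query the table directly (e.g. last 5 minutes of prices)
select last price by sym from trade where time>.z.n-0D00:05
```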
TorQ also has some features for minimising memory usage and tickerplant back-ups intraday. A standard data capture setup has the following processes:
Tickerplant : captures data, writes it to a log, publishes to consumers (the same as tick.q from kdb+ tick)
RDB : stores data required intraday (similar to r.q in kdb+ tick, but with a lot of extensions)
WDB : periodically writes the intraday data to disk (similar to w.q, but with a lot of extensions)
Sort : a separate, optional process which is used to sort or merge the data after end of day. It is invoked by the WDB. We use a separate process to avoid tickerplant back-ups (increased memory usage) in 24-hour markets such as FX
Sort Slave: a separate, optional process which can be used to parallelise the end of day sort/merge process.
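At its core, the periodic write a WDB-style process performs boils down to q's .Q.dpft (the real TorQ WDB does a lot more around this; the path and table name below are illustrative):

```q
/ write the in-memory trade table to today's date partition,
/ enumerating symbols against the HDB's sym file and applying
/ the parted attribute on the sym column
.Q.dpft[`:/data/hdb;.z.d;`sym;`trade]
```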
You can tune the RDB and WDB to minimise memory. We've written some blogs on this.
To minimise disk:
1. Make sure you use compression. The standard TorQ set-up compresses data each day; it also has a separate process that can be used to run through existing databases and compress everything within, driven by a config file (https://github.com/AquaQAnalytics/TorQ/blob/master/config/compressionconfig.csv). This can be used to specify compression settings down to the column level (each column can be compressed differently), and to specify rules such as "only compress data after it is X days old".
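Under the hood this uses kdb+'s built-in file compression, which you can also drive directly (paths below are illustrative):

```q
/ set default compression for all subsequent writes:
/ logical block size 2^17, algorithm 2 (gzip), compression level 6
.z.zd:17 2 6

/ or compress a single existing column file, source to target
-19!(`:/data/hdb/2015.01.01/trade/price;`:/data/compressed/price;17;2;6)
```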
2. Periodically remove or downsample data (some data sets only have value for a certain period, after which an aggregate or downsampled set will suffice)
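For example, old raw ticks could be condensed into 5-minute OHLCV bars with xbar before the raw table is removed (schema and column names are illustrative):

```q
/ condense raw ticks into 5-minute open/high/low/close/volume bars;
/ the raw data for that period can then be deleted
bars:select o:first price,h:max price,l:min price,c:last price,v:sum size
  by sym,bucket:5 xbar time.minute from trade
```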
3. Store less data / make sure your schema is well normalised
Latency vs. throughput: there are a few things that can be tuned. The TP can be run in a batch mode (on a timer), which will increase both throughput and latency. WDB processes can be replicated (e.g. different WDBs for different subsets of data) and the data merged into the same HDB at end-of-day. Don't send table updates to the RDB if they aren't required there. At some point, when the volumes get bigger, you can scale horizontally (i.e. multiple separate data capture set-ups capturing different sets of data)
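With the plain kdb+ tick scripts, for instance, the tickerplant's publish mode is controlled by the timer value on the command line (directory and port are illustrative):

```
q tick.q sym /data/tplogs -p 5010 -t 0     / publish each update immediately (lowest latency)
q tick.q sym /data/tplogs -p 5010 -t 1000  / batch and flush once per second (higher throughput, higher latency)
```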
Tickerplant roll time: we have done TorQ set-ups for FX customers with a different EOD time. It requires a modification to the tickerplant for its end-of-day check. It also requires the users of the data to be comfortable with the concept of "date" in the HDB, i.e. date becomes "trading date" and does not (usually) align with a GMT date or local-time date. We will see if we can incorporate these features into the TorQ tickerplant.
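One way to sketch that change (purely illustrative; TorQ's actual end-of-day check lives in its tickerplant code):

```q
/ a 17:00 roll: treat the trading date as the calendar date of
/ (now + 7 hours), so the "day" flips at 17:00 rather than midnight
tradingday:{"d"$.z.P+0D07:00:00}

/ the tickerplant's end-of-day check then compares its stored day
/ against tradingday[] rather than against .z.D
```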
5/10 min moving average: yes, you can do that. You need to write a process which connects to the TP and subscribes to the data, then calculates the spread values as the ticks arrive.
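A rough sketch of such a subscriber, following the standard tick.q pub/sub pattern (the port, schema and timespan `time` column are assumptions):

```q
h:hopen `::5010          / tickerplant port (assumed)
h(".u.sub";`trade;`)     / subscribe to the trade table, all syms

window:0D00:05           / 5-minute window
cache:()                 / rolling cache of recent ticks

/ called by the tickerplant on each publish; x is a table of new rows
upd:{[t;x]
  if[t=`trade;
    cache::select from (cache,x) where time>.z.n-window;
    show select ma5:avg price by sym from cache];}
```

A real version would publish the averages on to its own subscribers rather than just showing them, and would handle reconnects.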
32-bit vs. 64-bit: TorQ runs on 32-bit (if you happen to be lucky enough to have downloaded one of the commercial versions), though memory limitations need to be considered (again, the RDB is the issue here... everything else can be run to store a lot of data in a small memory footprint). If you need to get a licence and move to 64-bit, the upgrade will be seamless.
HTML5: kdb+ supports WebSockets, so you can write HTML5 screens which talk directly to the database. The monitor process in TorQ is a (very) basic monitoring process which has an example HTML5 front end. We have added some HTML utilities to TorQ, including some code to allow you to do pub/sub for HTML clients -
https://github.com/AquaQAnalytics/TorQ/blob/master/code/common/html.q
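The simplest possible server-side handler is the stock kdb+ WebSocket example (this is not TorQ's pub/sub layer, just the underlying mechanism):

```q
/ .z.ws is invoked with each incoming websocket message; here we
/ evaluate the text as q and send the console-formatted result back
.z.ws:{neg[.z.w] .Q.s value x}
```

On the browser side you'd open a connection with new WebSocket("ws://host:port/") and call ws.send("select from trade") to query it.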
Anything else you need a hand with, please shout!
Thanks
Jonny