TorQ

Birzos

Jul 28, 2017, 8:07:56 AM
to AquaQ kdb+/TorQ
I'm looking at your framework and have a few questions; any pointers would be appreciated.

1. Where would you change the end of day process so that only 2 hours of data is kept in the RDB for quote

2. With aggregation timeframe tables (1Minute, 5Minute, ...), would the optimal location to build them be the RDB via a timer

3. Where would you update the HDB code to include the aggregation tables for the gateway

4. Is there a specific reason why the quote table is timeframe/instrument rather than the reverse

5. One of the data feeds sends duplicate data; however, putting a key on the quote table and using upsert fails


For 5) we can use "where not quote~'prev quote", but is that the most efficient approach?

Regards.

Jonny Press

Aug 1, 2017, 4:16:58 AM
to Birzos, AquaQ kdb+/TorQ
Hi 

Welcome! 

1. Where would you change the end of day process so that only 2 hours of data is kept in the RDB for quote

To do that you would add a timer call to the RDB process to delete anything not within the last two hours, for each table. 
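
For illustration, a minimal sketch of such a timer job, assuming each table in the RDB carries a timestamp column called time; the helper name trimtables and the five-minute frequency are illustrative only, and in TorQ you would more likely register this with the .timer module than with the raw q timer:

 / delete anything older than the cutoff from every in-memory table
 trimtables:{[cutoff] {[t;c] delete from t where time<c}[;cutoff] each tables[]}

 / keep roughly the last two hours; .z.ts receives the current timestamp
 .z.ts:{[now] trimtables now-0D02:00:00}
 \t 300000    / fire every five minutes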


However, that only solves a small part of the problem.  If you need the data from more than 2 hours ago (but after the previous EOD) to be available, then you need to make some more in-depth changes.  Probably the best approach is to modify the WDB process to write down the data in a queryable format, where it could be accessed directly and separately, or where it copies it across to the HDB more regularly. 

Bottom line is - there’s a bit of work involved.  It’s not a simple config switch or anything like that. 

If the driver behind this is memory usage, then the memory footprint of the RDB can be reduced in other ways.  If you don’t require all of the data intraday (i.e. some of it only on a T+1 basis) then the RDB shouldn’t subscribe to it, and it will still be persisted to the HDB by the WDB (this is sometimes the case, e.g. a system stores trade, quote and depth datasets; trade and quote are required intraday, while depth is the largest dataset but only required T+1 for analysis purposes).  Alternatively, perhaps the RDB can subscribe to only a subset of instruments from the TP. 

2. With aggregation timeframe tables (1Minute, 5Minute, ...), would the optimal location to build them be the RDB via a timer

Usually we would do that in a separate process: a real-time subscriber which builds and maintains an X-minute bucketed view of the dataset.  There’s an example of a realtime engine here: 
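
Separately, for a rough idea of the shape of such a process, here is a minimal sketch of an upd handler that maintains one-minute quote bars; the table name minutebars and the bid/ask columns are assumptions for illustration, not TorQ code:

 / keyed table holding the latest one-minute bar per sym
 minutebars:([sym:`symbol$();bucket:`minute$()] bid:`float$();ask:`float$())

 / on each quote update, overwrite the affected buckets with the latest values
 upd:{[t;x] if[t~`quote; `minutebars upsert select last bid, last ask by sym, bucket:`minute$time from x]}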


3. Where would you update the HDB code to include the aggregation tables for the gateway

Not sure what you mean.  If you have set up TorQ with the Finance Starter Pack, then you can drop q or k files into code/hdb to be automatically loaded by the HDB.  You can also put a file in there called order.txt which specifies the order in which to load the files. 
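
For example, assuming order.txt simply lists the filenames one per line in the order they should be loaded (the filenames here are hypothetical; check the TorQ documentation for the exact format):

 bucketcommon.q
 bucketquote.q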

4. Is there a specific reason why the quote table is timeframe/instrument rather than the reverse

Tradition.  The kdb+ tick tickerplant (which the TorQ one is derived from) requires the first two columns of a table to be time and sym, specifically with those names.  time can be sent from the feed, or if not supplied then the tickerplant will stamp it on. 
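
For example, a conventional quote schema leads with those two columns (the bid/ask columns here are just illustrative):

 quote:([]time:`timestamp$();sym:`symbol$();bid:`float$();ask:`float$())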

5. One of the data feeds sends duplicate data; however, putting a key on the quote table and using upsert fails

I haven’t tested it, but I don’t think the tickerplant can work with keyed tables.  Basically, to do this you should modify the RDB to maintain the keyed table - i.e. apply a key to the schema after it has been received from the TP, and modify the upd function to do an upsert instead of an insert. 
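
A minimal sketch of that RDB change, assuming duplicates share the same sym and time values (the key columns should be whatever identifies a duplicate in your feed):

 / after the schema has been received from the TP, key the quote table
 quote:`sym`time xkey quote

 / use upsert for quote so duplicate rows collapse onto the key; insert for everything else
 upd:{[t;x] $[t~`quote; `quote upsert x; t insert x]}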

Thanks 

Jonny 
 

--
www.aquaq.co.uk
www.aquaq.co.uk/blog
www.aquaq.co.uk/training

Birzos

Aug 2, 2017, 3:31:46 PM
to AquaQ kdb+/TorQ, bir...@outlook.com
Many thanks for the detailed reply. All the comments are noted; however, we are still trying to understand the functional process.

2. The principle is to have bucketed xbar tables in the RDB and have them written automatically to the HDB. However, is that better than using the metrics process and recreating the table on startup? We are looking to store the most recent 1,000 bars.

5. We receive an error in the tickerplant.q tick function when adding keys; is this a limitation of the tickerplant which would influence the choice in 2) above?
tick:{init[];if[not min(`time`sym~2#key flip value@)each t;'`timesym];@[;`sym;`g#]each t;d::.eodtime.d;if[l::count y;L::`$":",y,"/",x,10#".";l::ld d]};

6. As an additional point, regarding gw.syncexec: if you have a query such as "select by time:1 xbar time.date,sym from `quote where time within 2017.07.01T00:00:00.000 2017.08.02T23:59:59.000", what is the process to join RDB and HDB results? Do you need two separate queries and then join the results, or is there a way to provide a single query which automatically selects from both the RDB and HDB?

Regards.

Jonny AquaQ

Aug 3, 2017, 4:03:06 AM
to AquaQ kdb+/TorQ, bir...@outlook.com
Hi 

(2) - I would have a separate process maintain them, to remove load from the RDB.  Given you have a gateway you can still access them via the same entry point; the gateway will just route the request to the caching process. 

At end-of-day you could either rebuild the tables in whichever process you have writing the data to the HDB (either the RDB or WDB), or you could do it from your caching process as well (personally I would use the first approach).  You just need to make sure the tables are fully up-to-date before writing, i.e. that all the ticks prior to .u.end have been processed. 

(5) Yes - not sure what else will fail, but the "key flip value" bit will not produce the correct result on a keyed table.  The purpose of this line is to get the column names of the table; on a keyed table it will only get the non-key columns.  You could try changing this bit to just use cols instead, but I'm not sure of the other side effects. 
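
To illustrate the difference (this is just a small demonstration on a hypothetical keyed table, not a quote from tick.q):

 kt:([sym:`symbol$()] time:`timestamp$(); bid:`float$())
 key flip value kt    / `time`bid - the non-key columns only
 cols kt              / `sym`time`bid - all columns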

Either way, keying the table in the TP probably isn't actually what you want to do - you want to key the table in the consumers of the data.  So you can modify the RDB to maintain a keyed version of the table without changing the TP code. 

(6) You can write a query to access both directly from the gateway, but you have to bear in mind that the RDB and HDB have different requirements in terms of where-clause order (the RDB is usually "where sym ..., ...", the HDB is usually "where date ..., sym ..., ...").

An ugly way to do it is to form a query something like this (untested):

{[starttime; endtime]
 / choose a where clause appropriate to the process type
 $[.proc.proctype=`rdb;
   select by time:1 xbar time.date,sym from quote where time within (starttime;endtime);
   select by time:1 xbar time.date,sym from quote where date within `date$(starttime;endtime), time within (starttime;endtime)]}

A better approach is to create functions in both the rdb and hdb, which have the same signature (same params, same return value shape) but are optimised for the particular process, and then call the function e.g. 

.gw.syncexec["bucketquote[2017.07.01T00:00:00.000; 2017.08.02T23:59:59.000]";`hdb`rdb]

The function definitions can be put in code/hdb and code/rdb respectively. 
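
As a sketch, splitting the earlier (untested) example into the two files - the function name bucketquote and the schema are simply carried over from the example above:

 / code/rdb/bucketquote.q - filter on time only
 bucketquote:{[st;et] select by time:1 xbar time.date,sym from quote where time within (st;et)}

 / code/hdb/bucketquote.q - filter on date first, then time
 bucketquote:{[st;et] select by time:1 xbar time.date,sym from quote where date within `date$(st;et), time within (st;et)}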

Note also with .gw.syncexec:
 - the order of the process types requested is important - `hdb`rdb will return hdb results with rdb appended, which is probably what you want
 - if you want a custom join function you can use 

.gw.syncexecj

whose format is 

[query; processes; join function to use]

The join function is a monadic function which will be supplied with the list of results retrieved, in the correct order.  The default in .gw.syncexec is just raze. 
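
For example, reusing the call above but union-joining the keyed results rather than razing them (illustrative only):

 .gw.syncexecj["bucketquote[2017.07.01T00:00:00.000; 2017.08.02T23:59:59.000]";`hdb`rdb;{(uj/)x}]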

Thanks 

Jonny

Birzos

Aug 7, 2017, 5:53:23 PM
to AquaQ kdb+/TorQ, bir...@outlook.com
Many thanks for the detailed reply; we are going through all the information, which is extremely helpful in understanding the processes. Regards.