I'm trying to decide on a database technology and schema design for keeping my own copy of the order book for GDAX, a cryptocurrency trading exchange.
The eventual goal is to use the order book data for some machine learning experiments.
GDAX provides a websocket stream that starts with a snapshot of the current state of the order book. You subscribe to any currency pairs that you are interested in; in this case it's bitcoin/USD. The websocket sends a JSON object for each pair that you request. It contains a price/quantity pair for each rate that the product pair is trading at, giving the rate and the quantity that's available at that rate.
{"product_id":"BTC-USD","type":"snapshot","bids":[["10853.3","0.00672786"],["10850.99","0.001"]]}

I've stripped out most of the pairs for brevity. There's a second array in the snapshot for 'asks', meaning sale offers.
After that, it provides update objects with key value pairs for any updated bids or asks. They look like this:
{"type":"l2update","product_id":"BTC-USD","time":"2018-01-24T03:47:54.819Z","changes":[["sell","10818.58000000","0.4991"]]}

In this example, the quantity available at 10818.58 has changed (it's not a delta, just the new value).
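To make the snapshot-then-update flow concrete, here is a minimal in-memory sketch in Python, independent of any database. The `OrderBook` class and its `apply` method are hypothetical names invented for illustration. It assumes that "buy" changes belong to bids and "sell" changes to asks, and that a size of zero means the price level should be removed; both are worth checking against the GDAX feed documentation.

```python
import json
from decimal import Decimal


class OrderBook:
    """Hypothetical in-memory level-2 book: price -> size for each side."""

    def __init__(self):
        self.bids = {}
        self.asks = {}

    def apply(self, message):
        """Apply one decoded websocket message to the book."""
        if message["type"] == "snapshot":
            # A snapshot replaces whatever state we had before.
            self.bids = {Decimal(p): Decimal(s) for p, s in message.get("bids", [])}
            self.asks = {Decimal(p): Decimal(s) for p, s in message.get("asks", [])}
        elif message["type"] == "l2update":
            for side, price, size in message["changes"]:
                # Assumption: "buy" changes update bids, "sell" changes update asks.
                levels = self.bids if side == "buy" else self.asks
                price, size = Decimal(price), Decimal(size)
                if size == 0:
                    # Assumption: a zero size removes the price level.
                    levels.pop(price, None)
                else:
                    # The size is the new value at that price, not a delta.
                    levels[price] = size


# The two (truncated) messages from this post.
snapshot = json.loads(
    '{"product_id":"BTC-USD","type":"snapshot",'
    '"bids":[["10853.3","0.00672786"],["10850.99","0.001"]]}'
)
update = json.loads(
    '{"type":"l2update","product_id":"BTC-USD","time":"2018-01-24T03:47:54.819Z",'
    '"changes":[["sell","10818.58000000","0.4991"]]}'
)

book = OrderBook()
book.apply(snapshot)
book.apply(update)
```

Prices are parsed with `Decimal` so that "10818.58000000" and "10818.58" map to the same level and no floating-point rounding creeps in.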
Although I can see the logical advantage of a database that stores data in the same form that I want to work with in my applications (some sort of object), my practical experience is only with traditional SQL databases.
It seems like I should store the whole order book from the snapshot as a single object, with nested objects for bids and asks (buy and sell offers) that each contain sub-objects for each rate, and then update it with the "l2update" objects. However, I gather that updating records in nested objects can get complicated.
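For comparison with the nested layout, one alternative is a flat layout with one document per price level, keyed by product, side, and price, so that each change touches exactly one small document. The sketch below is an assumption-laden illustration, not a recommendation: `level_operations` and the field names are hypothetical, a zero size is assumed to mean "remove the level", and the code only builds plain filter and update dictionaries without talking to a database.

```python
from decimal import Decimal


def normalize_price(price):
    """Render a price string canonically so "10818.58000000" and "10818.58" match."""
    return format(Decimal(price).normalize(), "f")


def level_operations(message):
    """Translate one l2update message into (action, filter, update) tuples.

    Hypothetical helper: each tuple describes an upsert or delete of a single
    flat price-level document.
    """
    ops = []
    for side, price, size in message["changes"]:
        key = {
            "product_id": message["product_id"],
            "side": side,
            "price": normalize_price(price),
        }
        if Decimal(size) == 0:
            # Assumption: a zero size means the level no longer exists.
            ops.append(("delete", key, None))
        else:
            # The size is the new value, so a plain $set is enough.
            update = {"$set": {"size": size, "time": message["time"]}}
            ops.append(("upsert", key, update))
    return ops


update_message = {
    "type": "l2update",
    "product_id": "BTC-USD",
    "time": "2018-01-24T03:47:54.819Z",
    "changes": [["sell", "10818.58000000", "0.4991"]],
}
operations = level_operations(update_message)
```

With pymongo, an "upsert" tuple could be passed along as `collection.update_one(filter, update, upsert=True)` and a "delete" tuple as `collection.delete_one(filter)`. Prices are kept as strings here only to keep the sketch self-contained; a numeric type would be needed to sort levels by price.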
I'd like to use a NoSQL database for the practice, but I don't want to waste a ton of time only to discover that the technology makes this project more difficult.
Any thoughts on schema design or on using MongoDB for this data are greatly appreciated.
Tom