Speed of count(*) with traverse

84 views
Skip to first unread message

Simon White

unread,
Sep 30, 2015, 1:11:32 PM9/30/15
to OrientDB
OrientDB version : 2.1.2

I have set up a time series to store logging data from a number of my devices. I am seeing about a million log entries a day so the time series is set up to store by year, month, day, hour and minute (see forum entry 'Time Series Java API example and from docs http://orientdb.com/docs/2.1/Time-series-use-case.html)

If I want to count the total number of logs in a particular day, for a particular device I use...

select count(*) from (SELECT expand (log) FROM (TRAVERSE minute FROM (TRAVERSE hour FROM (select expand(month[9].day[29]) from year where label = '2015')))) where device = "VB7948"

this takes about 100 seconds, which I guess is not too bad.

However if I want to return the actual results i.e.

select from (SELECT expand (log) FROM (TRAVERSE minute FROM (TRAVERSE hour FROM (select expand(month[9].day[29]) from year where label = '2015')))) where device = "VB7948"

this too takes about 100 seconds, which seems very reasonable when we consider it is returning a million entries!

So my question - why is count not any faster?

When I execute a similar query on my old RDBMS (involving comparisions of timestamps from a single large table) OrientDB outperforms on the select statement (yay!) but the RDBMS can execute the count in half the time.

Am I doing something silly?



Luigi Dell'Aquila

unread,
Oct 1, 2015, 12:56:35 AM10/1/15
to orient-...@googlegroups.com
Hi Simon,

traversal operation requires vertex unmarshalling to find outgoing edges, this is probably the main point where your operation consumes resources. Traversal has to be fully executed for both returning results and counting, so the performance is almost the same.

I guess you can change your query a little bit to speed it up, try this:

or even


Thanks

Luigi

--

---
You received this message because you are subscribed to the Google Groups "OrientDB" group.
To unsubscribe from this group and stop receiving emails from it, send an email to orient-databa...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Simon White

unread,
Oct 1, 2015, 10:40:42 AM10/1/15
to OrientDB
Does this work? .hour.minute.log does not seem to expand, I get a count result of 0

Simon White

unread,
Oct 1, 2015, 12:18:10 PM10/1/15
to OrientDB
Would it help to add an index on the device name or is that ridiculous within a timeseries?

Luigi Dell'Aquila

unread,
Oct 2, 2015, 2:52:57 AM10/2/15
to orient-...@googlegroups.com
Hi Simon,

no, defining an index on the device name won't help.
Could you please share a sample db? I'd like to try your query

Thanks

Luigi


2015-10-01 18:18 GMT+02:00 Simon White <secret.sa...@googlemail.com>:
Would it help to add an index on the device name or is that ridiculous within a timeseries?

Simon White

unread,
Oct 2, 2015, 7:44:43 AM10/2/15
to OrientDB

Thank you so much for your offer of help, would I just attach a zip of the database here?

Simon White

unread,
Oct 2, 2015, 11:21:22 AM10/2/15
to OrientDB
I have cut the database down to just one day of logs, the logs are all stored against today (02/10/2015)

The database schema looks like...  years -> months -> days -> hours -> minutes -> logs

My use case is to first count and then (potentially) return all the logs for a given month for a given device. To do this I was hoping I could multithread a query like:

over each day of the month (via java API) and get a result fairly quickly. I am not expecting miracles (I realise there is a lot of data!) but to match my RDBMS I was thinking I might at least be able to get a count in under a minute....


Simon White

unread,
Oct 2, 2015, 11:23:40 AM10/2/15
to orient-...@googlegroups.com
Reply all
Reply to author
Forward
0 new messages