Which database is most appropriate for sensor logging


Heiner Bunjes

Mar 24, 2011, 6:27:55 AM
to NOSQL
Hello!

I need a database to log and retrieve sensor data.

The scale is as follows:

(01) Ten thousand sensors deliver 1 record per second each
-> Insert rate = 10 kilorecords/s
-> Insert rate = 315 gigarecords per year

(02) They have to be stored for ~3 years
-> Size of database = 1 terarecord

(03) Each record has about 200 bytes
-> Size of database = 200TB

(04) The system will start with just 700 sensors but will
soon increase up to the described volume


The main operations on the data are

(05) Append the new records to the database
(written records are never changed)

(06) Return all records for sensor X with a timestamp
between timestamp_a and timestamp_b

(07) Return the records specified in (06) ordered by
timestamp

(08) Delete all records older than Y


Further, the following is true

(09) The database system MUST be free and open source

(10) The DB SHOULD be easy to administrate

Which is the most appropriate database for this?


p.s.:
I am currently having a look at Riak, but I'm not sure how
to efficiently implement (06) and (07), or whether there is a
much better choice than Riak.

Thanks
Heiner

Viktor Sovietov

Mar 24, 2011, 4:58:01 PM
to NOSQL
Hello!

IMHO, sensible use of plain log files will solve the problem. One
thing worth considering is storing the metadata in a database, just
to simplify file management and make queries faster.

Sincerely,

--Victor

Jim Peters

Mar 24, 2011, 7:31:43 PM
to nosql-di...@googlegroups.com
If you're interested in a SQL approach, Infobright is open source, and they claim to be focusing on machine-generated data.

--
Jim Peters
+1-415-608-0851 (Cell)
+1-416-466-9790 (Home)
+1-415-508-8651 (Google Voice)

seth/nqzero

Mar 24, 2011, 8:45:11 PM
to nosql-di...@googlegroups.com
i think the general term for what you are looking for is a "time
series database". eventmonitor data accelerator is the one i'm
somewhat familiar with, but it's not open source, might be specific to
financial data, and its web presence is pretty opaque. the wikipedia
entry doesn't list any implementations, but after a quick search,
rrdtool sounds like it might be what you're looking for. if not, your
write requirements are pretty modest. if your read requirements are
minimal as well, you could build an ad-hoc solution, e.g. using flat
files as viktor suggests:

http://en.wikipedia.org/wiki/RRDtool
http://eventmonitor.com/prodAddOnEMDA.html

seth

@siculars

Mar 25, 2011, 4:25:51 AM
to NOSQL
I am not ready to say that Riak would be the right solution here,
although I think it could work. You have some good things going for
you: a specific, well-defined spec and immutable data. Using the
criteria you provided, it would not be that much work to spec out a
solution. If I were to use Riak, as it exists today, for this project,
I would do the following:

-Make extensive use of key filters.
Key filters are a mechanism by which you can construct a filter to
select, from memory, a subset of keys to pass to map/reduce. This
means that if you use meaningful key names you get a big win
performance-wise, and in reality this would be one of the only ways to
do it. In your case, the naming convention I would use would be
something like "sensorname_timestamp". You could construct a filter
that looks only for keys starting with "sensorname" and having a
"timestamp" between A and B, where the timestamp is a Unix integer.
See http://wiki.basho.com/Key-Filters.html .
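
For concreteness, here is a sketch of that naming convention and a
matching filter, written as Erlang terms mirroring the JSON examples
on the wiki page (the exact filter spelling is from memory, so check
it against the docs):

-module(blip_keys).
-export([make_key/2, key_filter/3]).

%% "sensorname_timestamp", e.g. <<"sensor42_1300000000">>.
make_key(Sensor, Ts) ->
    <<Sensor/binary, "_", (integer_to_binary(Ts))/binary>>.

%% Keep keys whose first token equals Sensor and whose second token,
%% read as an integer, lies between TsA and TsB (inclusive).
key_filter(Sensor, TsA, TsB) ->
    [[<<"and">>,
      [[<<"tokenize">>, <<"_">>, 1], [<<"eq">>, Sensor]],
      [[<<"tokenize">>, <<"_">>, 2],
       [<<"string_to_int">>],
       [<<"between">>, TsA, TsB]]]].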

-Map/reduce considerations:
Maps are distributed in Riak. After your key list has been constructed
via the key-filter phase, the coordinating node (Riak is homogeneous -
any node can service any request) will hash the keys in the set to
their corresponding nodes in the cluster and distribute the requests
to all those nodes. The cluster will fetch the keys from all those
nodes and return them - out of order - to the coordinating node. The
reduce, in this case a sort, is applied on the coordinating node.

One thing to think about is the total key set returned via the key
filter. Given your exact parameters, we can calculate that each sensor
will have a maximum of 60*60*24 keys per day, which, if you are doing
daily rollups, would push 86,400 keys to your coordinating node. Once
hashed (relatively inexpensive), the fetch would be fanned out across
the cluster, and the cluster would return 86,400*250 bytes of data
back to the coordinating node for sorting (I padded your payload a
touch). Recall that the reduce (a sort in this case) is not
distributed. This means that your coordinating node is sorting ~21 MB
of data. If you are writing your m/r in JavaScript, you will incur a
penalty for the marshaling between the Erlang VM and the JavaScript
VM. If you do use JavaScript, I would pay attention to the
JavaScript-specific VM config items in the config file, namely the
number of JS VMs and their memory allocations. But in reality I would
get the m/r written in Erlang - not hard, considering the simplicity
of your needs.
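
A rough sketch of what that Erlang m/r could look like (this assumes
riak_kv's native phase signatures, and the module would have to be on
the cluster's code path):

-module(blip_mr).
-export([map_blip/3, reduce_sort/2]).

%% The map runs on the nodes that hold the data: emit {Key, Value}
%% pairs so the timestamp-bearing key travels with its payload.
map_blip(RiakObject, _KeyData, _Arg) ->
    [{riak_object:key(RiakObject), riak_object:get_value(RiakObject)}].

%% The reduce runs on the coordinating node: sorting by key sorts by
%% "sensorname_timestamp", i.e. by time within a single sensor.
reduce_sort(Pairs, _Arg) ->
    lists:keysort(1, Pairs).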

-File system backend
The other thing to consider is the number of keys per node in your
cluster. Bitcask, the default storage backend for Riak, keeps a
directory of all its keys in memory. This means the more keys you
have, the more memory you eat. Fortunately, there is a formula to
estimate maximum capacity, like so:

cluster ram = ( key size + overhead ) * number of keys

In your case, the sheer number of 1 trillion keys will exhaust the
total RAM capacity of a cluster. In short, you absolutely cannot do
this with the Bitcask backend and the current convention of 1 key per
sensor per second. Your only other alternative would be the InnoDB
backend, and I would have a serious conversation with the Basho guys
about that. The other alternative would be to roll up your 86,400 keys
per day per sensor into 1 key, thereby reducing your total key count
to just 10,000 keys per day. Each key would have 20-some-odd MB of
data, but that is totally manageable. Of course, your rollup frequency
can be shortened from a day to an hour. But you get my point: if you
want to use Bitcask (which you probably do), you will need to reduce
your overall key count. See
https://spreadsheets.google.com/ccc?key=0Ak4OBkABJPsxdEowYXc2akxnYU9xNkJmbmZscnhaTFE&hl=en#gid=0 .
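
Plugging your numbers into that formula in an Erlang shell (the ~40
bytes of per-key overhead is my assumption, taken from that sizing
spreadsheet):

1> RamPerKey = 40 + 20.              % overhead + "sensor42_1300000000"
60
2> RamPerKey * 1000000000000.        % 1 key per blip: ~60 TB of RAM
60000000000000
3> RamPerKey * 10000 * 365 * 3.      % daily bins over 3 years: ~0.66 GB
657000000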

I've written a blog post a while back on pagination in Riak; it also
goes over some of the m/r considerations and Riak limitations:
http://siculars.posterous.com/paginating-with-riak .

I would post this exact question to the Riak mailing list and see what
the Basho folks say. That said, I would also take a look here:
http://discoproject.org/ . I saw the main developer give a demo at
Erlang Factory just today. It is pretty impressive and may be a good
fit. Think Hadoop, but with Erlang in place of Java.

Do write back with your findings.

Cheers,
-Alexander
@siculars

Heiner Bunjes

Mar 25, 2011, 11:32:56 AM
to NOSQL
Many thanks to all of you who answered my question.
I especially thank @siculars, whose answer is extremely helpful.

Your answers gave me input to add some requirements to my
list.


######## <requirements version="2">

I need a database to log and retrieve sensor data.

Glossary

- Node = A computer on which one instance of the database
is running

- Blip = One data record sent by a sensor

- Bin = A container holding many blips. The database
could store bins instead of blips to reduce the total number
of keys.

- Blip page = The sorted list of all blips for a specific sensor
and a specific time range.


The scale is as follows:

(01) Ten thousand sensors deliver 1 blip per second each
-> Insert rate = 10 kiloblips/s
-> Insert rate = 315 gigablips per year

(02) They have to be stored for ~3 years
-> Size of database = 1 terablip

(03) Each blip has about 200 bytes
-> Size of database = 200TB

(04) The system will start with just 700 sensors but will
soon increase up to the described volume.


The main operations on the data are:

(05) Append the new blips to the database
(written blips are never changed)!

(06) Return all blips for sensor X with a timestamp
between timestamp_a and timestamp_b!
In other words: return a blip page.

(07) Return all the blips specified in (06) ordered
by timestamp!

(08) Delete all blips older than Y!


Further, the following is true:

(09) The database system MUST be free and open source.

(10) The DB SHOULD be easy to administrate.

(11) 99.9% of the blips are inserted in
chronological order; the rest are not.

(12) All data MUST still be writable and readable while fewer
than a configurable number N of nodes are down (unexpectedly).

(13) The mechanism to distribute the data to the available
nodes SHOULD be handled by the database.
This means that the database SHOULD automatically
redistribute the data when nodes are added or removed.


######## </requirements>

The application I am building is a kind of complex event processing
core.
It is mainly written in Erlang.


#### Plain logfiles
are an interesting approach but IMO conflict very
much with (10), (11), (12) and (13).
It would mean doing most of the work myself.

#### infobright
does not seem to be able to fulfill the replication
requirements (12, 13) with its free variant.

#### eventmonitor data accelerator
is not free (09).

#### rrdtool
doesn't seem to be distributed, so again I would have to do
very much myself.

#### riak
Hmm.
As it looks, I have (at least) the problem that at
1 key/blip I get too many keys for Bitcask (InnoDB might work).
So I could build records which hold multiple blips; I will call them
bins.
One bin might hold 86,400 blips (e.g.) but has only one single key.
This will cause some work for packing and unpacking.
And it would in some cases increase the amount of data that is sent
to the coordinating node, because as I understand it, a whole bin
would be sent even if only one blip is needed. Right? Or is there a
mechanism to filter the content of the bin before sending it to the
coordinating node?
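
To make the bin idea concrete, here is a sketch of what I have in
mind (all names made up): pack one sensor-day of blips into a single
value, and on reads unpack the whole bin and filter client-side:

-module(blip_bins).
-export([pack/1, unpack_range/3]).

%% Blips is a list of {Ts, Payload} for one sensor-day
%% (up to 86,400 entries of ~200 bytes each).
pack(Blips) ->
    term_to_binary(lists:keysort(1, Blips)).

%% Decode the whole bin (~21 MB), then keep only the requested range.
unpack_range(Bin, TsA, TsB) ->
    [Blip || {Ts, _} = Blip <- binary_to_term(Bin),
             Ts >= TsA, Ts =< TsB].

As far as I understand the posts above, an Erlang map phase runs on
the node that holds the bin, so calling something like unpack_range/3
inside the map should trim the bin before it travels to the
coordinating node - but please correct me if that's wrong.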

I'm still reading the pagination and discoproject stuff.
Give me some time before answering that!

At the moment I am really wondering whether there is an off-the-shelf
solution for my needs at all. The problem seems so natural to me:
I just have blips to be logged away, and users who later on
want to have a look at specific blip pages. It's just the amount of
data that is unusual.

Sylvain Lebresne

Mar 25, 2011, 4:55:41 AM
to nosql-di...@googlegroups.com, Heiner Bunjes
There is probably more than one database that would fit the bill, but
I would encourage you to at least have a look at Cassandra.

It handles time series very well, and in particular (05), (06) and
(07) will be easy to do and efficient (Cassandra will store your
records in sorted order on disk, so reading a range is just a seek to
where the range starts and then a sequential read).

For (08), there could be a number of solutions depending on the exact
data layout you use, but I'll mention here that Cassandra supports TTL
at the record level, so if you know up front that you need to keep a
record for 3 years, that will be super easy (and cheap). If that
doesn't work for you, writing a map-reduce job to do this should be
straightforward (using, say, Hive to make it simpler).

It's free and open source, and I think fairly easy to administer. The
scale won't be a problem, since that's what Cassandra is built for.
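
To sketch the layout (the CQL below is purely illustrative: the
sensor acts as the row key and the timestamp as the column name,
giving exactly the on-disk sort order described above; the TTL is
3 years in seconds):

CREATE TABLE blips (
    sensor_id text,
    ts        timestamp,
    payload   blob,
    PRIMARY KEY (sensor_id, ts)
) WITH CLUSTERING ORDER BY (ts ASC)
  AND default_time_to_live = 94608000;

-- (06) and (07): one blip page, already sorted by timestamp
SELECT ts, payload FROM blips
WHERE sensor_id = 'sensor42'
  AND ts >= '2011-03-01 00:00:00' AND ts < '2011-03-02 00:00:00';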

--
Sylvain

ThreeioKevin

Mar 25, 2011, 3:12:43 AM
to NOSQL
Check out OpenTSDB.net


Just saw a decent presentation on their work; it seemed interesting.
HBase and Java.
-kev




K.S. Bhaskar

Mar 25, 2011, 1:41:19 PM
to NOSQL
How does the application code read the sensors? For example, is there
a process for each sensor that blocks for a second and reads an IP
address? Are there multiple processes, each of which polls a group of
sensors? Is there a device driver that reads the sensor data
asynchronously, e.g. via DMA, and puts the readings in a shared memory
segment?

What are the requirements for availability? Recoverability in the
event of hardware failure?

Sounds like a nice little afternoon project with GT.M
(http://fis-gtm.com). Full disclosure: I manage GT.M.

Regards
-- Bhaskar (yes, it's my last name, but that's what I'm called)
ks dot bhaskar at fisglobal dot com <-- send e-mail here

--
GT.M - Rock solid. Lightning fast. Secure. No compromises.

Tracy Reed

Mar 25, 2011, 6:02:27 PM
to nosql-di...@googlegroups.com
On Thu, Mar 24, 2011 at 08:45:11PM -0400, seth/nqzero spake thusly:

> i think the general term for what you are looking for is a "time
> series database". eventmonitor data accelerator is the one i'm

Indeed. Also check out this one which I just recently learned about:

http://graphite.wikidot.com/start

It is supposed to be a more scalable, higher-performing time-series
database than RRD. It was developed by Orbitz.com for their serious
data-acquisition and graphing needs. I haven't used it yet but look
forward to playing with it some day.

--
Tracy Reed Digital signature attached for your safety.
Copilotco Professionally Managed PCI Compliant Secure Hosting
866-MY-COPILOT x101 http://copilotco.com
