Should I use a different Collection for Geospatial data?

Koko

Sep 19, 2016, 1:53:10 PM

to mongodb-user

Hello,

My data consists of (time series) sensor data, and roughly every 10th record contains location data.

Almost all of my queries are geospatial. My use case is read-only; updates to a dataset might occur about once every six months.

A query of importance is the following query pattern:

1. Search for location data that matches a bounding box.
2. Query +10 and -10 minutes around the timestamp of each result from 1.

My questions:

Should I create a collection dedicated for the location data?

The schema of it differs a lot from the usual data. I know MongoDB is "schema-free", but still.

Will splitting up the collections help speed up the 2dsphere index in some way? Or reads in general?

Whenever a record was found in 1., there will be many records following up that will also match that bounding box.

This means that I somehow need to deduplicate the values returned from step 1 so I do not read the same data multiple times.

How can I achieve this?

Thanks!

Kevin Adistambha

Sep 28, 2016, 3:31:54 AM

to mongodb-user

Hi,

My questions:
Should I create a collection dedicated for the location data?
The schema of it differs a lot from the usual data. I know MongoDB is “schema-free”, but still.
Will splitting up the collections help speed up the 2dsphere index in some way? Or reads in general?

If I understand correctly, you have documents that look like this:

{ location: {...}, data: {...}, timestamp: {...} }

and you are trying to separate them into two collections, e.g.:

{ location_id: <id>, location: {...} }

{ location: <location_id reference>, data: {...}, timestamp: {...} }

Neither of the two approaches is the “correct” way to implement this. Unlike the relational database approach, where you typically design your schema according to its storage model, MongoDB schema design emphasizes how you use your data instead.

Approach #1 is arguably simpler, since you have all the data you need in each document. When you need to query data from a certain location having a certain timestamp, a single find() command (backed by the necessary indexes) will suffice.
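As a rough sketch of what such a query could look like, the snippet below builds a filter that combines a bounding box with a time window. It assumes that `location` holds a GeoJSON point and that `timestamp` is a date; the helper name `boxAndTimeFilter`, the collection name `readings`, and the coordinates are made up for illustration.

```javascript
// Hypothetical helper: builds a filter for documents inside a bounding box
// and within a time window. Field names follow the example document above;
// storing `location` as GeoJSON and `timestamp` as a Date is an assumption.
function boxAndTimeFilter(west, south, east, north, start, end) {
  return {
    location: {
      $geoWithin: {
        $geometry: {
          type: "Polygon",
          // GeoJSON order is [longitude, latitude]; the ring must be closed,
          // so the first and last positions are identical.
          coordinates: [[
            [west, south], [east, south], [east, north], [west, north], [west, south],
          ]],
        },
      },
    },
    timestamp: { $gte: start, $lte: end },
  };
}

// Made-up box and time window, for illustration only.
const filter = boxAndTimeFilter(
  13.0, 52.3, 13.8, 52.7,
  new Date("2016-09-19T10:00:00Z"),
  new Date("2016-09-19T10:20:00Z")
);
console.log(JSON.stringify(filter, null, 2));

// In the mongo shell, with a hypothetical collection named `readings`:
//   db.readings.createIndex({ location: "2dsphere", timestamp: 1 })
//   db.readings.find(filter)
```

Note that the edges of a GeoJSON polygon follow great-circle arcs on a 2dsphere index, so for large boxes the shape is not exactly a latitude/longitude rectangle.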

Approach #2 is arguably more space-efficient than approach #1, and is more aligned with traditional relational design. However, reading the required data could be more complex than with approach #1.
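To illustrate the extra step, here is a minimal sketch of how a read might look with two collections: first find the matching locations, then use their ids to fetch the sensor data. The sample documents, the helper name `dataFilterFor`, and the collection names in the comments are assumptions for illustration, not part of your schema.

```javascript
// Step 1 result as it might come back from the location collection
// (made-up sample data; in the shell this would come from a $geoWithin query).
const matchedLocations = [
  { location_id: 1, location: { type: "Point", coordinates: [13.40, 52.52] } },
  { location_id: 2, location: { type: "Point", coordinates: [13.41, 52.53] } },
];

// Hypothetical helper: turns the step 1 results into the filter for step 2.
// In approach #2 the data documents reference a location by its id.
function dataFilterFor(locations, start, end) {
  return {
    location: { $in: locations.map((doc) => doc.location_id) },
    timestamp: { $gte: start, $lte: end },
  };
}

const dataFilter = dataFilterFor(
  matchedLocations,
  new Date("2016-09-19T10:00:00Z"),
  new Date("2016-09-19T10:20:00Z")
);
console.log(JSON.stringify(dataFilter, null, 2));

// In the mongo shell, with hypothetical collection names:
//   db.locations.find({ location: { $geoWithin: { ... } } })   // step 1
//   db.data.find(dataFilter)                                   // step 2
```

With approach #1 both conditions fit into one filter, whereas here the application has to carry the ids from the first query into the second.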

Having said that, which approach is “best” and “fastest” is highly dependent on your use case, e.g. how much data you’re inserting per second, how your workload is balanced between reads and writes, how you plan to use the data, etc. I suggest performing thorough load testing using your expected usage pattern to see which approach is more applicable/easier for you.

For examples, you can check out the MongoDB Use Cases page.

Whenever a record was found in 1., there will be many records following up that will also match that bounding box.
This means that I somehow need to deduplicate the values returned from step 1 so I do not read the same data multiple times.
How can I achieve this?

I’m not sure I understand what you mean here. What do you mean by “many records following up that will also match that bounding box”? Could you provide an example of the workflow and some example documents that you have in mind?
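One possible reading, which is only a guess until the workflow is clearer, is that consecutive location hits produce overlapping ±10 minute windows. If that is the case, merging the windows before running the second query might avoid reading the same records twice. A minimal sketch under that assumption, with a made-up helper name `mergedWindows` and made-up timestamps:

```javascript
const TEN_MINUTES_MS = 10 * 60 * 1000;

// Hypothetical helper: expands each hit timestamp into a window of
// +/- halfWidthMs and merges windows that overlap or touch.
function mergedWindows(timestamps, halfWidthMs = TEN_MINUTES_MS) {
  const sorted = timestamps.map((t) => t.getTime()).sort((a, b) => a - b);
  const windows = [];
  for (const t of sorted) {
    const start = t - halfWidthMs;
    const end = t + halfWidthMs;
    const last = windows[windows.length - 1];
    if (last && start <= last.end) {
      // Overlaps the previous window: extend it instead of adding a new one.
      last.end = Math.max(last.end, end);
    } else {
      windows.push({ start, end });
    }
  }
  return windows.map((w) => ({ start: new Date(w.start), end: new Date(w.end) }));
}

// Made-up timestamps of the step 1 results.
const hits = [
  new Date("2016-09-19T10:00:00Z"),
  new Date("2016-09-19T10:05:00Z"),
  new Date("2016-09-19T10:12:00Z"),
  new Date("2016-09-19T11:00:00Z"),
];
const windows = mergedWindows(hits);
// Two windows remain: 09:50 to 10:22 and 10:50 to 11:10.

// One filter that covers every merged window exactly once.
const timeFilter = {
  $or: windows.map((w) => ({ timestamp: { $gte: w.start, $lte: w.end } })),
};
console.log(JSON.stringify(timeFilter, null, 2));
```

If this is not the situation you have in mind, the example documents would help to pin down what needs to be deduplicated.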