Hello,
First of all, let me thank you for reading and hopefully helping me out! :)
My name is Ricardo Amendoeira. I'm an Electrical Engineering student from Portugal, and for my EE Master's thesis I'll be trying to add a different way of sharding geospatial data to MongoDB, based on Voronoi diagrams.
First, some general questions:
1) Which commit/tag should I base my work on? The latest commit on the master branch, a recent release, or something else?
2) Besides writing/updating tests and commenting my code, what should I pay attention to in order to increase the chances of my work being accepted?
3) I'm getting familiar with the file structure and organization of the repo, but is there any documentation for contributors that describes it?
4) Any other general tips for contributing? This will probably be my first contribution to an open-source project.
For the more specific questions, I should give some more detail about how the idea is supposed to work: the DB admin selects a virtual coordinate for each shard, and each geospatial document is then stored on the shard whose virtual coordinate is "closest" to it. In other words, each shard owns the Voronoi cell around its virtual coordinate.
According to the attached document (an investigation into different geo-sharding methods, and the reason for my thesis), this is a more efficient way of sharding geospatial data: it lets queries for a given region hit fewer servers, and it is more flexible than MongoDB's current method (quad-tree) in how the space can be divided among shards.
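As a rough illustration of the "closest shard" rule, here is a minimal Python sketch (not MongoDB code). It assumes a spherical Earth, great-circle (haversine) distance, and coordinates in GeoJSON (longitude, latitude) order; the shard names and virtual coordinates are hypothetical. Picking the nearest virtual coordinate for a point is, by definition, finding which shard's Voronoi cell contains it.

```python
import math

# Mean Earth radius in km; assumes a spherical Earth.
EARTH_RADIUS_KM = 6371.0088


def haversine_km(a, b):
    """Great-circle distance in km between two (longitude, latitude) points in degrees."""
    lng1, lat1 = map(math.radians, a)
    lng2, lat2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lng2 - lng1) / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def owning_shard(point, shard_sites):
    """Return the name of the shard whose virtual coordinate is nearest to point.

    This locates the point's cell in the (spherical) Voronoi diagram of the
    shard sites. Ties go to the shard listed first in shard_sites.
    """
    return min(shard_sites, key=lambda name: haversine_km(point, shard_sites[name]))


# Hypothetical shard names and virtual coordinates, in (lng, lat) order.
SHARD_SITES = {
    "shardLisbon": (-9.14, 38.72),
    "shardBerlin": (13.40, 52.52),
    "shardNewYork": (-74.01, 40.71),
}

if __name__ == "__main__":
    print(owning_shard((-8.61, 41.15), SHARD_SITES))  # Porto -> shardLisbon
    print(owning_shard((2.35, 48.86), SHARD_SITES))   # Paris -> shardBerlin
```

Note that the cells are not aligned to any grid: Paris lands on the "Berlin" shard simply because that virtual coordinate is nearer, which is the flexibility a Voronoi split offers over fixed quad-tree cells.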
So, my more specific questions:
5) I've read in some third-party sources that MongoDB supports sharding by geolocation, but the documentation says otherwise. Is it supported?
6) My current idea is to implement this with shard tags: I'd add support for 2dsphere sharding and tagging, so that the user can tag each shard with a 2dsphere coordinate of their choice. Is there a problem with this approach, or a better way to do it?
7) As far as I understand, only the mongos servers need to know about shard keys, so my first step should be to trace the sh.shardCollection() command through the code and modify the relevant files to accept 2dsphere coordinates as shard keys, correct? Are there other components this would significantly affect that I should look into?
8) After that, I think my next steps are to:
a) create a new command such as sh.addTagGeoRange() (a new command because the tag behavior will be different);
b) modify the query/insert/update/delete paths so that documents are routed according to their distance from the shards' virtual coordinates.
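For step b), inserts with a known location would go to the single nearest shard, but region queries are where the "fewer servers" benefit would show up. Below is a minimal sketch, not MongoDB's routing code, assuming a spherical Earth and hypothetical shard sites: for a circular query (similar in spirit to `$geoWithin` with `$centerSphere`), the triangle inequality lets a router skip shards whose Voronoi cells cannot intersect the circle. The function name `shards_for_circle` is hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # assumes a spherical Earth


def haversine_km(a, b):
    """Great-circle distance in km between two (longitude, latitude) points in degrees."""
    lng1, lat1 = map(math.radians, a)
    lng2, lat2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lng2 - lng1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def shards_for_circle(center, radius_km, shard_sites):
    """Return sorted names of shards that may own points within radius_km of center.

    Let s_n be the site nearest to the center c. For any point p in the circle,
    d(p, s_i) >= d(c, s_i) - r and d(p, s_n) <= d(c, s_n) + r (triangle
    inequality). So if d(c, s_i) > d(c, s_n) + 2r, every point of the circle is
    strictly closer to s_n than to s_i, and shard i can be skipped. The filter
    is conservative: it never drops a shard owning part of the circle, but may
    keep one that owns none of it.
    """
    dists = {name: haversine_km(center, site) for name, site in shard_sites.items()}
    nearest = min(dists.values())
    return sorted(name for name, d in dists.items() if d <= nearest + 2 * radius_km)


# Hypothetical shard names and virtual coordinates, in (lng, lat) order.
SHARD_SITES = {
    "shardLisbon": (-9.14, 38.72),
    "shardBerlin": (13.40, 52.52),
    "shardNewYork": (-74.01, 40.71),
}

if __name__ == "__main__":
    print(shards_for_circle((-8.61, 41.15), 50, SHARD_SITES))  # ['shardLisbon']
    print(shards_for_circle((2.35, 48.86), 400, SHARD_SITES))  # ['shardBerlin', 'shardLisbon']
```

With a radius of zero the result shrinks to the nearest shard (plus exact ties), i.e. the same rule an insert would use. An exact test would intersect the circle with each spherical Voronoi cell; this bound trades some precision for one distance computation per shard.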