Best practice: data modelling in ArangoDB (ideas for cookbook recipes)


CoDEmanX

Dec 5, 2014, 11:39:03 AM
to aran...@googlegroups.com
There have been a couple of questions and discussions about data modelling and retrieval, e.g.:
I'd like to collect such real-world challenges and the proposed solutions in this thread, so they can be generalized and added to the cookbook.

Here are a few challenges I face:

  • Tagging with keywords
    I want to migrate from an RDBMS and replace a couple of columns with a tagging system.
    For this to work well for the user, it needs auto-completion, which in turn requires a list of all tags / keywords.

    As newly added tags should show up for other users immediately, caching isn't much of an option. Aggregating the keywords from all documents on every request isn't great either with >100k documents, especially because the workload is mostly reads with only rare keyword additions.

    My idea: denormalize the data, store the tags in the documents, and keep a second collection for keywords. The keyword itself would be the key, and its value the number of occurrences. At the application level, all keywords entered for a document can be used to either create a new keyword document or increment the counter in the existing one. When data records are deleted, the keyword counters need to be decreased, and keywords maybe even removed automatically once their counter drops to zero. (These actions need to be transactional, I guess.)

  • (Mono-hierarchical) grouping of documents
    All objects in my current database are part of either a project or sub-project. Groups are retrieved via SELECT DISTINCT or GROUP BY based on the (sub-)project title. I would like to allow my co-workers to enter project-related information in future.

    In ArangoDB, I would create a collection for the objects and a collection for the projects, and link them with graph edges. For Project <-> Object relationships, the edge wouldn't store anything but _to and _from. For subprojects, it might be feasible to create an edge between the object and the parent project, and store the subproject title as an edge attribute. The only downside I can see is that the application on top will have to query the edge collection to find objects of a certain subproject, and the projects vertex collection to find objects of a certain project.
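
For the tagging idea above, here is a minimal sketch of the counter bookkeeping in Python. Plain in-memory dictionaries stand in for the two collections, and all names (`documents`, `keywords`, `save_document`, `delete_document`, `autocomplete`) are made up for illustration. In a real database the updates to both collections would presumably have to run in a single transaction, which this sketch does not attempt to show.

```python
from collections import Counter

# Hypothetical in-memory stand-ins for the two collections:
documents = {}        # document key -> sorted list of tags
keywords = Counter()  # keyword -> number of documents that use it


def _decrement(tag: str) -> None:
    """Decrease a keyword counter and drop the keyword once it hits zero."""
    keywords[tag] -= 1
    if keywords[tag] <= 0:
        del keywords[tag]


def save_document(key: str, tags: list) -> None:
    """Insert or replace a document and keep the keyword counters in sync."""
    new_tags = set(tags)
    old_tags = set(documents.get(key, []))
    for tag in new_tags - old_tags:   # tags that were added
        keywords[tag] += 1
    for tag in old_tags - new_tags:   # tags that were removed
        _decrement(tag)
    documents[key] = sorted(new_tags)


def delete_document(key: str) -> None:
    """Remove a document and release all of its keywords."""
    for tag in documents.pop(key, []):
        _decrement(tag)


def autocomplete(prefix: str) -> list:
    """All known keywords starting with the prefix, read from the counters only."""
    return sorted(tag for tag in keywords if tag.startswith(prefix))


save_document("doc1", ["graph", "geo"])
save_document("doc2", ["graph"])
delete_document("doc1")
print(autocomplete("g"))  # ['graph']
```

Auto-completion only ever touches the small keyword collection, so its cost no longer depends on the number of documents.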
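
The grouping idea could look roughly like this. Again this is only an in-memory Python sketch: the collection names (`projects`, `objects`), the edge direction (from object to project) and the `subproject` attribute name are assumptions, not something ArangoDB prescribes.

```python
# Hypothetical stand-ins for two vertex collections and one edge collection.
projects = {"p1": {"title": "Website relaunch"}}
objects = {
    "o1": {"name": "Logo"},
    "o2": {"name": "Landing page"},
    "o3": {"name": "Newsletter"},
}
# Every edge links an object to its parent project. Objects that belong to
# a subproject carry the subproject title as an extra edge attribute.
edges = [
    {"_from": "objects/o1", "_to": "projects/p1"},
    {"_from": "objects/o2", "_to": "projects/p1", "subproject": "Design"},
    {"_from": "objects/o3", "_to": "projects/p1", "subproject": "Marketing"},
]


def objects_of_project(project_key: str) -> list:
    """All objects of a project, including those in its subprojects."""
    target = "projects/" + project_key
    return [e["_from"] for e in edges if e["_to"] == target]


def objects_of_subproject(project_key: str, title: str) -> list:
    """Only the objects whose edge names the given subproject."""
    target = "projects/" + project_key
    return [
        e["_from"]
        for e in edges
        if e["_to"] == target and e.get("subproject") == title
    ]


print(objects_of_project("p1"))               # ['objects/o1', 'objects/o2', 'objects/o3']
print(objects_of_subproject("p1", "Design"))  # ['objects/o2']
```

Both lookups filter the edge collection here; the difference is only whether the edge attribute takes part in the filter.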

Random ideas I would also like to hear your thoughts about:

  • Site activity indication
    Show your co-workers how long ago the latest activity in the web application took place. The collection should probably be isVolatile: true and maybe use a cap constraint (although there are no time-based constraints as of now to auto-remove documents older than x). Not sure what to use as key, maybe a hash of the page? If so, one could quickly look up activities for a certain page, but it would be harder to generate an overview of all activities in the system, which may actually be a plus in terms of legitimacy.

  • External jobs (like image processing) with Foxx Queues?
    Image processing such as thumbnail generation should be organized via a job queue, and may even be related to ArangoDB (e.g. file names + meta data need to be added to documents once they are ready). Is it feasible to use Foxx Queues for such async, potentially long running jobs? Is it crash-safe (are the queues persisted)? 
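
A rough Python sketch of the activity idea, with a dictionary standing in for the volatile collection. Using a SHA-1 hash of the page path as the key, the attribute name `lastActivity`, and the application-side `purge_older_than` cleanup (as a substitute for the missing time-based constraint) are all assumptions for illustration.

```python
import hashlib
from datetime import datetime, timedelta, timezone

activity = {}  # hypothetical stand-in for the activity collection: key -> document


def page_key(page: str) -> str:
    """Derive the document key from the page path."""
    return hashlib.sha1(page.encode("utf-8")).hexdigest()


def record_activity(page: str, when: datetime) -> None:
    """Overwrite the single activity document of a page."""
    activity[page_key(page)] = {"page": page, "lastActivity": when}


def time_since_activity(page: str, now: datetime):
    """Elapsed time since the last activity on a page, or None if unknown."""
    doc = activity.get(page_key(page))
    return None if doc is None else now - doc["lastActivity"]


def purge_older_than(max_age: timedelta, now: datetime) -> int:
    """Application-side cleanup of stale documents; returns how many were removed."""
    stale = [k for k, d in activity.items() if now - d["lastActivity"] > max_age]
    for k in stale:
        del activity[k]
    return len(stale)


now = datetime(2014, 12, 5, 12, 0, tzinfo=timezone.utc)
record_activity("/projects/42", now - timedelta(minutes=5))
record_activity("/projects/7", now - timedelta(days=3))
print(time_since_activity("/projects/42", now))  # 0:05:00
print(purge_older_than(timedelta(days=1), now))  # 1
```

The per-page lookup is a single key access, while the cleanup has to scan every document, which mirrors the trade-off mentioned above.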

Thomas Schmidts

Dec 15, 2014, 10:58:09 AM
to aran...@googlegroups.com
Thank you for the collection of recipes. Sadly I will not have the possibility to implement them into our cookbook in the next two weeks. After that I will work on them and add them to our cookbook.

CoDEmanX

Jan 16, 2015, 9:22:39 AM
to aran...@googlegroups.com
Another interesting topic: Calendar implementation, especially with recurring events in mind.

It should be possible to either edit all recurrences of an event at once, or optionally change the future recurrences only (probably by turning the selected recurrence into a separate event).

Would you create nodes for every recurrence and link them together in a graph? Or calculate recurrences on the fly as certain date ranges are accessed?
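
To make the second option more concrete, here is a small Python sketch that computes recurrences on the fly for a date range and handles "change future recurrences only" by splitting the series in two. The `Series` class, its fields and the fixed day interval are simplifying assumptions; real calendars have far richer recurrence rules.

```python
from dataclasses import dataclass, replace
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class Series:
    """One stored document per recurring event (hypothetical shape)."""
    title: str
    start: date
    interval_days: int
    until: Optional[date] = None  # inclusive end; None means open-ended


def occurrences(series: Series, range_start: date, range_end: date) -> list:
    """Dates of the series within [range_start, range_end], computed on demand."""
    last = range_end if series.until is None else min(range_end, series.until)
    step = timedelta(days=series.interval_days)
    current = series.start
    if current < range_start:
        # Jump to the first occurrence on or after range_start (ceiling division).
        gap = (range_start - current).days
        current += -(-gap // series.interval_days) * step
    result = []
    while current <= last:
        result.append(current)
        current += step
    return result


def split_series(series: Series, from_date: date, **changes):
    """Change only the recurrences from from_date on by splitting the series.

    from_date is expected to be one of the series' own occurrences.
    """
    past = replace(series, until=from_date - timedelta(days=1))
    future = replace(series, start=from_date, **changes)
    return past, future


weekly = Series("Stand-up", date(2015, 1, 5), 7)
print([d.isoformat() for d in occurrences(weekly, date(2015, 1, 16), date(2015, 2, 2))])
# ['2015-01-19', '2015-01-26', '2015-02-02']

past, future = split_series(weekly, date(2015, 1, 19), title="Stand-up (new room)")
```

Editing all recurrences at once is then a change to one document, and editing only the future ones replaces that document with two.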


Thomas Schmidts

Feb 2, 2015, 3:55:30 AM
to aran...@googlegroups.com
Sadly at the moment I do not have the time to go through the discussions you posted and make some recipes. 

But the great thing about the ArangoDB cookbook is that you can write your own recipes and we will publish them in your name!

Just go to https://github.com/arangodb/Cookbook, write a recipe and make a pull request. This way you can help us make the cookbook better and fill it with recipes.  