AQL / Foxx queries are long and messy - how to shorten?

ropel...@gmail.com

May 13, 2016, 10:33:31 AM
to ArangoDB
Apologies for the gigantic post and thanks in advance to anyone who has the patience to wade through my code!

This is basically a follow-up to a question on Stack Overflow. That question got a very helpful answer, but now I am trying to do a bit more with the query, and my AQL has gotten very long and slow. I've also tried building a Foxx service (using the suggestion in the SO answer as a starting point), but I'm finding Foxx pretty confusing and awkward to use. (I don't understand how/where to view console.log statements; I can't edit the Foxx data.js files from WebStorm because ArangoDB is installed as an admin; etc.)

Other than simply "how can I simplify these queries?", my two main questions are:
- When I return a document, I also want to return a lot of data that exists in the graph structure itself (e.g. lists of edges or neighbor nodes), but doing this requires subqueries that not only make the query slow but also make it long and unwieldy (50+ lines in some cases). Is there a good way to modularize this code using custom functions or something else (I've put a rough sketch of one idea after this list), or am I simply asking unreasonable things of the database?

- I am trying to set up a system for paging based on sorting by the temporary graph data that is returned for each node, e.g. the query should return the 30 neighbors that themselves have the most edges. The graph structure here makes this particularly tricky because I would like to be able to do queries of depth > 1. I think I have it working in the Foxx example below but not in the AQL version. Is there possibly a better way to do this?
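
One thing I've been experimenting with on the Foxx side (just a sketch with a made-up helper name, so take it with a grain of salt) is generating the repeated sub-query text with a small JavaScript helper, so at least the query I have to read and maintain stays short even though the AQL that actually runs is the same:

    // sketch only: buildNeighborSubquery is a made-up helper, not an ArangoDB API
    function buildNeighborSubquery(graphName, vertexVar, options) {
      // returns the "(FOR n IN GRAPH_NEIGHBORS(...) RETURN {name, _id})" snippet
      // that I currently repeat for evidence / supporting / supported / etc.
      return "(FOR n IN GRAPH_NEIGHBORS('" + graphName + "', " + vertexVar + ", " +
             JSON.stringify(options) +
             ") RETURN {name: n.name, _id: n._id})";
    }

    var supporting = buildNeighborSubquery('type1Graph', 'v', {
      direction: 'inbound',
      edgeExamples: {type: 'supports'},
      vertexCollectionRestriction: 'DocumentB',
      maxDepth: 1,
      includeData: true
    });

    var query = "FOR v IN GRAPH_NEIGHBORS('combinedGraph', @docId, {direction: 'any', maxDepth: 1, includeData: true}) " +
                "RETURN {vertexData: v, supporting: " + supporting + "}";

    // db._query(query, {docId: 'DocumentA/380945'}) would then run the assembled query

That keeps the repetition out of the query text I edit, but it obviously doesn't make the query itself any faster, which is why I'm asking whether there's a better way.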

To recap, I'm trying to use ArangoDB as a backend for a d3 force-directed diagram display, and I need data in something resembling the following format:

var nodes = [
      {
        vertexData: {...document properties...},
        tempEdgeData: {...vertex properties constructed from graph...},
        inEdges: [...array of vertex ids...],
        outEdges: [...array of vertex ids...]
      },
      {vertexData, tempEdgeData, inEdges, outEdges},
      {vertexData, tempEdgeData, inEdges, outEdges}
    ],
    links = [
      {source: nodes[0], target: nodes[1]},
      {source: nodes[1], target: nodes[2]}
    ];

I am also returning a 'progress' object to facilitate paging, which I've gotten working in the Foxx query but not the AQL version.

The AQL query I've come up with at this point is something like this:
 
LET docId = 'DocumentA/380945'
LET limiter = 30
LET limitOffset = 0

LET edgesTmp = GRAPH_EDGES('combinedGraph', docId, {direction: 'any', maxDepth: 1, includeData: true})

LET verticesTmp = (
  FOR v IN GRAPH_NEIGHBORS('combinedGraph', docId, {direction: 'any', maxDepth: 1, includeData: true})
    LET evidence = {evidence: (FOR n IN GRAPH_NEIGHBORS('type1Graph', v, {direction: 'inbound', maxDepth: 1, includeData: true}) RETURN {name: n.name, _id: n._id})}
    LET arguments = {
      supporting: (FOR n IN GRAPH_NEIGHBORS('type1Graph', v, {direction: 'inbound', edgeExamples: {type: 'supports'}, vertexCollectionRestriction: 'DocumentB', maxDepth: 1, includeData: true}) RETURN {name: n.name, _id: n._id}),
      supported: (FOR n IN GRAPH_NEIGHBORS('type1Graph', v, {direction: 'outbound', edgeExamples: {type: 'supports'}, vertexCollectionRestriction: 'DocumentB', maxDepth: 1, includeData: true}) RETURN {name: n.name, _id: n._id}),
      contradicting: (FOR n IN GRAPH_NEIGHBORS('type1Graph', v, {direction: 'inbound', edgeExamples: {type: 'contradicts'}, vertexCollectionRestriction: 'DocumentB', maxDepth: 1, includeData: true}) RETURN {name: n.name, _id: n._id}),
      contradicted: (FOR n IN GRAPH_NEIGHBORS('type1Graph', v, {direction: 'outbound', edgeExamples: {type: 'contradicts'}, vertexCollectionRestriction: 'DocumentB', maxDepth: 1, includeData: true}) RETURN {name: n.name, _id: n._id})
    }
    LET tags = {tags: (FOR n IN GRAPH_NEIGHBORS('tagLinkGraph', v, {direction: 'inbound', maxDepth: 1, includeData: true}) RETURN {name: n.name, code: n.code, _id: n._id})}
    RETURN {
      vertexData: v,
      tempEdgeData: {tags, arguments, evidence},
      outEdges: GRAPH_NEIGHBORS('combinedGraph', v, {direction: 'outbound', maxDepth: 1, includeData: false}),
      inEdges: GRAPH_NEIGHBORS('combinedGraph', v, {direction: 'inbound', maxDepth: 1, includeData: false})
    }
)
LET verticesTmp2 = APPEND(verticesTmp, (
  LET evidence = {evidence: (FOR n IN GRAPH_NEIGHBORS('type1Graph', docId, {direction: 'inbound', maxDepth: 1, includeData: true}) RETURN {name: n.name, _id: n._id})}
  LET arguments = {
    supporting: (FOR n IN GRAPH_NEIGHBORS('type1Graph', docId, {direction: 'inbound', edgeExamples: {type: 'supports'}, vertexCollectionRestriction: 'DocumentB', maxDepth: 1, includeData: true}) RETURN {name: n.name, _id: n._id}),
    supported: (FOR n IN GRAPH_NEIGHBORS('type1Graph', docId, {direction: 'outbound', edgeExamples: {type: 'supports'}, vertexCollectionRestriction: 'DocumentB', maxDepth: 1, includeData: true}) RETURN {name: n.name, _id: n._id}),
    contradicting: (FOR n IN GRAPH_NEIGHBORS('type1Graph', docId, {direction: 'inbound', edgeExamples: {type: 'contradicts'}, vertexCollectionRestriction: 'DocumentB', maxDepth: 1, includeData: true}) RETURN {name: n.name, _id: n._id}),
    contradicted: (FOR n IN GRAPH_NEIGHBORS('type1Graph', docId, {direction: 'outbound', edgeExamples: {type: 'contradicts'}, vertexCollectionRestriction: 'DocumentB', maxDepth: 1, includeData: true}) RETURN {name: n.name, _id: n._id})
  }
  LET tags = {tags: (FOR n IN GRAPH_NEIGHBORS('tagLinkGraph', docId, {direction: 'inbound', maxDepth: 1, includeData: true}) RETURN {name: n.name, code: n.code, _id: n._id})}
  RETURN {
    vertexData: DOCUMENT(docId),
    tempEdgeData: {tags, arguments, evidence},
    outEdges: GRAPH_NEIGHBORS('combinedGraph', docId, {direction: 'outbound', maxDepth: 1, includeData: false}),
    inEdges: GRAPH_NEIGHBORS('combinedGraph', docId, {direction: 'inbound', maxDepth: 1, includeData: false})
  }
))

LET vertices = (
  FOR v IN verticesTmp2
    SORT v.distFromOrigin ASC, LENGTH(v.inEdges) DESC
    LIMIT 0, 30
    RETURN v
)

LET progress = {
  'totalVertices': LENGTH(verticesTmp2),
  'currentVertex': limitOffset + limiter,
  'limit': limiter,
  'limitOffset': limitOffset,
  'currentQuantityReturned': MIN([limiter, LENGTH(vertices)])
}

LET edges = (
  FOR edge IN edgesTmp
    FILTER POSITION(vertices[*].vertexData._id, edge._to, false) && POSITION(vertices[*].vertexData._id, edge._from, false)
    RETURN edge
)

RETURN { edges, vertices, progress }

Alternatively, the Foxx query I've gotten working is faster, but I don't understand Foxx very well and I'm having trouble converting this approach back into an equivalent AQL query:

var _ = require('underscore');
var joi = require('joi');
var Foxx = require('org/arangodb/foxx');
var ArangoError = require('org/arangodb').ArangoError;
var db = require('org/arangodb').db;
var console = require('console');

var Data = require('../repositories/data');
var Datum = require('../models/datum');
var controller = new Foxx.Controller(applicationContext);

var datumIdSchema = joi.string().required()
    .description('The id of the datum')
    .meta({allowMultiple: false});

var data = new Data(
    applicationContext.collection('data'),
    {model: Datum}
);

controller.get('/neighbors', function(req, res) {
    var fullId = req.parameters.fullId;
    var id = req.parameters.id;
    var originVertexName = req.parameters.originVertexName;
    var neighborVertexRestriction = JSON.parse(req.parameters.neighborVertexRestriction); // this is an array of collection names
    var maxDepth = Number(req.parameters.maxDepth) + 1;
    var limitOffset = Number(req.parameters.limitOffset) || 0;
    var limit = Number(req.parameters.limit) || 30;
    var graphName = req.parameters.graphName;
    var progress = {};

    var traversal = require("org/arangodb/graph/traversal");
    var result = {
        edges: [],
        vertices: {}
    };

    var myVisitor = function (config, result, vertex, path, connected) {
        var currentDepth = path.edges.length;
        if (currentDepth === 0) { // if we are on the origin vertex
            if (!result.vertices.hasOwnProperty(vertex._id)) {
                // If we visit a vertex, we store its data and prepare out/in
                result.vertices[vertex._id] = {
                    vertexData: vertex,
                    tempVertexData: {
                        evidence: [],
                        argument: {supporting: [], supported: [], contradicting: [], contradicted: []},
                        tags: [],
                        distFromQueried: currentDepth
                    },
                    outEdges: [],
                    inEdges: []
                };
            }
        } else { // for all vertices in addition to the origin vertex
            var e = path.edges[currentDepth - 1]; // get the edge data for the edge immediately leading to this node en route from the original node
            e._relativeDepth = currentDepth;
            if (currentDepth < maxDepth) {
                result.edges.push(e);
                if (!result.vertices.hasOwnProperty(vertex._id) && _.contains(neighborVertexRestriction, vertex._id.split('/')[0])) {
                    // If we visit a vertex, we store its data and prepare out/in
                    result.vertices[vertex._id] = {
                        vertexData: vertex,
                        tempVertexData: {
                            evidence: [],
                            argument: {supporting: [], supported: [], contradicting: [], contradicted: []},
                            tags: [],
                            distFromQueried: currentDepth
                        },
                        outEdges: [],
                        inEdges: []
                    };
                }
            }

            // add the edge data to the vertices' tempVertexData property
            if (e._id.split('/')[0] === "edgeType1" && vertex._id.split('/')[0] === 'DocumentA') {
                result.vertices[e._to].tempVertexData.evidence.push({name: vertex.name, _id: vertex._id});
            }
            if (e._id.split('/')[0] === "edgeType1" && vertex._id.split('/')[0] === 'DocumentB' && e.type === "supports") {
                result.vertices[e._to].tempVertexData.argument.supporting.push({name: vertex.name, _id: vertex._id});
                result.vertices[e._from].tempVertexData.argument.supported.push({name: vertex.name, _id: vertex._id});
            }
            if (e._id.split('/')[0] === "edgeType1" && vertex._id.split('/')[0] === 'DocumentB' && e.type === "contradicts") {
                result.vertices[e._to].tempVertexData.argument.contradicting.push({name: vertex.name, _id: vertex._id});
                result.vertices[e._from].tempVertexData.argument.contradicted.push({name: vertex.name, _id: vertex._id});
            }
            if (e._id.split('/')[0] === "edgeType2" && vertex._id.split('/')[0] === 'DocumentA') {
                result.vertices[e._to].tempVertexData.tags.push({name: vertex.name, _id: vertex._id});
            }

            // push in and out edges to lists on each vertex
            if (result.vertices.hasOwnProperty(e._from)) {
                result.vertices[e._from].outEdges.push(e._to);
            }
            if (result.vertices.hasOwnProperty(e._to)) {
                result.vertices[e._to].inEdges.push(e._from);
            }
        }
    };

    var config = {
        datasource: traversal.generalGraphDatasourceFactory('combinedGraph'),
        strategy: "breadthfirst",
        order: "preorder",
        visitor: myVisitor,
        expander: traversal.anyExpander,
        minDepth: 0,
        maxDepth: maxDepth
    };
    var traverser = new traversal.Traverser(config);

    var slicedVertices = [],
        extraVertices1 = [],
        extraVertices2 = [],
        filteredEdges = [],
        filterList = [];

    traverser.traverse(result, {_id: fullId});
    progress.totalVertices = Object.keys(result.vertices).length;
    result.vertices[fullId].vertexData = db._document(fullId);

    // sort by inEdges length and then distance from the queried node, so that the queried node is always first
    var sortedVertices = _.chain(result.vertices)
        .sortBy(function(v) { return v.inEdges.length; })
        .reverse()
        .sortBy(function(v) { return v.tempVertexData.distFromQueried; })
        .value();

    if (limitOffset === 0) {
        result.vertices = sortedVertices.slice(limitOffset, limit + limitOffset);
        filterList = result.vertices.map(function(v) { return v.vertexData._id; }); // get list of the vertices to be returned in order to filter edges with
        result.edges = _.filter(result.edges, function(e) { return (_.contains(filterList, e._to) && _.contains(filterList, e._from)); });
    } else {
        slicedVertices = sortedVertices.slice(limitOffset, limit + limitOffset);
        filterList = slicedVertices.map(function(v) { return v.vertexData._id; }); // get list of new vertices to be returned
        filteredEdges = _.filter(result.edges, function(e) { return (_.contains(filterList, e._to) || _.contains(filterList, e._from)) && e._relativeDepth < maxDepth; }); // get edges that border the new vertices
        extraVertices1 = filteredEdges.map(function(e) { return result.vertices[e._from]; });
        extraVertices2 = filteredEdges.map(function(e) { return result.vertices[e._to]; });
        slicedVertices = _.uniq(slicedVertices.concat(extraVertices1, extraVertices2));
        filterList = slicedVertices.map(function(v) { return v.vertexData._id; }); // get list of the vertices to be returned in order to filter edges
        filteredEdges = _.filter(result.edges, function(e) { return (_.contains(filterList, e._to) && _.contains(filterList, e._from)); });
        result.vertices = slicedVertices;
        result.edges = filteredEdges;
    }

    progress.currentVertex = limitOffset + limit;
    progress.limit = limit;
    progress.limitOffset = limitOffset;
    progress.currentQuantityReturned = Math.min(limit, progress.totalVertices);
    result.progress = progress;
    res.json(result);
}).errorResponse(ArangoError, 404, 'The datum could not be found');

Let me know if I can clarify anything.

-Ben

Wilfried Gösgens

May 17, 2016, 9:54:53 AM
to ArangoDB
Hi,

Most probably the best way is to use a traversal of depth 2 and figure out the paths in .js afterwards.
ArangoDB 3 will bring some major overhauls to the graph features; they will most probably solve some of your problems.
Michael will have a deeper look at your post once the 3.0 release is available.
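
Just as a rough sketch (using the graph name from your post; the exact syntax may still change a bit before the release), a depth-2 traversal with the new AQL traversal syntax in 3.0 should look roughly like this from arangosh, with the sorting/paging then done on the returned paths in plain JavaScript:

    // run in arangosh, where `db` is available; 'combinedGraph' is the graph from your post
    var paths = db._query(
      "FOR v, e, p IN 1..2 ANY @start GRAPH 'combinedGraph' RETURN p",
      { start: 'DocumentA/380945' }
    ).toArray();

    // each p contains p.vertices and p.edges, so filtering edges, computing
    // per-vertex counts and slicing out a page can all happen here in .js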

Regarding your permission troubles, what's your host OS?

Regarding console.log - if you don't run arangod as a daemon but on the console, you will see your output there; otherwise you will find it in the ArangoDB logfiles.


Cheers,
Willi

ropel...@gmail.com

May 18, 2016, 8:20:25 AM
to ArangoDB
Thanks, Willi. Definitely looking forward to 3.0. I'm on Linux (Mint) but not much of a power user; I just switched over from Windows a few months ago. Regarding permissions, I'm guessing I should either run WebStorm as an admin or change the permissions on the ArangoDB controllers directory?

Got the console working after a bit of work. Thanks.

~~~

A note about traversals as well, for anyone who might be reading: I tried just doing a basic depth + 1 traversal in order to get all the extra edges and paths, and with my test node it ends up being significantly slower than the other queries that use subqueries. This might be related to my graph structure.

Here are some of the different ways I've tried to do the query:

1. GRAPH_NEIGHBORS depth = 1 along a specific graph with subqueries* for each node; GRAPH_EDGES for edges; then sort and limit by (temporary, constructed-from-graph) node properties (as suggested by mhacki on Stack Overflow, with added subqueries, sorting and limiting)

2. FOR ... GRAPH depth = 1 traversal along specific graph with subqueries for each node, storing nodes and edges; then sort and limit by (temporary, constructed from graph) node properties

3. FOR ... GRAPH depth = 2 traversal along combined graph, storing nodes and edges and paths; to be sorted/limited/rearranged later via javascript (the query itself, without javascript, is slower here than the others)

4. GRAPH_NEIGHBORS depth = 1 with a subquery only for the field the nodes will be sorted (and limited) by; sort and limit the nodes; then run the remaining subqueries on the limited subset of nodes (see the sketch after this list)

5. Foxx traversal, depth = 2 along the combined graph, pushing nodes and edges as necessary (as suggested by mhacki on Stack Overflow, with additions)

* I've tried both "FOR ... GRAPH" traversals and GRAPH_NEIGHBORS for the subqueries; GRAPH_NEIGHBORS seems to be a bit faster
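
To give a rough idea of what I mean by approach 4 (this is a trimmed-down sketch with only one of the subqueries, not my real query), the idea is to sort and limit on a cheap count first and only then run the expensive subqueries on the vertices that survive the LIMIT:

    // trimmed-down sketch of approach 4, run from Foxx/arangosh via db._query
    var approach4 = [
      "LET ranked = (",
      "  FOR v IN GRAPH_NEIGHBORS('combinedGraph', @docId, {direction: 'any', maxDepth: 1, includeData: true})",
      "    LET inCount = LENGTH(GRAPH_NEIGHBORS('combinedGraph', v, {direction: 'inbound', maxDepth: 1, includeData: false}))",
      "    SORT inCount DESC",
      "    LIMIT @offset, @limit",
      "    RETURN v",
      ")",
      "FOR v IN ranked",
      "  RETURN {",
      "    vertexData: v,",
      "    evidence: (FOR n IN GRAPH_NEIGHBORS('type1Graph', v, {direction: 'inbound', maxDepth: 1, includeData: true})",
      "                 RETURN {name: n.name, _id: n._id})",
      "  }"
    ].join("\n");

    // db._query(approach4, {docId: 'DocumentA/380945', offset: 0, limit: 30})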

-Ben

Wilfried Gösgens

May 18, 2016, 8:27:51 AM
to ArangoDB
Hi,
In Linux you can create a common group in /etc/group:

foxxdevelopers:x:nnnn:arangodb,ropeladder

(where ropeladder is your login and nnnn is the GID number for the new group)

You may need to log off and on again to make the new group available to your session; you may also need to restart arangod. Then run:

chgrp -R foxxdevelopers /var/lib/arangodb-apps

chmod g+rws /var/lib/arangodb-apps

and then restart arangod.

Now files you create should be owned by the group 'foxxdevelopers', which contains both your user and the arangodb user.
So both should be able to create, add and modify files.

The setgid bit in the chmod (the s in g+rws) will ensure that all files created under that directory inherit the group.

Cheers,
Willi

ropel...@gmail.com

May 18, 2016, 6:20:18 PM
to ArangoDB
Thanks for the help. I'm running into another permissions-related error now, unfortunately.

Since I tried running arangod [/usr/bin/arangod.sh] manually in a terminal (in order to view console.log statements from Foxx), I've been unable to run ArangoDB as a service. It's giving me the following error:

2016-05-18T22:15:21Z [3978] ERROR cannot open file '/myDataDir/ArangoDB/SHUTDOWN': 'Permission denied'
2016-05-18T22:15:21Z [3978] ERROR could not open shutdown file '/myDataDir/ArangoDB/SHUTDOWN': internal error
2016-05-18T22:15:21Z [3978] FATAL unable to start WAL logfile manager

I tried renaming the SHUTDOWN file just for fun, and when I do that it just gives me a permission error on some of the journal files within the /ArangoDB/journals/ directory.

I had to run arangod.sh as a superuser, and I'm guessing that somehow screwed up permissions for when I want to run ArangoDB as a service? But I'm running the service as a superuser also...

(restarting/logging in and out doesn't help, btw)

Wilfried Gösgens

May 19, 2016, 4:28:13 AM
to ArangoDB
Hi,
 doing:

root@bruecklinux:~# ls -l /var/lib/arangodb
total 20
-rw-rw---- 1 arangodb arangodb   66 Jan 26 17:33 SERVER
-rw-rw---- 1 arangodb arangodb  118 Apr 13 16:36 SHUTDOWN
drwx------ 4 arangodb arangodb 4096 Apr  6 13:35 cluster
drwxrwxrwx 3 arangodb arangodb 4096 Jan 26 17:33 databases
drwxr-xr-x 2 arangodb arangodb 4096 Apr 13 16:36 journals

If it's not like that and you try to launch arangod as the arangodb user, you can adjust it like this:

chown -R arangodb:arangodb /var/lib/arangodb

Please note the difference between the database directory and the arangodb-apps directory, which is where the Foxx services live.

Cheers,
Willi