Are there any working examples for using ArangoDB as OLAP cube?

360 views
Skip to first unread message

landon...@gmail.com

unread,
Mar 30, 2017, 1:58:43 AM3/30/17
to ArangoDB
Hi, currently I'm using MonetDB and DataBrewery Cubes (Light-weight Python OLAP server).
But I found that ArangoDB is very interesting and want to replace those with ArangoDB.
However, I am a newcomer for graph database and ArangoDB,
and would like to hear any advice on designing the graph database.

I have found many challenges when I have tried to use ArangoDB: 

1. How to design database that

   1.1 Has many views (hierarchical) and raw data (flat)
   1.2 The hierarchies are changable while data is not changed.
       For example, after re-organizing product category, the relationship between products may be changed while product info is not changed.
   
   
2. I don’t know how to

   2.1 Query raw data by views (hierarchical).
       Especially a full AQL query to show up my team that ArangoDB is suitable for our problems.

   2.2 The returned result must have all nodes from the graph even if data is missing.
       For example, some year may not sell in Africa but the result must have the sale amount as zero for Africa because Africa is defined on the graph.


   I have the hierarchical dimensions as the following:
     
     
Geography: All -> Continents -> Countries -> Cities
     
Regional: All -> Region -> Sub-Region
     
Products: Type -> Sub-Type -> Sub-SubType
     
Calendar Year: Year -> Quarters -> Months
     
Fiscal Year:   Year -> Quarters -> Months (July, August, September, ..., June)
     
Gender:  Male or Female
     
Membership: True or False



3. Is this design is OK?

   Documents:
     
     months
.json: {"_key": "1", "name": "January"}
     productInfo
.json: {"name": ""}
     categoryInfo
.json: {"name": ""}
     continent
.json: {"name": "Europe"}
     country
.json: {"name": "country", "island": true / false}
     city
.json: {"_key": "1", "name": "Belgium"}
     person
.json: {"_key": "1", "gender": "male/female", "age": 30, "isMember": true/false}
     sales
.json: {"person": "PersonId", "product": "ProductId", "city": "CityId", "region": "RegionName" , "day": 1, "month": 2, "year": 2017, "amount": 999, "units": 99}
     sales_estimation
.json: {"product": "ProductId", "city": "CityId", "region": "RegionName" , "day": 1, "month": 2, "year": 2018, "amount": 1200, "units": 110}



   Edges:
     
     year
.json: {"_from": "months/1", "_to": "months/2", "type": "calendar/fiscal"}
     categories
.json: {"_from": "category/1", "_to": "products/2"}
     
World.json: {"_from": "continent/1", "_to": ""} <-- List of continents in the World
     
Europe.json: {"_from": "continent/1", "_to": "country/1"}  <-- List of countries in Europe
     
Asia.json: {"_from": "continent/3", "_to": "country/5"}  <-- List of countries in Asia
     
Belgium.json: {"_from": "country/1", "_to": "city/1"}  <-- List of cities in Belgium
     view_by_some_relationship
.json: {"_from": "", "_to": ""} <-- Other views




4. How to query products sales by geography and calendar year or fiscal year?

   - The returned result must have all nodes from the graph even if data is missing
   - The returned result should be ordered by hierarchy
   

+------------------+-------------------------------------+-------+
|                  |     <---- Fiscal Year 2016 ---->    |       |
+                  +-------------------------------------+       +
|     Geography    | 2015 |  2015  |       | 2016 | 2016 | Total |
+                  +------+--------+       +------+------+       +
|                  | July | August | ..... |  May | June |       |
+------------------+------+--------+-------+------+------+-------+
| Europe           |      |        |       |      |      |       |
+------------------+------+--------+-------+------+------+-------+
|    Belgium       |      |        |       |      |      |       |
+------------------+------+--------+-------+------+------+-------+
|    Finland       |      |        |       |      |      |       |
+------------------+------+--------+-------+------+------+-------+
|    Germany       |      |        |       |      |      |       |
+------------------+------+--------+-------+------+------+-------+
| North America    |      |        |       |      |      |       |
+------------------+------+--------+-------+------+------+-------+
|    United States |      |        |       |      |      |       |
+------------------+------+--------+-------+------+------+-------+
| Asia             |      |        |       |      |      |       |
+------------------+------+--------+-------+------+------+-------+
|    China         |      |        |       |      |      |       |
+------------------+------+--------+-------+------+------+-------+
| Africa           |      |        |       |      |      |       |
+------------------+------+--------+-------+------+------+-------+
| Grand Total      |      |        |       |      |      |       |
+------------------+------+--------+-------+------+------+-------+



   I think AQL may look like
   LET cal = (FOR c IN year FILTER c.type == "fiscal" RETURN {"name": DOCUMENT(c._id).name, "month": TO_NUMBER(c._key), "year": (c.month >= 6 OR c.month <= 12) ? year -1 : year})
   FOR ctin
, ctin_e, ctin_p IN continent
     FOR cntry IN OUTBOUND ctin
._id `ctin.name`
       FOR city IN OUTBOUND cntry
._id `cntry.name`
     FOR mon IN
0..11
       FOR data IN sales
          FILTER data
.city == city.name && data.year == cal[mon].year && data.month == cal[mon].month
     LET name
= <-- How to get name by continent, country or city
     RETURN
{"name": name, cal[mon].name: SUM(data.amount)}

   
   Result:
   
   [
     
{"name": "Europe", "July": 999, "August": 999, "September": 999, "total": 9999  },
     
{"name": "Belgium", "July": 999, "August": 999, "September": 999, "total": 9999 },
     
{"name": "Finland", "July": 999, "August": 999, "September": 999, "total": 9999 },
     
{"name": "Africa", "July": 0, "August": 0, "September": 0, "total": 0 },
     
...
     
{"name": "Grand Total", "July": 2997, "August": 2997, "September": 2997, "total": 29997 }  <-- Grand Total Row
   
]


5. How to count products sales unit by gender and year with sub total of Product and Year and order by product_types?


+-------------------+-----------------------+-----------------------+
|                   |          2015         |          2016         |
+      Product      +-----------------------+-----------------------+
|                   | Male | Female | Total | Male | Female | Total |
+-------------------+------+--------+-------+------+--------+-------+
| Computer          |      |        |       |      |        |       |
+-------------------+------+--------+-------+------+--------+-------+
|    CPU            |      |        |       |      |        |       |
+-------------------+------+--------+-------+------+--------+-------+
|    RAM            |      |        |       |      |        |       |
+-------------------+------+--------+-------+------+--------+-------+
|    HDD            |      |        |       |      |        |       |
+-------------------+------+--------+-------+------+--------+-------+
| Home appliance    |      |        |       |      |        |       |
+-------------------+------+--------+-------+------+--------+-------+
|    Microwave oven |      |        |       |      |        |       |
+-------------------+------+--------+-------+------+--------+-------+
|    Dishwasher     |      |        |       |      |        |       |
+-------------------+------+--------+-------+------+--------+-------+
| Grand Total       |      |        |       |      |        |       |
+-------------------+------+--------+-------+------+--------+-------+



   Result:
   [
     {"name": "Computer", "male_2015": 111, "female_2015": 222, "total_2015": 333, "male_2016": 111, "female_2016": 222, "total_2016": 333},
     {"name": "CPU", "male_2015": 111, "female_2015": 222, "total_2015": 333, "male_2016": 111, "female_2016": 222, "total_2016": 333},
     {"name": "RAM", "male_2015": 111, "female_2015": 222, "total_2015": 333, "male_2016": 111, "female_2016": 222, "total_2016": 333},
     ...
     {"name": "Grand Total", "male_2015": 111, "female_2015": 222, "total_2015": 333, "male_2016": 111, "female_2016": 222, "total_2016": 333},
   ]



6. How to calculate cumulative sum?


+---------------+-----------------------+----------------------------------+
| Geography     | March (Current Month) | Year to Date (Fiscal / Calendar) |
+---------------+-----------------------+----------------------------------+
| Europe        |                       |                                  |
+---------------+-----------------------+----------------------------------+
|    Belgium    |                       |                                  |
+---------------+-----------------------+----------------------------------+
|    Finland    |                       |                                  |
+---------------+-----------------------+----------------------------------+
|    Germany    |                       |                                  |
+---------------+-----------------------+----------------------------------+
| North America |                       |                                  |
+---------------+-----------------------+----------------------------------+
| Asia          |                       |                                  |
+---------------+-----------------------+----------------------------------+
| Africa        |                       |                                  |
+---------------+-----------------------+----------------------------------+
| Grand Total   |                       |                                  |
+---------------+-----------------------+----------------------------------+



   Result:
   
   [
     
{"name": "Europe", "march": 111, "ytd": 222},
     
{"name": "Belgium", "march": 111, "ytd": 222},
     
...
     
{"name": "Grand Total", "march": 111, "ytd": 222}
   
]

   

7. How to calculate difference percent from sales and sales_estimation by geography?


+---------------+------------+-------------------------+----------------+
| Geography     | March 2017 | March 2018 (Estimation) | Difference (%) |
+---------------+------------+-------------------------+----------------+
| Europe        |            |                         |                |
+---------------+------------+-------------------------+----------------+
|    Belgium    |            |                         |                |
+---------------+------------+-------------------------+----------------+
|    Finland    |            |                         |                |
+---------------+------------+-------------------------+----------------+
|    Germany    |            |                         |                |
+---------------+------------+-------------------------+----------------+
| North America |            |                         |                |
+---------------+------------+-------------------------+----------------+
| Asia          |            |                         |                |
+---------------+------------+-------------------------+----------------+
| Africa        |            |                         |                |
+---------------+------------+-------------------------+----------------+
| Grand Total   |            |                         |                |
+---------------+------------+-------------------------+----------------+



   Result:
   
   [
     
{"name": "Europe", "march_2017": 111, "march_2018": 122, "diff_percent": -9.016393443},
     
{"name": "Belgium", "march_2017": 111, "march_2018": 122, "diff_percent": -9.016393443},
     
...
     
{"name": "Grand Total", "march_2017": 111, "march_2018": 122, "diff_percent": -9.016393443},
   
]




8. Max product sale for each country for current calendar year?


+---------------+------------+--------+------+
| Country       | Product    | Amount | Unit |
+---------------+------------+--------+------+
| Belgium       | HDD        | 1000   | 100  |
+---------------+------------+--------+------+
| Finland       | Dishwasher | 999    | 99   |
+---------------+------------+--------+------+
| Germany       |            |        |      |
+---------------+------------+--------+------+
| United States |            |        |      |
+---------------+------------+--------+------+



   Result:
   [
     {“name”: “Belgium”, “product”: “HDD”, “amount”: 1000, “unit”: 100},
     {“name”: “Finland”, “product”: “Dishwasher”, “amount”: 999, “unit”: 99}
     ...
   ]


9. I have special condition, if the country is an island, then suppress all data to Island else as the country name.

   - The returned result also order by Continents -> Countries

+---------------+-------------------------------------+-------+
|   Geography   |     <---- Fiscal Year 2016 ---->    | Total |
+               +-------------------------------------+       +
|               | July | August | ...... | May | June |       |
+---------------+------+--------+--------+-----+------+-------+
| Belgium       |      |        |        |     |      |       |
+---------------+------+--------+--------+-----+------+-------+
| Finland       |      |        |        |     |      |       |
+---------------+------+--------+--------+-----+------+-------+
| Germany       |      |        |        |     |      |       |
+---------------+------+--------+--------+-----+------+-------+
| United States |      |        |        |     |      |       |
+---------------+------+--------+--------+-----+------+-------+
| China         |      |        |        |     |      |       |
+---------------+------+--------+--------+-----+------+-------+
| Island        |      |        |        |     |      |       |
+---------------+------+--------+--------+-----+------+-------+
| Grand Total   |      |        |        |     |      |       |
+---------------+------+--------+--------+-----+------+-------+



   Result:
   
  [
     
{"name": "Belgium", "July": 999, "August": 999, "September": 999, "total": 9999  },
     
{"name": "Finland", "July": 999, "August": 999, "September": 999, "total": 9999 },
     
{"name": "Germany", "July": 999, "August": 999, "September": 999, "total": 9999 },
     
{"name": "United States", "July": 999, "August": 999, "September": 999, "total": 9999 },
     
{"name": "Island", "July": 999, "August": 999, "September": 999, "total": 999 },
     
{"name": "Grand Total", "July": 2997, "August": 2997, "September": 2997, "total": 29997 }  <-- Grand Total Row
   
]




10. How to numbering the result 1 -> 1.1 -> 1.1.1 ?


+----------------------+-------------------------------------+-------+
|       Geography      |     <---- Fiscal Year 2016 ---->    | Total |
+                      +-------------------------------------+       +
|                      | July | August | ...... | May | June |       |
+----------------------+------+--------+--------+-----+------+-------+
| 1. Europe            |      |        |        |     |      |       |
+----------------------+------+--------+--------+-----+------+-------+
|    1.1 Belgium       |      |        |        |     |      |       |
+----------------------+------+--------+--------+-----+------+-------+
|       1.1.1 Brussels |      |        |        |     |      |       |
+----------------------+------+--------+--------+-----+------+-------+
|    1.2 Finland       |      |        |        |     |      |       |
+----------------------+------+--------+--------+-----+------+-------+
|       1.2.1 Helsinki |      |        |        |     |      |       |
+----------------------+------+--------+--------+-----+------+-------+
|       1.2.2 Kuusamo  |      |        |        |     |      |       |
+----------------------+------+--------+--------+-----+------+-------+
|       1.2.3 Lieksa   |      |        |        |     |      |       |
+----------------------+------+--------+--------+-----+------+-------+
|    1.3 Germany       |      |        |        |     |      |       |
+----------------------+------+--------+--------+-----+------+-------+
| 2. North America     |      |        |        |     |      |       |
+----------------------+------+--------+--------+-----+------+-------+
|    2.1 United States |      |        |        |     |      |       |
+----------------------+------+--------+--------+-----+------+-------+
|                               ......                               |
+--------------------------------------------------------------------+
| Grand Total          |      |        |        |     |      |       |
+----------------------+------+--------+--------+-----+------+-------+



   Result:
   
   [
     
{"numbering": "1.",     "name": "Europe", "July": 999, "total": 9999},
     
{"numbering": "1.1",    "name": "Belgium", "July": 999, "total": 9999},
     
{"numbering": "1.1.1",  "name": "Brussels", "July": 999, "total": 9999},
     
{"numbering": "1.2",    "name": "Finland", "July": 999, "total": 9999},
     
{"numbering": "1.2.1",  "name": "Helsinki", "July": 999, "total": 9999},
     
...
     
{"numbering": "5.",     "name": "Grand Total", "July": 999, "total": 9999},
   
]

   


Thank you in advance and I really appreciate your help.

Wilfried Gösgens

unread,
Apr 3, 2017, 10:44:12 AM4/3/17
to ArangoDB


On Thursday, March 30, 2017 at 7:58:43 AM UTC+2, landon...@gmail.com wrote:
Hi, currently I'm using MonetDB and DataBrewery Cubes (Light-weight Python OLAP server).
But I found that ArangoDB is very interesting and want to replace those with ArangoDB.
However, I am a newcomer for graph database and ArangoDB,
and would like to hear any advice on designing the graph database.

I have found many challenges when I have tried to use ArangoDB: 

1. How to design database that

   1.1 Has many views (hierarchical) and raw data (flat)
   1.2 The hierarchies are changable while data is not changed.
       For example, after re-organizing product category, the relationship between products may be changed while product info is not changed.
   
   
The power of document stores is, that it offers the ability to de-normalize at least parts of your data modell, so less joins are needed (but maybe some data is duplicated); see

https://docs.arangodb.com/3.1/Manual/GettingStarted/ComingFromSql.html
for more details.

https://www.slideshare.net/NoSQLmatters/multi-modeldatabases-46361166 explains this in more detail.

 
2. I don’t know how to

   2.1 Query raw data by views (hierarchical).
       Especially a full AQL query to show up my team that ArangoDB is suitable for our problems.

   2.2 The returned result must have all nodes from the graph even if data is missing.
       For example, some year may not sell in Africa but the result must have the sale amount as zero for Africa because Africa is defined on the graph.


   I have the hierarchical dimensions as the following:
     
     
Geography: All -> Continents -> Countries -> Cities
     
Regional: All -> Region -> Sub-Region
     
Products: Type -> Sub-Type -> Sub-SubType
     
Calendar Year: Year -> Quarters -> Months
     
Fiscal Year:   Year -> Quarters -> Months (July, August, September, ..., June)
     
Gender:  Male or Female
     
Membership: True or False




For more complex AQL examples and how to combine the bits, have a look at:  https://docs.arangodb.com/3.1/AQL/Examples/
3. Is this design is OK?

   Documents:
     
     months
.json: {"_key": "1", "name": "January"}
     productInfo
.json: {"name": ""}
     categoryInfo
.json: {"name": ""}
     continent
.json: {"name": "Europe"}
     country
.json: {"name": "country", "island": true / false}
     city
.json: {"_key": "1", "name": "Belgium"}
     person
.json: {"_key": "1", "gender": "male/female", "age": 30, "isMember": true/false}
     sales
.json: {"person": "PersonId", "product": "ProductId", "city": "CityId", "region": "RegionName" , "day": 1, "month": 2, "year": 2017, "amount": 999, "units": 99}
     sales_estimation
.json: {"product": "ProductId", "city": "CityId", "region": "RegionName" , "day": 1, "month": 2, "year": 2018, "amount": 1200, "units": 110}



   Edges:
     
     year
.json: {"_from": "months/1", "_to": "months/2", "type": "calendar/fiscal"}
     categories
.json: {"_from": "category/1", "_to": "products/2"}
     
World.json: {"_from": "continent/1", "_to": ""} <-- List of continents in the World
     
Europe.json: {"_from": "continent/1", "_to": "country/1"}  <-- List of countries in Europe
     
Asia.json: {"_from": "continent/3", "_to": "country/5"}  <-- List of countries in Asia
     
Belgium.json: {"_from": "country/1", "_to": "city/1"}  <-- List of cities in Belgium
     view_by_some_relationship
.json: {"_from": "", "_to": ""} <-- Other views



you may find https://docs.arangodb.com/3.1/Manual/Graphs/#coming-from-a-relational-background---whats-a-graph interesting, it explains how to map your data model to a graph.
But generally speaking, your sugested solution is a viable one - depending on your query needs.
 
You should avoid to use the document() function, since it can't utilize indices for FILTER conditions and thus may not scale well.
You would usually use the traverser to go through all layers of your graph, and FILTER on the path layers according to your needs.

 

You can use the SUM() function for that:  https://docs.arangodb.com/3.1/AQL/Functions/Numeric.html#sum

You would use COLLECT()  https://docs.arangodb.com/3.1/AQL/Operations/Collect.html to group your data, after that use the generic algoritmic operations to calculate your value.

landon...@gmail.com

unread,
Apr 4, 2017, 12:05:45 AM4/4/17
to ArangoDB
Thank you for your suggestion, but I still have some question
 
You would usually use the traverser to go through all layers of your graph, and FILTER on the path layers according to your needs.

Did you mean I should write foxx to manage this?

Or it's just an AQL, in this case, could you please write a complete AQL to guide me for the first time?


jan.stuecke

unread,
Apr 4, 2017, 4:48:21 AM4/4/17
to ArangoDB
Hi,

this is Jan from ArangoDB Com team.

I took over since my colleague is fully focussed on our 3.2 release.
Understandable that it might be a bit difficult at the beginning to know which option ArangoDB provides best suits a given problem. But we hope that with some initial support and our documentation freshers get a good start though.

Hope you understand that it is just not possible for us to individually support to the extend you requested. It would bind so much resources on our side that we either can't support our customers or can't release new features.

In case the tips and links to our docu my team mate has sent aren't enough, please get in touch with me and and let's find a solution.

You can reach me directly via jan.stuecke (at) arangodb.com

Best,

Jan



...
Reply all
Reply to author
Forward
0 new messages