Hi, currently I'm using MonetDB and DataBrewery Cubes (Light-weight Python OLAP server).
But I found that ArangoDB is very interesting and want to replace those with ArangoDB.
However, I am a newcomer for graph database and ArangoDB,
and would like to hear any advice on designing the graph database.
I have found many challenges when I have tried to use ArangoDB:
1. How to design database that
1.1 Has many views (hierarchical) and raw data (flat)
1.2 The hierarchies are changable while data is not changed.
For example, after re-organizing product category, the relationship between products may be changed while product info is not changed.
2. I don’t know how to
2.1 Query raw data by views (hierarchical).
Especially a full AQL query to show up my team that ArangoDB is suitable for our problems.
2.2 The returned result must have all nodes from the graph even if data is missing.
For example, some year may not sell in Africa but the result must have the sale amount as zero for Africa because Africa is defined on the graph.
I have the hierarchical dimensions as the following:
Geography: All -> Continents -> Countries -> Cities
Regional: All -> Region -> Sub-Region
Products: Type -> Sub-Type -> Sub-SubType
Calendar Year: Year -> Quarters -> Months
Fiscal Year: Year -> Quarters -> Months (July, August, September, ..., June)
Gender: Male or Female
Membership: True or False
3. Is this design is OK?
Documents:
months.json: {"_key": "1", "name": "January"}
productInfo.json: {"name": ""}
categoryInfo.json: {"name": ""}
continent.json: {"name": "Europe"}
country.json: {"name": "country", "island": true / false}
city.json: {"_key": "1", "name": "Belgium"}
person.json: {"_key": "1", "gender": "male/female", "age": 30, "isMember": true/false}
sales.json: {"person": "PersonId", "product": "ProductId", "city": "CityId", "region": "RegionName" , "day": 1, "month": 2, "year": 2017, "amount": 999, "units": 99}
sales_estimation.json: {"product": "ProductId", "city": "CityId", "region": "RegionName" , "day": 1, "month": 2, "year": 2018, "amount": 1200, "units": 110}
Edges:
year.json: {"_from": "months/1", "_to": "months/2", "type": "calendar/fiscal"}
categories.json: {"_from": "category/1", "_to": "products/2"}
World.json: {"_from": "continent/1", "_to": ""} <-- List of continents in the World
Europe.json: {"_from": "continent/1", "_to": "country/1"} <-- List of countries in Europe
Asia.json: {"_from": "continent/3", "_to": "country/5"} <-- List of countries in Asia
Belgium.json: {"_from": "country/1", "_to": "city/1"} <-- List of cities in Belgium
view_by_some_relationship.json: {"_from": "", "_to": ""} <-- Other views
4. How to query products sales by geography and calendar year or fiscal year?
- The returned result must have all nodes from the graph even if data is missing
- The returned result should be ordered by hierarchy
+------------------+-------------------------------------+-------+
| | <---- Fiscal Year 2016 ----> | |
+ +-------------------------------------+ +
| Geography | 2015 | 2015 | | 2016 | 2016 | Total |
+ +------+--------+ +------+------+ +
| | July | August | ..... | May | June | |
+------------------+------+--------+-------+------+------+-------+
| Europe | | | | | | |
+------------------+------+--------+-------+------+------+-------+
| Belgium | | | | | | |
+------------------+------+--------+-------+------+------+-------+
| Finland | | | | | | |
+------------------+------+--------+-------+------+------+-------+
| Germany | | | | | | |
+------------------+------+--------+-------+------+------+-------+
| North America | | | | | | |
+------------------+------+--------+-------+------+------+-------+
| United States | | | | | | |
+------------------+------+--------+-------+------+------+-------+
| Asia | | | | | | |
+------------------+------+--------+-------+------+------+-------+
| China | | | | | | |
+------------------+------+--------+-------+------+------+-------+
| Africa | | | | | | |
+------------------+------+--------+-------+------+------+-------+
| Grand Total | | | | | | |
+------------------+------+--------+-------+------+------+-------+
I think AQL may look like
LET cal = (FOR c IN year FILTER c.type == "fiscal" RETURN {"name": DOCUMENT(c._id).name, "month": TO_NUMBER(c._key), "year": (c.month >= 6 OR c.month <= 12) ? year -1 : year})
FOR ctin, ctin_e, ctin_p IN continent
FOR cntry IN OUTBOUND ctin._id `ctin.name`
FOR city IN OUTBOUND cntry._id `cntry.name`
FOR mon IN 0..11
FOR data IN sales
FILTER data.city == city.name && data.year == cal[mon].year && data.month == cal[mon].month
LET name = <-- How to get name by continent, country or city
RETURN {"name": name, cal[mon].name: SUM(data.amount)}
Result:
[
{"name": "Europe", "July": 999, "August": 999, "September": 999, "total": 9999 },
{"name": "Belgium", "July": 999, "August": 999, "September": 999, "total": 9999 },
{"name": "Finland", "July": 999, "August": 999, "September": 999, "total": 9999 },
{"name": "Africa", "July": 0, "August": 0, "September": 0, "total": 0 },
...
{"name": "Grand Total", "July": 2997, "August": 2997, "September": 2997, "total": 29997 } <-- Grand Total Row
]
5. How to count products sales unit by gender and year with sub total of Product and Year and order by product_types?
+-------------------+-----------------------+-----------------------+
| | 2015 | 2016 |
+ Product +-----------------------+-----------------------+
| | Male | Female | Total | Male | Female | Total |
+-------------------+------+--------+-------+------+--------+-------+
| Computer | | | | | | |
+-------------------+------+--------+-------+------+--------+-------+
| CPU | | | | | | |
+-------------------+------+--------+-------+------+--------+-------+
| RAM | | | | | | |
+-------------------+------+--------+-------+------+--------+-------+
| HDD | | | | | | |
+-------------------+------+--------+-------+------+--------+-------+
| Home appliance | | | | | | |
+-------------------+------+--------+-------+------+--------+-------+
| Microwave oven | | | | | | |
+-------------------+------+--------+-------+------+--------+-------+
| Dishwasher | | | | | | |
+-------------------+------+--------+-------+------+--------+-------+
| Grand Total | | | | | | |
+-------------------+------+--------+-------+------+--------+-------+
Result:
[
{"name": "Computer", "male_2015": 111, "female_2015": 222, "total_2015": 333, "male_2016": 111, "female_2016": 222, "total_2016": 333},
{"name": "CPU", "male_2015": 111, "female_2015": 222, "total_2015": 333, "male_2016": 111, "female_2016": 222, "total_2016": 333},
{"name": "RAM", "male_2015": 111, "female_2015": 222, "total_2015": 333, "male_2016": 111, "female_2016": 222, "total_2016": 333},
...
{"name": "Grand Total", "male_2015": 111, "female_2015": 222, "total_2015": 333, "male_2016": 111, "female_2016": 222, "total_2016": 333},
]
6. How to calculate cumulative sum?
+---------------+-----------------------+----------------------------------+
| Geography | March (Current Month) | Year to Date (Fiscal / Calendar) |
+---------------+-----------------------+----------------------------------+
| Europe | | |
+---------------+-----------------------+----------------------------------+
| Belgium | | |
+---------------+-----------------------+----------------------------------+
| Finland | | |
+---------------+-----------------------+----------------------------------+
| Germany | | |
+---------------+-----------------------+----------------------------------+
| North America | | |
+---------------+-----------------------+----------------------------------+
| Asia | | |
+---------------+-----------------------+----------------------------------+
| Africa | | |
+---------------+-----------------------+----------------------------------+
| Grand Total | | |
+---------------+-----------------------+----------------------------------+
Result:
[
{"name": "Europe", "march": 111, "ytd": 222},
{"name": "Belgium", "march": 111, "ytd": 222},
...
{"name": "Grand Total", "march": 111, "ytd": 222}
]
7. How to calculate difference percent from sales and sales_estimation by geography?
+---------------+------------+-------------------------+----------------+
| Geography | March 2017 | March 2018 (Estimation) | Difference (%) |
+---------------+------------+-------------------------+----------------+
| Europe | | | |
+---------------+------------+-------------------------+----------------+
| Belgium | | | |
+---------------+------------+-------------------------+----------------+
| Finland | | | |
+---------------+------------+-------------------------+----------------+
| Germany | | | |
+---------------+------------+-------------------------+----------------+
| North America | | | |
+---------------+------------+-------------------------+----------------+
| Asia | | | |
+---------------+------------+-------------------------+----------------+
| Africa | | | |
+---------------+------------+-------------------------+----------------+
| Grand Total | | | |
+---------------+------------+-------------------------+----------------+
Result:
[
{"name": "Europe", "march_2017": 111, "march_2018": 122, "diff_percent": -9.016393443},
{"name": "Belgium", "march_2017": 111, "march_2018": 122, "diff_percent": -9.016393443},
...
{"name": "Grand Total", "march_2017": 111, "march_2018": 122, "diff_percent": -9.016393443},
]
8. Max product sale for each country for current calendar year?
+---------------+------------+--------+------+
| Country | Product | Amount | Unit |
+---------------+------------+--------+------+
| Belgium | HDD | 1000 | 100 |
+---------------+------------+--------+------+
| Finland | Dishwasher | 999 | 99 |
+---------------+------------+--------+------+
| Germany | | | |
+---------------+------------+--------+------+
| United States | | | |
+---------------+------------+--------+------+
Result:
[
{“name”: “Belgium”, “product”: “HDD”, “amount”: 1000, “unit”: 100},
{“name”: “Finland”, “product”: “Dishwasher”, “amount”: 999, “unit”: 99}
...
]
9. I have special condition, if the country is an island, then suppress all data to Island else as the country name.
- The returned result also order by Continents -> Countries
+---------------+-------------------------------------+-------+
| Geography | <---- Fiscal Year 2016 ----> | Total |
+ +-------------------------------------+ +
| | July | August | ...... | May | June | |
+---------------+------+--------+--------+-----+------+-------+
| Belgium | | | | | | |
+---------------+------+--------+--------+-----+------+-------+
| Finland | | | | | | |
+---------------+------+--------+--------+-----+------+-------+
| Germany | | | | | | |
+---------------+------+--------+--------+-----+------+-------+
| United States | | | | | | |
+---------------+------+--------+--------+-----+------+-------+
| China | | | | | | |
+---------------+------+--------+--------+-----+------+-------+
| Island | | | | | | |
+---------------+------+--------+--------+-----+------+-------+
| Grand Total | | | | | | |
+---------------+------+--------+--------+-----+------+-------+
Result:
[
{"name": "Belgium", "July": 999, "August": 999, "September": 999, "total": 9999 },
{"name": "Finland", "July": 999, "August": 999, "September": 999, "total": 9999 },
{"name": "Germany", "July": 999, "August": 999, "September": 999, "total": 9999 },
{"name": "United States", "July": 999, "August": 999, "September": 999, "total": 9999 },
{"name": "Island", "July": 999, "August": 999, "September": 999, "total": 999 },
{"name": "Grand Total", "July": 2997, "August": 2997, "September": 2997, "total": 29997 } <-- Grand Total Row
]
10. How to numbering the result 1 -> 1.1 -> 1.1.1 ?
+----------------------+-------------------------------------+-------+
| Geography | <---- Fiscal Year 2016 ----> | Total |
+ +-------------------------------------+ +
| | July | August | ...... | May | June | |
+----------------------+------+--------+--------+-----+------+-------+
| 1. Europe | | | | | | |
+----------------------+------+--------+--------+-----+------+-------+
| 1.1 Belgium | | | | | | |
+----------------------+------+--------+--------+-----+------+-------+
| 1.1.1 Brussels | | | | | | |
+----------------------+------+--------+--------+-----+------+-------+
| 1.2 Finland | | | | | | |
+----------------------+------+--------+--------+-----+------+-------+
| 1.2.1 Helsinki | | | | | | |
+----------------------+------+--------+--------+-----+------+-------+
| 1.2.2 Kuusamo | | | | | | |
+----------------------+------+--------+--------+-----+------+-------+
| 1.2.3 Lieksa | | | | | | |
+----------------------+------+--------+--------+-----+------+-------+
| 1.3 Germany | | | | | | |
+----------------------+------+--------+--------+-----+------+-------+
| 2. North America | | | | | | |
+----------------------+------+--------+--------+-----+------+-------+
| 2.1 United States | | | | | | |
+----------------------+------+--------+--------+-----+------+-------+
| ...... |
+--------------------------------------------------------------------+
| Grand Total | | | | | | |
+----------------------+------+--------+--------+-----+------+-------+
Result:
[
{"numbering": "1.", "name": "Europe", "July": 999, "total": 9999},
{"numbering": "1.1", "name": "Belgium", "July": 999, "total": 9999},
{"numbering": "1.1.1", "name": "Brussels", "July": 999, "total": 9999},
{"numbering": "1.2", "name": "Finland", "July": 999, "total": 9999},
{"numbering": "1.2.1", "name": "Helsinki", "July": 999, "total": 9999},
...
{"numbering": "5.", "name": "Grand Total", "July": 999, "total": 9999},
]
Thank you in advance and I really appreciate your help.