On 18/04/2016 10:25, Roberto Franchini wrote:
> Now, about ETL. If you configure the ETL to store on a given cluster, all the
> documents loaded will be stored in that cluster.
> So, you can load different partitions of data into different clusters of
> the same class.
> Suppose you have 12 CSVs, one for each month, where each CSV
> contains the invoices for a single month:
> invoices_01.csv contains invoices for January
> invoices_12.csv contains invoices for December
>
> It could be useful to "partition" the Invoice class into 12 clusters, and load
> each CSV into its own cluster.
>
> I hope this clarifies the purpose of clusters.
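For instance, the per-month loading described there could be driven by one loader fragment per CSV, changing only the file path and the cluster name (a sketch based on the OrientDB ETL loader options; the names are just the ones from this example):

```json
{
  "source": { "file": { "path": "invoices_01.csv" } },
  "extractor": { "csv": {} },
  "loader": {
    "orientdb": {
      "dbURL": "plocal:../databases/invoices",
      "dbType": "graph",
      "cluster": "invoices_01",
      "classes": [ {"name": "Invoice", "extends": "V"} ]
    }
  }
}
```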
Thank you for the reply. I have read the documentation and I think I
correctly understand the role of clusters in OrientDB.
So, I will explain my issue using your invoices example:
suppose we have two csv files, invoices_01.csv defined as follow:
"id","customer","total"
"1","John","1000"
"2","Bob","250"
"3","Jack","630"
"4","Alice","900"
and invoices_02.csv defined as follow:
"id","customer","total"
"1","John","1000"
"2","Bob","250"
"3","Jack","630"
"4","Alice","900"
So, I would like to create a class named invoices (with the default cluster
named invoices) and two more clusters, named invoices_01 (for the data
of the first CSV file) and invoices_02 (for the data of the second one).
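For reference, I believe this layout could also be created up front from the OrientDB console (a sketch using OrientDB SQL; the names match the example above):

```sql
CREATE CLASS invoices EXTENDS V
ALTER CLASS invoices ADDCLUSTER invoices_01
ALTER CLASS invoices ADDCLUSTER invoices_02
```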
I define my first ETL loader as follows:
"loader": {
  "orientdb": {
    "dbURL": "plocal:../databases/invoices",
    "wal": false,
    "tx": false,
    "batchCommit": 10000,
    "dbType": "graph",
    "cluster": "invoices_01",
    "classes": [
      {"name": "invoices", "extends": "V"}
    ],
    "indexes": [
      {"class": "invoices", "fields": ["id:integer"], "type": "UNIQUE"}
    ]
  }
}
Note the parameter "cluster" with value "invoices_01" (the second ETL
loader JSON is similar; only the cluster name and the CSV file path change).
When I launch the first ETL module I expect it to create a class with two
clusters, named invoices and invoices_01 respectively, and I expect the
invoices cluster to contain no records and invoices_01 to contain all 4
records from the CSV file.
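To check this, I would query the clusters directly (a sketch; OrientDB SQL lets you target a single cluster with the cluster: prefix). If the loader respected the cluster setting, the first query should return 0 and the second 4:

```sql
SELECT count(*) FROM cluster:invoices
SELECT count(*) FROM cluster:invoices_01
```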
But my output is different: it creates two clusters, with ids 11 (invoices)
and 12 (invoices_01) respectively, and it spreads the data across both
clusters as follows:
[1:vertex] DEBUG Transformer output: v(invoices)[#11:0]
[2:vertex] DEBUG Transformer output: v(invoices)[#12:0]
[3:vertex] DEBUG Transformer output: v(invoices)[#11:1]
[4:vertex] DEBUG Transformer output: v(invoices)[#12:1]
I think this is not correct, because my loader should load data only into
the cluster with id 12.
However, when I launch the second ETL loader the result is similar: it
creates a new cluster named invoices_02 with id 13, and the log contains:
[1:vertex] DEBUG Transformer output: v(invoices)[#11:2]
[2:vertex] DEBUG Transformer output: v(invoices)[#12:2]
[3:vertex] DEBUG Transformer output: v(invoices)[#13:0]
[4:vertex] DEBUG Transformer output: v(invoices)[#11:3]
I think the second ETL loader should load data only into the cluster with
id 13 (invoices_02). In the end I have three clusters (11, 12, 13)
containing 4, 3 and 1 records respectively.
I don't know what my error is.