Creating multiple vertices from fields on the same CSV line with the OrientDB ETL


praveen....@tigeranalytics.com

Aug 25, 2016, 9:14:18 AM
to OrientDB

I'm using the OrientDB ETL tool (OrientDB 2.2) to import several GB of data. The CSV looks like this:


"101.186.130.130","527225725","233 djfnsdkj","0.119836317542"

"125.143.534.148","112212983","1227 sdfsdfds","0.0465215171983"

"103.149.957.752","112364761","1121 sdfsdfds","0.0938863016658"

"103.190.245.128","785804692","6138 sdfsdfsd","0.117767539364"


For each row I need to create two vertices. The first holds the value in column 1 (the value itself is the key). The second holds the values in columns 2 and 3: its key is the concatenation of both values, and both values are also stored as attributes of this second vertex type. The 4th column will be a property of the edge connecting the two vertices.


I used the configuration below, and it mostly works, with some errors. One problem is that all the values in each CSV row are stored as properties of the IpAddress vertex. Is there any way to store only the IP address in it? Secondly, how can I concatenate two values read from the CSV?
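To make the intended data model concrete outside OrientDB, here is a minimal Python sketch that maps one CSV row to the two vertex records and the edge record described above. It assumes the column order shown in the sample (ip, dpcb, address, prob) and reuses the property names from the configuration below (`ip`, `loc`, `geo_address`, `dpcb_number`, `confidence`). The underscore separator, the `from_ip`/`to_loc` keys and the function name `row_to_records` are hypothetical choices for illustration, not part of the OrientDB ETL API.

```python
import csv
import io

SEPARATOR = "_"  # hypothetical separator for the concatenated location key


def row_to_records(row):
    """Split one CSV row into (ip_vertex, location_vertex, edge) dicts.

    Assumed column order: ip, dpcb, address, prob.
    """
    ip, dpcb, address, prob = row
    ip_vertex = {"ip": ip}  # only the IP address belongs on this vertex
    location_vertex = {
        "loc": dpcb + SEPARATOR + address,  # key = column 2 + column 3
        "dpcb_number": dpcb,
        "geo_address": address,
    }
    # The 4th column is kept as a string, matching "prob:string" in the config.
    edge = {"from_ip": ip, "to_loc": location_vertex["loc"], "confidence": prob}
    return ip_vertex, location_vertex, edge


sample = '"101.186.130.130","527225725","233 djfnsdkj","0.119836317542"\n'
for parsed in csv.reader(io.StringIO(sample)):
    print(row_to_records(parsed))
```

Keeping the concatenation in one place makes it easy to reuse the same key both when creating the location vertex and when looking it up for the edge.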


{
  "source": { "file": { "path": "/home/abcd/OrientDB/examples/ip_address.csv" } },
  "extractor": { "csv": { "columnsOnFirstLine": false, "columns": ["ip:string", "dpcb:string", "address:string", "prob:string"] } },
  "transformers": [
    { "merge": { "joinFieldName": "ip", "lookup": "IpAddress.ip" } },
    { "edge": { "class": "Located",
                "joinFieldName": "address",
                "lookup": "PhyLocation.loc",
                "direction": "out",
                "targetVertexFields": { "geo_address": "${input.address}", "dpcb_number": "${input.dpcb}" },
                "edgeFields": { "confidence": "${input.prob}" },
                "unresolvedLinkAction": "CREATE"
              }
    }
  ],
  "loader": {
    "orientdb": {
      "dbURL": "remote:localhost/Bulk_Transfer_Test",
      "dbType": "graph",
      "dbUser": "root",
      "dbPassword": "tiger",
      "serverUser": "root",
      "serverPassword": "tiger",
      "classes": [
        { "name": "IpAddress", "extends": "V" },
        { "name": "PhyLocation", "extends": "V" },
        { "name": "Located", "extends": "E" }
      ],
      "indexes": [
        { "class": "IpAddress", "fields": ["ip:string"], "type": "UNIQUE" },
        { "class": "PhyLocation", "fields": ["loc:string"], "type": "UNIQUE" }
      ]
    }
  }
}

user.w...@gmail.com

Aug 25, 2016, 5:41:26 PM
to OrientDB
Hi,

Have you tried using two different CSV files, one containing only the IpAddress and the other containing the remaining columns?
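If the original CSV has to be split first, a Python sketch along these lines could produce the two files: one with the unique IP addresses, and one with the IP plus the remaining columns and a precomputed location key. The output column order (ip, loc, dpbc, address, prob) mirrors the `columns` list used later in this thread; the function name `split_csv`, the file names and the underscore separator are assumptions for illustration.

```python
import csv


def split_csv(src_path, ip_path, edge_path, sep="_"):
    """Write unique IPs to ip_path and ip/loc/dpbc/address/prob rows to edge_path.

    Assumes each input row is: ip, dpcb, address, prob (no header line).
    The separator used to build the location key is a hypothetical choice.
    """
    seen_ips = set()
    with open(src_path, newline="") as src, \
         open(ip_path, "w", newline="") as ip_out, \
         open(edge_path, "w", newline="") as edge_out:
        # Quote every field, like the sample input.
        ip_writer = csv.writer(ip_out, quoting=csv.QUOTE_ALL)
        edge_writer = csv.writer(edge_out, quoting=csv.QUOTE_ALL)
        for row in csv.reader(src):
            if not row:  # csv.reader yields [] for blank lines
                continue
            ip, dpcb, address, prob = row
            # Write each IP once; IpAddress.ip has a UNIQUE index in the configs.
            if ip not in seen_ips:
                seen_ips.add(ip)
                ip_writer.writerow([ip])
            # ip, loc (column 2 + column 3), dpbc, address, prob
            edge_writer.writerow([ip, dpcb + sep + address, dpcb, address, prob])
```

Usage, assuming hypothetical file names: `split_csv("ip_address.csv", "only_ip.csv", "edge.csv")`. Deduplicating the IPs in the first file keeps that import from tripping over the unique index.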

Hope it helps.

Regards,
Michela

praveen....@tigeranalytics.com

Aug 28, 2016, 9:41:21 PM
to OrientDB
Yes, this worked. Thanks.

praveen....@tigeranalytics.com

Aug 31, 2016, 11:45:36 AM
to OrientDB
Hi,

I tried with two different CSV files, but the error persists: all the properties are still being written to the vertices.

Here are both JSON ETL files for reference:

{
  "source": { "file": { "path": "/home/OrientDB/orientdb-community-2.2.0/bin/only_ip_05.csv" } },

  "extractor": { "csv": {"columnsOnFirstLine": false, "columns":["ip:string"] } },
  "transformers": [
    { "vertex": { "class": "IpAddress" } }
   ],
  "loader": {
    "orientdb": {
       "dbURL": "plocal:/home/OrientDB/orientdb-community-2.2.0/Bulk_Transfer_Test2",
       "dbType": "graph",
       "dbUser": "admin",
       "dbPassword": "admin",
       "serverUser": "admin",
       "wal": false,
       "tx":false,
       "batchCommit":100000,
       "serverPassword":"admin",

       "classes": [
         {"name": "IpAddress", "extends": "V"}
       ],
       "indexes": [
         {"class":"IpAddress", "fields":["ip:string"], "type":"UNIQUE" }
       ]
    }
  }
}


---------------------------------

{
  "source": { "file": { "path": "/home/labvolume1/orientdb/bin/edge5.csv" } },
  "extractor": { "csv": { "columnsOnFirstLine": false, "columns": ["ip:string", "loc:string", "dpbc:string", "address:string", "prob:string"] } },

  "transformers": [
    { "merge": { "joinFieldName": "ip", "lookup": "IpAddress.ip" } },
    { "vertex": { "class": "IpAddress", "skipDuplicates": true } },
    { "edge": { "class": "Located",
                "joinFieldName": "loc",
                "lookup": "PhyLocation.loc",
                "direction": "out",
                "edgeFields": { "probability": "${input.prob}" },
                "targetVertexFields": { "geo_address": "${input.address}", "dpbc": "${input.dpbc}" },
                "unresolvedLinkAction": "CREATE"
              }
    }
  ],
  "loader": {
    "orientdb": {
      "dbURL": "plocal:/ubuntu/labvolume1/orientdb/databases/Bulk_Transfer_Test7",
      "dbType": "graph",
      "dbUser": "admin",
      "dbPassword": "admin",
      "serverUser": "admin",
      "wal": false,
      "tx": false,
      "batchCommit": 10000,
      "serverPassword": "admin",

      "classes": [
        { "name": "IpAddress", "extends": "V" },
        { "name": "PhyLocation", "extends": "V" },
        { "name": "Located", "extends": "E" }
      ],
      "indexes": [
        { "class": "IpAddress", "fields": ["ip:string"], "type": "UNIQUE" },
        { "class": "PhyLocation", "fields": ["loc:string"], "type": "UNIQUE" }
      ]
    }
  }
}
