Re: [TinkerPop] Importing CSV into tinkerpop

Stephen Mallette

Feb 16, 2013, 11:54:52 AM
to gremli...@googlegroups.com
It's very easy to do this from the Gremlin terminal. Assume you have
a simple edge file called edges.txt that has this data in it:

1,2
2,3
3,4
1,4

Start Gremlin and do:

gremlin> g = new TinkerGraph()
==>tinkergraph[vertices:0 edges:0]
gremlin> vs=[] as Set;new File("edges.txt").eachLine{l->p=l.split(",");vs<<p[0];vs<<p[1];}
==>1
==>2
==>3
==>4
gremlin> vs.each{v->g.addVertex(v)}
==>1
==>2
==>3
==>4
gremlin> new File("edges.txt").eachLine{l->p=l.split(",");g.addEdge(g.getVertex(p[0]),g.getVertex(p[1]),'friend')}
gremlin> g.E
==>e[3][1-friend->4]
==>e[2][3-friend->4]
==>e[1][2-friend->3]
==>e[0][1-friend->2]

That's it. You might find this blog post useful if you're pulling
data from different sources into Gremlin in an ad-hoc way:

http://thinkaurelius.com/2013/02/04/polyglot-persistence-and-query-with-gremlin/

Best regards,

Stephen

On Sat, Feb 16, 2013 at 11:43 AM, <dajoh...@gmail.com> wrote:
> After a fair bit of googling, I can't find a straightforward way to take a
> .CSV file and convert it into a Tinkerpop graph. I expect some configuration
> is necessary, but can I do this ingestion without adding a bunch of
> third-party layers to the stack?

Stephen Mallette

Feb 16, 2013, 8:49:51 PM
to gremli...@googlegroups.com
My example was meant to demonstrate a very simple load of a very
simple file to an in-memory TinkerGraph. You likely wouldn't want to
load a 15GB file to a TinkerGraph, though you might load it to Neo4j,
OrientDB or Titan. That said, my example gets more complicated as you
now have to consider transactions and other such things. Look into
using BatchGraph and make sure your underlying graph is properly
configured for batch loading
(https://github.com/tinkerpop/blueprints/wiki/Batch-Implementation).
As for Groovy's file reader, I didn't think that it reads the whole
file into memory. I thought it was smart about such things, but someone
can correct me if I'm wrong.
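
For what it's worth, a BatchGraph load of the same edge file might look roughly like the sketch below. It follows the pattern on the Batch-Implementation wiki page and assumes Blueprints 2.x is on the classpath. The loadEdges helper, the buffer size of 1000 and the commented-out Neo4j path are made up for illustration, and the backend still needs its own batch-loading configuration.

```groovy
import com.tinkerpop.blueprints.TransactionalGraph
import com.tinkerpop.blueprints.util.wrappers.batch.BatchGraph
import com.tinkerpop.blueprints.util.wrappers.batch.VertexIDType

// Hypothetical helper (not part of Blueprints): streams an edge file into a
// transactional graph through BatchGraph.
def loadEdges(TransactionalGraph graph, File edgeFile, String label) {
    // The buffer size (1000) is an arbitrary choice: BatchGraph commits
    // automatically each time that many elements have been added.
    def bgraph = new BatchGraph(graph, VertexIDType.STRING, 1000L)
    edgeFile.eachLine { line ->
        def p = line.split(",")
        // Reuse a vertex if its id has been seen already, otherwise create it.
        def outV = bgraph.getVertex(p[0]) ?: bgraph.addVertex(p[0])
        def inV = bgraph.getVertex(p[1]) ?: bgraph.addVertex(p[1])
        bgraph.addEdge(null, outV, inV, label)
    }
    // shutdown() flushes what is still buffered and closes the wrapped graph.
    bgraph.shutdown()
}

// Hypothetical usage, assuming the Neo4j implementation is available:
// loadEdges(new Neo4jGraph("/tmp/neo4j-graph"), new File("edges.txt"), "friend")
```

Unlike the console example, this makes a single pass over the file, so there is no need to collect the vertex ids up front.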

On Sat, Feb 16, 2013 at 1:48 PM, subhankar biswas <neo20...@gmail.com> wrote:
> Thanks a lot, I was looking for such a solution.
> But will it work for a big text file, like 10-15 GB? That is, will it load
> the whole .txt into memory or read it line by line?

dajoh...@gmail.com

Feb 16, 2013, 9:37:34 PM
to gremli...@googlegroups.com
Thanks for the reply, Stephen, and great discussion. And yes, your Groovy script is safe because it uses eachLine(). Large files might get us into trouble if we used readLines(), which reads the entire file into memory.
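
To make the difference concrete, here is a small plain-Groovy sketch (no TinkerPop needed). The temp file and the variable names are just for illustration: eachLine() hands the closure one line at a time, while readLines() builds a List of every line before returning.

```groovy
// Illustrative sample file with the same four edges as edges.txt.
def f = File.createTempFile("edges", ".txt")
f.deleteOnExit()
f.text = "1,2\n2,3\n3,4\n1,4\n"

// eachLine() reads through a buffered reader, so only the current line
// (plus whatever we choose to keep) is held in memory.
def ids = [] as Set
int edgeCount = 0
f.eachLine { line ->
    def p = line.split(",")
    ids << p[0]
    ids << p[1]
    edgeCount++
}

// readLines() materializes every line in a List, which is fine here
// but not for a multi-gigabyte file.
def allLines = f.readLines()

println "vertices=${ids.size()} edges=${edgeCount} linesInMemory=${allLines.size()}"
```

Note that the Set of vertex ids still grows with the number of distinct vertices, so on a very large file that part of the original script is what uses memory, not the file reading.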

Now, with this nice graph, how can I visualize it in Rexster's Dog House? I've tried two routes so far: the Gremlin console via the Dog House, and the terminal-based Rexster console. I can create new graphs galore, but I can't find how to get Rexster to "see" them.

-Derrick

Stephen Mallette

Feb 17, 2013, 6:36:28 AM
to gremli...@googlegroups.com
Instructions for Dog House are here (there is a section for "visualization").

https://github.com/tinkerpop/rexster/wiki/The-Dog-House

It's a very simple visualization that is centered on a vertex. For
anything more complex, I would recommend dumping to GraphML and
importing to tools like Cytoscape or Gephi.
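
As a rough sketch of the GraphML dump, assuming Blueprints 2.x and a graph loaded as in the earlier example (the output path and the tiny two-vertex graph are arbitrary choices for illustration):

```groovy
import com.tinkerpop.blueprints.impls.tg.TinkerGraph
import com.tinkerpop.blueprints.util.io.graphml.GraphMLWriter

// Small in-memory graph standing in for the one loaded from edges.txt.
def g = new TinkerGraph()
def v1 = g.addVertex("1")
def v2 = g.addVertex("2")
g.addEdge(null, v1, v2, "friend")

// Write the whole graph as GraphML; /tmp/my-graph.xml is an arbitrary path.
GraphMLWriter.outputGraph(g, "/tmp/my-graph.xml")
```

The resulting file should then be importable into Cytoscape or Gephi.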

Stephen

dajoh...@gmail.com

Feb 17, 2013, 9:15:35 AM
to gremli...@googlegroups.com
I've looked at that page several times. It doesn't tell us how to create a new graph, and visualize it in Dog House. 

I executed your sample Gremlin script against a text file at rexster/rexster-server/data/edges.txt. It creates a folder at .../data/<graph-name>, which is empty. That's it.

Stephen Mallette

Feb 17, 2013, 9:31:42 AM
to gremli...@googlegroups.com
I think you should treat graph creation, visualization and Rexster
as separate concerns.

1. Create a graph in any manner you choose.
2. Configure it for access in rexster.xml.
3. Start Rexster.
4. Point your browser at Dog House and select the graph you configured.
5. Use the "Browse" tab, find a vertex you are interested in and click
the little visualize button, which looks like a magnifying glass (it has
the tooltip "Visualize").

Stephen

dajoh...@gmail.com

Feb 17, 2013, 11:45:39 AM
to gremli...@googlegroups.com
I've spent several hours trying to fill the gap between step #1 and step #2.

Creating the graph exactly as described earlier doesn't write anything to disk, as far as I can tell, and something on disk is necessary for the rexster.xml configuration. I've searched through Google and this list, and all I can find is the advice to invoke stopTransaction(), or maybe shutdown(), to start the serialization.

I've tried following the pattern used by rexster.xml for the sample data. rexster.xml seems to associate each graph with a directory which (in the case of a TinkerGraph) contains a .dat file. Whether I use rexster-console.sh, the rexster-gremlin plugin, or the Gremlin project's vanilla terminal, I can't find a .dat file anywhere.

-Derrick

Stephen Mallette

Feb 17, 2013, 12:43:32 PM
to gremli...@googlegroups.com
You should definitely call shutdown() on the graph. I noticed that my
example was a bit misleading: I wasn't trying to focus on persisting
the graph, I was focused on your question, which was about how to load
the data into a TinkerGraph. Here:

g = new TinkerGraph()

When you do that, TinkerGraph operates in memory only. You have to
tell it where you want to store the .dat as in:

g = new TinkerGraph("/tmp/my-graph")

Does that fix the problem?
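
Put together, a persistent version of the earlier load might look like the sketch below, assuming Blueprints 2.x and the same edges.txt in the working directory. It creates each vertex on first sight instead of collecting ids first, which is a small departure from my console example.

```groovy
import com.tinkerpop.blueprints.impls.tg.TinkerGraph

// Passing a directory makes TinkerGraph persist there instead of staying
// purely in memory.
def g = new TinkerGraph("/tmp/my-graph")

new File("edges.txt").eachLine { l ->
    def p = l.split(",")
    // Look each vertex up by id and create it only if it is missing.
    def outV = g.getVertex(p[0]) ?: g.addVertex(p[0])
    def inV = g.getVertex(p[1]) ?: g.addVertex(p[1])
    g.addEdge(null, outV, inV, 'friend')
}

// shutdown() is what actually writes the graph out to /tmp/my-graph.
g.shutdown()
```

After that, /tmp/my-graph is the location to point rexster.xml at.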

dajoh...@gmail.com

Feb 17, 2013, 2:15:08 PM
to gremli...@googlegroups.com
:)
Awesome, now I can see my little graph. The confusion was my fault; I shouldn't have slipped two questions in under the same topic.

Very exciting stuff, and kudos to you and the team for making a cool product.