Hello gentlemen,
I am trying to load a large amount of data (millions of vertices and some edges between them) from MySQL into Titan with a PHP script.
To give you an idea, my use case is storing customers' orders. Each order ID is one [ORDER] vertex. For every product the customer bought in that order I create a [PRODUCT] vertex and connect it to the [ORDER] vertex. I have approximately 4 million orders, so the graph will end up with millions of vertices.
Since I have to use PHP, which is not a JVM language, and there is no PHP library for communicating with Titan as far as I know, I cannot use BatchGraph.
What are the options for loading this amount of data from MySQL into Titan in chunks?
I tried the following:
1] I wrote a PHP class for communicating with the Rexster REST API using these methods:
https://github.com/tinkerpop/rexster/wiki/Basic-REST-API
However, there is no way to query or create multiple vertices/edges at once; the API only lets me request, create, or update one edge/vertex per call. With these API methods I came up with this program (pseudocode):
check if a vertex with this orderId already exists (to avoid duplicates) -> if not, create it and get the ID of the newly created order vertex
for each product associated with the order {
    check if the product vertex already exists -> if yes, get its ID; if not, create it and get its ID
    connect the order vertex and the product vertex with an edge
}
For example, an order with 5 products translates to the following REST calls:
[ORDER VERTEX]
check if it exists - GET API_URL/graphs/xsgraph/vertices?key=orderId&value=4598995
create - POST API_URL/graphs/xsgraph/vertices/
set properties - POST API_URL/graphs/xsgraph/vertices?key=orderId&value=4598995&key=time&value=123456789
[PRODUCT VERTICES] - this is repeated 5 times, once per product
check if it exists - GET API_URL/graphs/xsgraph/vertices?key=productId&value=9999
create - POST API_URL/graphs/xsgraph/vertices/
set properties - POST API_URL/graphs/xsgraph/vertices?key=productId&value=4598995
That is 18 REST calls in total for one order in the worst case, not counting the calls that create the edges, which is incredibly slow.
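To double-check that count, here is a small sketch that replays the pseudocode against an in-memory stand-in for the REST API and counts the requests. It is written in Python only for brevity (my real loader is PHP). The class `FakeRexster`, its method names, and the edge label `contains` are made up for illustration and are not part of Rexster.

```python
class FakeRexster:
    """In-memory stand-in for the REST API; every method counts as one call."""

    def __init__(self):
        self.calls = 0
        self.vertices = {}  # vertex id -> properties
        self.index = {}     # (key, value) -> vertex id
        self.edges = []     # (out vertex id, label, in vertex id)

    def find_vertex(self, key, value):
        # Stands in for: GET .../vertices?key=<key>&value=<value>
        self.calls += 1
        return self.index.get((key, value))

    def create_vertex(self):
        # Stands in for: POST .../vertices/
        self.calls += 1
        vid = len(self.vertices) + 1
        self.vertices[vid] = {}
        return vid

    def set_properties(self, vid, props):
        # Stands in for the "set properties" POST
        self.calls += 1
        self.vertices[vid].update(props)
        for key, value in props.items():
            self.index[(key, value)] = vid

    def create_edge(self, out_id, label, in_id):
        # Stands in for the edge-creation POST
        self.calls += 1
        self.edges.append((out_id, label, in_id))


def get_or_create(api, key, value, extra=None):
    """Look the vertex up first; create it and set its properties if missing."""
    vid = api.find_vertex(key, value)
    if vid is None:
        vid = api.create_vertex()
        props = {key: value}
        props.update(extra or {})
        api.set_properties(vid, props)
    return vid


def import_order(api, order_id, time, product_ids, with_edges=True):
    """Replay the pseudocode for a single order."""
    order_vid = get_or_create(api, "orderId", order_id, {"time": time})
    for pid in product_ids:
        product_vid = get_or_create(api, "productId", pid)
        if with_edges:
            api.create_edge(order_vid, "contains", product_vid)
    return order_vid


api = FakeRexster()
import_order(api, 4598995, 123456789, [1, 2, 3, 4, 5], with_edges=False)
print(api.calls)  # 18: 3 calls for the order + 3 calls for each of the 5 products
```

With `with_edges=True` the same order costs 23 calls, because each product needs one more request for its edge.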
2] I modified my code to use the Rexster Gremlin extension to create a vertex and assign its properties in a single call, with this query:
g.addVertex(null,["orderId":123456,"time":987987]);
However, since this change I get an OutOfMemoryError (PermGen space, see the stack trace below). I tried raising the Java heap size (-Xmx switch) from 512M to 8000M, but I still get the same exception.
a] Is there some sort of memory leak, or can the Gremlin extension simply not handle this number of requests?
b] Is there a way to add multiple vertices with properties in one request, so that I do not have to make this many REST requests? And the same for edges?
Here is one of the stack traces:
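To make b] concrete, this is roughly what I have in mind: one fixed Gremlin script per request, with a whole chunk of orders passed in as bound data. The sketch below only builds the request bodies (Python for brevity, my real code is PHP). I am assuming the Gremlin extension accepts a `params` map next to `script`, and that nested lists and maps arrive in the script as Groovy collections. The Gremlin script itself is untested, and the extra order IDs, product IDs, and the `contains` label are invented for the example.

```python
import json

# Hypothetical Gremlin (Groovy) script, untested against Titan/Rexster.
# Its text never changes between requests; only the bound `orders` does.
BATCH_SCRIPT = """
orders.each { o ->
    def found = g.V('orderId', o.orderId)
    def ov = found.hasNext() ? found.next() : g.addVertex(null, [orderId: o.orderId, time: o.time])
    o.products.each { pid ->
        def pf = g.V('productId', pid)
        def pv = pf.hasNext() ? pf.next() : g.addVertex(null, [productId: pid])
        g.addEdge(null, ov, pv, 'contains')
    }
}
orders.size()
"""


def chunked(rows, size):
    """Yield consecutive slices of `rows` with at most `size` items each."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def build_requests(orders, chunk_size):
    """Return one JSON request body per chunk of orders."""
    return [
        json.dumps({"script": BATCH_SCRIPT, "params": {"orders": chunk}})
        for chunk in chunked(orders, chunk_size)
    ]


# Illustrative rows, shaped like what I would read from MySQL.
orders = [
    {"orderId": 4598995, "time": 123456789, "products": [9999, 10000]},
    {"orderId": 4598996, "time": 123456790, "products": [9999]},
    {"orderId": 4598997, "time": 123456791, "products": [10001]},
]
bodies = build_requests(orders, chunk_size=2)
print(len(bodies))  # 2 request bodies for 3 orders
```

Each body would then be sent as a single POST to the Gremlin extension, so a chunk of N orders costs one request instead of 18 or more per order. Is this a supported way to do it?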
Caused by: java.lang.OutOfMemoryError: PermGen space
142125977 [Grizzly(1)] ERROR com.tinkerpop.rexster.GraphResource - It would be smart to trap this this exception within the extension and supply a good response to the user:PermGen space
java.lang.OutOfMemoryError: PermGen space
142133826 [Grizzly(2)] ERROR com.tinkerpop.rexster.GraphResource - Dynamic invocation of the [tp:gremlin+*] extension failed.
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at com.tinkerpop.rexster.AbstractSubResource.invokeExtension(AbstractSubResource.java:322)
at com.tinkerpop.rexster.AbstractSubResource.invokeExtension(AbstractSubResource.java:229)
at com.tinkerpop.rexster.GraphResource.executeGraphExtension(GraphResource.java:281)
at com.tinkerpop.rexster.GraphResource.getGraphExtension(GraphResource.java:224)
at sun.reflect.GeneratedMethodAccessor25.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:205)
at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
at com.codahale.metrics.jersey.InstrumentedResourceMethodDispatchProvider$TimedRequestDispatcher.dispatch(InstrumentedResourceMethodDispatchProvider.java:30)
at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:302)
at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1511)
at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1442)
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1391)
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1381)
at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:538)
at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:716)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:770)
at org.glassfish.grizzly.servlet.FilterChainImpl.doFilter(FilterChainImpl.java:147)
at org.glassfish.grizzly.servlet.FilterChainImpl.invokeFilterChain(FilterChainImpl.java:106)
at org.glassfish.grizzly.servlet.ServletHandler.doServletService(ServletHandler.java:252)
at org.glassfish.grizzly.servlet.ServletHandler.service(ServletHandler.java:188)
at org.glassfish.grizzly.http.server.HttpHandler.doHandle(HttpHandler.java:164)
at org.glassfish.grizzly.http.server.HttpHandlerChain.service(HttpHandlerChain.java:196)
at org.glassfish.grizzly.http.server.HttpHandler.doHandle(HttpHandler.java:164)
at org.glassfish.grizzly.http.server.HttpServerFilter.handleRead(HttpServerFilter.java:175)
at org.glassfish.grizzly.filterchain.ExecutorResolver$9.execute(ExecutorResolver.java:119)
at org.glassfish.grizzly.filterchain.DefaultFilterChain.executeFilter(DefaultFilterChain.java:265)
at org.glassfish.grizzly.filterchain.DefaultFilterChain.executeChainPart(DefaultFilterChain.java:200)
at org.glassfish.grizzly.filterchain.DefaultFilterChain.execute(DefaultFilterChain.java:134)
at org.glassfish.grizzly.filterchain.DefaultFilterChain.process(DefaultFilterChain.java:112)
at org.glassfish.grizzly.ProcessorExecutor.execute(ProcessorExecutor.java:78)
at org.glassfish.grizzly.nio.transport.TCPNIOTransport.fireIOEvent(TCPNIOTransport.java:815)
at org.glassfish.grizzly.strategies.AbstractIOStrategy.fireIOEvent(AbstractIOStrategy.java:112)
at org.glassfish.grizzly.strategies.LeaderFollowerNIOStrategy.executeIoEvent(LeaderFollowerNIOStrategy.java:102)
at org.glassfish.grizzly.strategies.AbstractIOStrategy.executeIoEvent(AbstractIOStrategy.java:88)
at org.glassfish.grizzly.nio.SelectorRunner.iterateKeyEvents(SelectorRunner.java:398)
at org.glassfish.grizzly.nio.SelectorRunner.iterateKeys(SelectorRunner.java:368)
at org.glassfish.grizzly.nio.SelectorRunner.doSelect(SelectorRunner.java:334)
at org.glassfish.grizzly.nio.SelectorRunner.run(SelectorRunner.java:264)
at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.doWork(AbstractThreadPool.java:567)
at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.run(AbstractThreadPool.java:547)
at java.lang.Thread.run(Thread.java:744)
Thanks for your time!