Mutating Cypher, Streaming results and more!


Nikhil Lanjewar

May 14, 2012, 9:32:49 AM
to neo4j...@googlegroups.com
Hey Guys,

With Neo4j 1.8 M01 out for over a week now, has anyone had a chance to check out mutating Cypher and streaming results over the REST API?

--
Nikhil
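
For anyone who wants to try both features together, here is a minimal sketch against a default local server (the node IDs and relationship type are made up, and the syntax follows the 1.8 milestone docs, so double-check it against your version):

curl -X POST http://localhost:7474/db/data/cypher \
     -H "Content-Type: application/json" \
     -H "X-Stream: true" \
     -d '{"query": "START a=node(1), b=node(2) CREATE a-[r:KNOWS]->b RETURN r"}'

The X-Stream: true header asks the server to stream the result rather than buffering it in memory first; the CREATE clause is the mutating part.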

Michael Hunger

May 14, 2012, 11:58:45 AM
to neo4j...@googlegroups.com
Good point; this would be interesting from a performance standpoint.

Do you have a scenario in mind that you want to create?

Michael

--
You received this message because you are subscribed to the Google Groups "Neo4j India" group.
To post to this group, send email to neo4j...@googlegroups.com.
To unsubscribe from this group, send email to neo4j-india...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/neo4j-india?hl=en.

Nikhil Lanjewar

May 15, 2012, 3:59:48 AM
to neo4j...@googlegroups.com
There's one which has been fascinating me for a while.

I would really like to obtain the data source of at least one of the Citizen projects from https://www.zooniverse.org/projects and try to automate the process.

Some context:

Zooniverse has initiated various science projects, which they call Citizen projects. Participants are shown some data and are interactively taught to identify patterns in it. They believe the human eye and human perception can achieve the best accuracy. These projects involve stellar magnitude data, weather logs, etc., collected over time. They have already made some progress in modeling the data for an effective web renderer, e.g. http://planethunters.org.

Since time-series analysis has been one of the most talked-about topics at Neo4j India's meetups, this fits in perfectly. We might also solve a real problem in the process! I would love to run some benchmark analysis once I figure out a way to import the raw data. I don't even know what the data looks like yet; I might get in touch with the Zooniverse folks soon.

If someone knows of a similar dataset that is freely available, I might take a stab at that instead.

--
Nikhil Lanjewar
Engineering Lead at YourNextLeap
http://yournextleap.com
http://twitter.com/rhetonik

Michael Hunger

May 15, 2012, 4:47:18 AM
to neo4j...@googlegroups.com, community-team
Sounds great; please keep us in the loop on how it goes and how you plan to approach it.

This is also something you could pitch to our upcoming graph-grant project.

Cheers

Michael

Mahesh Lal

May 16, 2012, 2:47:50 AM
to neo4j...@googlegroups.com, community-team
I was thinking of something along the lines of Rails' db:migrate.

We could build a data migration framework for existing graphs, with the objective of renaming fields, deleting unused old fields, remapping/renaming relationships, re-creating indexes, etc.

We did this for a client using the Java API for Neo4j (1.6).

I was planning to open-source it; however, there are legal implications around it, so it would have to be written from scratch.
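
For simple field renames, 1.8's mutating Cypher might even cover a migration step directly. A rough sketch (the property names are made up, and DELETE on a property is the 1.8 idiom, replaced by REMOVE in later versions):

START n=node(*)
WHERE has(n.old_name)
SET n.new_name = n.old_name
DELETE n.old_name

Index entries would still have to be rebuilt separately through the index API.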



Michael Hunger

May 16, 2012, 3:03:45 AM
to neo4j...@googlegroups.com
I wrote something similar for a client: a tool that quickly copies an existing store while keeping node IDs (using the batch inserter), so it can also keep internal and external indexes.
This allows compacting the store and also removing properties.

Cheers

Michael

package org.neo4j.tool;

import org.apache.commons.io.FileUtils;
import org.neo4j.graphdb.*;
import org.neo4j.helpers.collection.MapUtil;
import org.neo4j.kernel.EmbeddedGraphDatabase;
import org.neo4j.kernel.impl.batchinsert.BatchInserter;
import org.neo4j.kernel.impl.batchinsert.BatchInserterImpl;
import org.neo4j.kernel.impl.nioneo.store.InvalidRecordException;

import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.*;

import static java.util.Arrays.asList;
import static java.util.Collections.emptySet;

/**
 * Copies a Neo4j store into a fresh target directory while preserving node IDs,
 * optionally skipping given relationship types and properties.
 * Records that cannot be read are logged and skipped.
 */
public class StoreCopy {

    private static PrintWriter logs;

    @SuppressWarnings("unchecked")
    public static Map<String,String> config() {
        return (Map)MapUtil.map(
                "neostore.nodestore.db.mapped_memory", "100M",
                "neostore.relationshipstore.db.mapped_memory", "500M",
                "neostore.propertystore.db.mapped_memory", "300M",
                "neostore.propertystore.db.strings.mapped_memory", "1G",
                "neostore.propertystore.db.arrays.mapped_memory", "300M",
                "neostore.propertystore.db.index.keys.mapped_memory", "100M",
                "neostore.propertystore.db.index.mapped_memory", "100M",
                "cache_type", "weak"
        );
    }
    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("Usage: StoreCopy source target [rel,types,to,ignore] [properties,to,ignore]");
            return;
        }
        String sourceDir=args[0];
        String targetDir=args[1];
        Set<String> ignoreRelTypes= splitOptionIfExists(args, 2);
        Set<String> ignoreProperties= splitOptionIfExists(args,3);
        System.out.printf("Copying from %s to %s ignoring rel-types %s ignoring properties %s%n", sourceDir, targetDir, ignoreRelTypes, ignoreProperties);
        copyStore(sourceDir,targetDir,ignoreRelTypes,ignoreProperties);
    }

    private static Set<String> splitOptionIfExists(String[] args, final int index) {
        if (args.length <= index) return emptySet();
        return new HashSet<String>(asList(args[index].toLowerCase().split(",")));
    }

    private static void copyStore(String sourceDir, String targetDir, Set<String> ignoreRelTypes, Set<String> ignoreProperties) throws Exception {
        final File target = new File(targetDir);
        final File source = new File(sourceDir);
        if (target.exists()) throw new IllegalArgumentException("Target Directory already exists "+target);
        if (!source.exists()) throw new IllegalArgumentException("Source Database does not exist "+source);

        BatchInserter targetDb = new BatchInserterImpl(target.getAbsolutePath(),config());
        GraphDatabaseService sourceDb = new EmbeddedGraphDatabase(sourceDir, config());
        logs=new PrintWriter(new FileWriter(new File(target,"store-copy.log")));

        copyNodes(sourceDb, targetDb, ignoreProperties);
        copyRelationships(sourceDb, targetDb, ignoreRelTypes,ignoreProperties);

        targetDb.shutdown();
        sourceDb.shutdown();
        logs.close();
        copyIndex(source, target);
    }

    private static void copyIndex(File source, File target) throws IOException {
        final File indexFile = new File(source, "index.db");
        if (indexFile.exists()) {
            FileUtils.copyFile(indexFile, new File(target, "index.db"));
        }
        final File indexDir = new File(source, "index");
        if (indexDir.exists()) {
            FileUtils.copyDirectory(indexDir, new File(target, "index"));
        }
    }

    private static void copyRelationships(GraphDatabaseService sourceDb, BatchInserter targetDb, Set<String> ignoreRelTypes, Set<String> ignoreProperties) {
        long time = System.currentTimeMillis();
        int count=0;
        for (Node node : sourceDb.getAllNodes()) {
            for (Relationship rel : getOutgoingRelationships(node)) {
                if (ignoreRelTypes.contains(rel.getType().name().toLowerCase())) continue;
                createRelationship(targetDb, rel, ignoreProperties);
                count ++;
                if (count % 1000 == 0) System.out.print(".");
                if (count % 100000 == 0) System.out.println(" " + count);
            }
        }
        System.out.println("\n copying of " + count+ " relationships took "+(System.currentTimeMillis()-time)+" ms.");
    }

    private static void createRelationship(BatchInserter targetDb, Relationship rel, Set<String> ignoreProperties) {
        long startNodeId=rel.getStartNode().getId();
        long endNodeId=rel.getEndNode().getId();
        final RelationshipType type = rel.getType();
        try {
            targetDb.createRelationship(startNodeId,endNodeId , type, getProperties(rel, ignoreProperties));
        } catch (InvalidRecordException ire) {
            addLog(rel,"create Relationship: "+startNodeId+"-[:"+type+"]"+"->"+endNodeId,ire.getMessage());
        }
    }

    private static Iterable<Relationship> getOutgoingRelationships(Node node) {
        try {
            return node.getRelationships(Direction.OUTGOING);
        } catch(InvalidRecordException ire) {
            addLog(node,"outgoingRelationships",ire.getMessage());
            return Collections.emptyList();
        }
    }

    private static void copyNodes(GraphDatabaseService sourceDb, BatchInserter targetDb, Set<String> ignoreProperties) {
        final Node refNode = sourceDb.getReferenceNode();
        long time = System.currentTimeMillis();
        int count=0;
        for (Node node : sourceDb.getAllNodes()) {
            if (node.equals(refNode)) {
                targetDb.setNodeProperties(targetDb.getReferenceNode(),getProperties(node,ignoreProperties));
            } else {
                targetDb.createNode(node.getId(), getProperties(node, ignoreProperties));
            }
            count++;
            if (count % 1000 == 0) System.out.print(".");
            if (count % 100000 == 0) {
                logs.flush();
                System.out.println(" " + count);
            }
        }
        System.out.println("\n copying of " + count+ " nodes took "+(System.currentTimeMillis()-time)+" ms.");
    }

    private static Map<String, Object> getProperties(PropertyContainer pc, Set<String> ignoreProperties) {
        Map<String,Object> result=new HashMap<String, Object>();
        for (String property : getPropertyKeys(pc)) {
            if (ignoreProperties.contains(property.toLowerCase())) continue;
            try {
                result.put(property,pc.getProperty(property));
            } catch(InvalidRecordException ire) {
                addLog(pc, property, ire.getMessage());
            }
        }
        return result;
    }

    private static Iterable<String> getPropertyKeys(PropertyContainer pc) {
        try {
            return pc.getPropertyKeys();
        } catch(InvalidRecordException ire) {
            addLog(pc,"propertyKeys",ire.getMessage());
            return Collections.emptyList();
        }
    }

    private static void addLog(PropertyContainer pc, String property, String message) {
        logs.append(String.format("%s.%s %s%n",pc,property,message));
    }
}
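
A hypothetical invocation of the tool (the paths and classpath are placeholders; the class name matches the listing above):

java -cp neo4j-lib/*:store-copy.jar org.neo4j.tool.StoreCopy \
     /data/graph.db /data/graph-copy.db knows,likes created_at,tmp

The optional third and fourth arguments are comma-separated lists of relationship types and property names to skip; they are compared lower-cased, as in splitOptionIfExists.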

Mahesh Lal

May 16, 2012, 3:15:00 AM
to neo4j...@googlegroups.com
We relied more on XML, and it was about changing the structure of data that is already present.


Nikhil Lanjewar

May 16, 2012, 3:47:30 AM
to neo4j...@googlegroups.com
Hey Michael!

I have been looking around for a backup mechanism. Does your code serve that purpose?
I've tried to re-create an existing graph using GraphML/GraphSON, but it didn't preserve node/edge IDs.


--
Nikhil Lanjewar
Engineering Lead at YourNextLeap
http://yournextleap.com
http://twitter.com/rhetonik
