Indexing, CSV import and SDN retrieval not working together?


Liliana Ziolek

Jul 1, 2014, 3:46:00 AM
to ne...@googlegroups.com
Hi,
I'm trying to do what I thought was a rather simple thing: import data with LOAD CSV and then work with it via SDN. It doesn't seem to work, though, and I'm not sure whether I'm doing something silly or it simply doesn't work. The data seems to be created fine, and SDN can access it via the findAll method, but it cannot find it using the indexed field.

My POJO is quite simple; here is the important bit:

public class City extends GraphNode {
    @Indexed(indexType = IndexType.FULLTEXT, indexName = "locations")
    private String name;
    // ... other fields
}

I have an SDN repository with the following methods:
public interface CityRepository extends GraphRepository<City> {
    Page<City> findByNameLike(String name, Pageable page);
    List<City> findByName(String cityName);
    // ...
}
On top of that, there is a super-simple service that pretty much just wraps these calls in @Transactional.

Everything works fine when I use SDN both to put the data in and to take it out; this test passes:
        locationService.addCity("Poznan", "Poland");
        List<City> citiesByNameLike = locationService.getCitiesByNameLike("Pozn*");
        assertThat(citiesByNameLike, hasSize(1));
        assertThat(locationService.getCitiesByName("Poznan"), equalTo(citiesByNameLike));

However, when I import the data via LOAD CSV and MERGE, the city doesn't come back when I try to look it up by name, even though I can see that it is actually there (when I run findAll).
CSV import query:
//csv fields: Airport,City,Country,IATAcode,ICAOcode
        String cypherLoadCountries = "LOAD CSV WITH HEADERS FROM \"" + fileLocation + "\" AS csvLine "
                + "MERGE (country:Country:_Country { name: csvLine.Country } ) "
                + "MERGE (city:City:_City { name: csvLine.City } ) "
                + "MERGE (city) - [:IS_IN] -> (country) "
                + "MERGE (airport:Airport:_Airport {name: csvLine.Airport, iataCode: csvLine.IATAcode, icaoCode: csvLine.ICAOcode} ) "
                + "MERGE (airport) - [:SERVES {__type__: 'AirportCityConnection'}] -> (city)";
        neo4jTemplate.query(cypherLoadCountries, ImmutableMap.of());

Am I doing something silly here? Is there meant to be a call to switch on indexing, or does SDN perhaps index a different field?
Any help appreciated.

Liliana Ziolek

Jul 1, 2014, 3:46:47 AM
to ne...@googlegroups.com
Oh, in case that helps, I'm using Neo4j 2.1.2 and SDN 3.2.0-SNAPSHOT.

Michael Hunger

Jul 1, 2014, 4:32:07 AM
to ne...@googlegroups.com, Mark Needham
Hey,

You are totally right, this sucks. Let me explain why.

Right now Cypher can't update the fulltext indexes that SDN uses. These are legacy Neo4j indexes, which require entries to be added manually.
You're right, this is really sucky, but apart from writing code (i.e. manually adding the nodes, properties and values to that fulltext index) or using the neo4j-shell for the index update, I have no good solution.

It was a conscious decision that Cypher will not support writing to legacy indexes.

The only thing you can do is use a legacy auto-index configured as fulltext, but as that index is read-only, SDN can't write to it, so the field would have to be "read-only". Alternatively, we would have to add something to SDN that marks a field as using that legacy auto-index and never writes to the index itself. But that also means this has to be taken into account in every other read operation and in query generation, which makes it a pretty big effort.

So for now I'd advise implementing the CSV loading as SDN code, using OpenCSV as the reader (which is what Cypher uses too).

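// reader is an OpenCSV CSVReader over the file; get(row, header, column) returns the named column's value
String[] header = reader.readNext();
String[] row;
while ((row = reader.readNext()) != null) {
   // note: unlike MERGE, this saves a new Country and City for every row
   Country country = template.save(new Country(get(row, header, "Country")));
   City city = template.save(new City(get(row, header, "City"), country));
   Airport ap = template.save(new Airport(get(row, header, "Airport"), get(row, header, "IATAcode"), get(row, header, "ICAOcode")));
   ap.serve(city);
   template.save(ap);
}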
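The loop above relies on a `get(row, header, column)` helper that isn't shown in the thread. Below is a minimal sketch of one possible version; the class name `CsvRows` is hypothetical (it is not part of OpenCSV), and it assumes the header row's column names match the names passed in exactly (e.g. "City", "IATAcode").

```java
import java.util.Arrays;

// Hypothetical helper (not part of OpenCSV): returns the value in `row`
// for the column whose header name equals `column`.
final class CsvRows {
    private CsvRows() {}

    static String get(String[] row, String[] header, String column) {
        int index = Arrays.asList(header).indexOf(column);
        if (index < 0) {
            throw new IllegalArgumentException("Unknown CSV column: " + column);
        }
        // a row can be shorter than the header if trailing fields are missing
        return index < row.length ? row[index] : null;
    }
}
```

Since it searches the header on every call, for large files it may be worth building a name-to-index map once from the header row instead.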

Sorry for not being more helpful,

Cheers,

Michael

--
You received this message because you are subscribed to the Google Groups "Neo4j" group.
To unsubscribe from this group and stop receiving emails from it, send an email to neo4j+un...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Liliana Ziolek

Jul 1, 2014, 5:09:16 AM
to ne...@googlegroups.com, Mark Needham

Would it help if the index weren't a fulltext index but a normal one? Would that make it work?

More importantly, you say that Cypher won't support writing to legacy indexes, and as I understand it, the fulltext index is currently a legacy type. Is there a plan to introduce a new, Cypher-supported fulltext index in a future version of Neo4j? I'd be happy to go with a standard index for now if there were hope that in the future I could just change the index type and go.

Thanks!


Michael Hunger

Jul 1, 2014, 6:04:50 AM
to ne...@googlegroups.com
Yep, that would work.

And yes, it is planned to add automatic fulltext, spatial and other indexes to Neo4j in the future.

But let's try to work out together an easy way to get the fulltext functionality working nonetheless. It would be a fun challenge :)

Liliana Ziolek

Jul 1, 2014, 2:30:30 PM
to ne...@googlegroups.com
Let me check that I understand all this correctly.
There are two options to achieve the fulltext index at the moment:
- a (legacy?) manual fulltext index, which is how it is configured for me right now: SDN handles adding things to the index, which means that when I add the nodes from Cypher, the index entries do not get created
- a legacy automatic fulltext index, where Neo4j would automatically maintain the index when data is inserted or edited, but SDN wouldn't be able to touch that index
What I don't get is why making SDN understand the second option is a big effort. My naive understanding is that what SDN does behind the scenes in the first case is something like:
- save the node, setting the value of the indexed field
- add the index entry, something along the lines of:
graphDb.index().forNodes( "locations" ).add( city, "name", city.getProperty( "name" ) );

If we let Neo4j handle adding to the index, isn't it simply a matter of adding a new index type that you can specify in the annotation and then removing the second step for writes (plus a slight change on schema creation)? Why do the reads/queries need to change as well? Do you need to provide the details of the index when building the query?


Liliana
"Write your code as if the person maintaining it is a homicidal maniac
who knows where you live."

Michael Hunger

Jul 1, 2014, 2:48:45 PM
to ne...@googlegroups.com
Hi,

You have a pretty good understanding, and you're right about that.

Unfortunately there is more to it:

#1 There is only one of these automatic indexes globally, called node_auto_index. So any field in any class that uses this fulltext index would have to use the same index, and if just two of them share the same property (e.g. "description"), there would be clashes, in the sense of returning the wrong types of nodes.
#2 The indexes are also used when querying the graph. So automatic query generation would have to take into account that certain fields now use this auto-index and generate different kinds of queries against it. But what happens if two entities have fulltext entries? If it used that index for both, it would have to 1) disambiguate them and 2) construct a Lucene "OR" query instead of the normal index query, which is even uglier in fulltext mode (as you have to quote lookup values, and then your entries cannot contain spaces). All in all, a big mess and a lot of effort.
#3 As those are legacy indexes, all this effort would only have a short-lived effect and would then be removed again altogether.
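Point #2 mentions having to build a Lucene "OR" query with quoted lookup values when several fields share one fulltext index. The sketch below only shows what such a query string could look like, assuming classic Lucene query-parser syntax, in which `\` and `"` must be escaped inside a quoted phrase. `LuceneOrQuery` and its methods are hypothetical names, not part of SDN or Neo4j.

```java
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper (not part of SDN or Neo4j): builds
//   field1:"value1" OR field2:"value2" ...
// from a map of field names to lookup values.
final class LuceneOrQuery {
    private LuceneOrQuery() {}

    // Wrap a value in double quotes, escaping backslashes first and then quotes,
    // as required inside a quoted phrase in classic Lucene query syntax.
    static String quote(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Join one field:"value" clause per map entry with OR, in the map's iteration order.
    static String anyOf(Map<String, String> fieldValues) {
        StringJoiner query = new StringJoiner(" OR ");
        for (Map.Entry<String, String> entry : fieldValues.entrySet()) {
            query.add(entry.getKey() + ":" + quote(entry.getValue()));
        }
        return query.toString();
    }
}
```

The result is a plain string that could be passed to a legacy index's `query(...)` method. Using a `LinkedHashMap` as input keeps the clauses in insertion order, so the generated string is predictable.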

I think it would make more sense to write a five-line Java function that just adds those entries to the index (I've just realized I should have pointed that out from the beginning):

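import org.neo4j.graphdb.DynamicLabel;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Transaction;
import org.neo4j.graphdb.index.Index;
import org.neo4j.tooling.GlobalGraphOperations;

try (Transaction tx = db.beginTx()) {
   Index<Node> fts = db.index().forNodes("locations"); // the fulltext index SDN created for City.name
   for (Node city : GlobalGraphOperations.at(db).getAllNodesWithLabel(DynamicLabel.label("City"))) {
       String location = (String) city.getProperty("name");
       fts.remove(city); // in case it was already there, alternatively do a check
       fts.add(city, "name", location);
   }
   tx.success(); // without this the transaction is rolled back on close
}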
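To make the mechanics of this thread concrete, here is a toy in-memory model. It is plain Java, not Neo4j or SDN code, and all names in it are hypothetical: nodes and the manual index are stored separately, an SDN-style save writes both, a Cypher-style write touches only the nodes, and a re-index pass like the loop above repairs the index by removing and re-adding each node's entry.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model, not Neo4j/SDN code: the node store and the manual "name" index
// are separate structures, and only explicit calls write to the index.
final class ManualIndexModel {
    private final Map<Long, String> nodeNames = new HashMap<>();       // node id -> name property
    private final Map<String, List<Long>> nameIndex = new HashMap<>(); // name -> indexed node ids
    private long nextId = 0;

    // Cypher-style write: stores the node, leaves the index untouched.
    long createNodeOnly(String name) {
        long id = nextId++;
        nodeNames.put(id, name);
        return id;
    }

    // SDN-style save: step 1 stores the node, step 2 adds the index entry.
    long saveWithIndexing(String name) {
        long id = createNodeOnly(name);
        nameIndex.computeIfAbsent(name, k -> new ArrayList<>()).add(id);
        return id;
    }

    // findAll()-style count: looks at the nodes themselves.
    int countAll() {
        return nodeNames.size();
    }

    // findByName()-style lookup: consults only the index.
    List<Long> findByName(String name) {
        return nameIndex.getOrDefault(name, Collections.emptyList());
    }

    // Repair pass mirroring the loop above: remove any entry for each node, then re-add it.
    void reindexAll() {
        for (Map.Entry<Long, String> node : nodeNames.entrySet()) {
            Long id = node.getKey();
            nameIndex.values().forEach(ids -> ids.remove(id));
            nameIndex.computeIfAbsent(node.getValue(), k -> new ArrayList<>()).add(id);
        }
    }
}
```

In this model the findAll-style count sees every node, while the name lookup only sees what was explicitly indexed, which matches the symptom in the original question; after `reindexAll()` both agree.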
  

Liliana Ziolek

Jul 1, 2014, 3:07:03 PM
to ne...@googlegroups.com
Okay, that makes sense. I wasn't aware of limitation #1 (and the rest that follows from it), and you're totally right: it's not worth the effort, considering it's not going to be around for long.

I'll go with the manual index update as you suggest, which I'll be able to remove once proper, non-legacy fulltext indexing comes to Neo4j.

Thanks a lot!