Just a quick question to be fully sure that we're not doing anything dangerous :)
Let's say we have a template Neo4j datastore somewhere on the file system, containing some nodes with properties, relationships, and indexes.
The folder containing this template DB is duplicated multiple times into other folders on the FS.
Then, using the Java API, we open some of the copied databases and do things inside them (different things; in fact there is one datastore per customer).
Is there any risk of conflict, for example based on an internal identification number (neostore UUID? index UUID?), or will the different GraphDatabaseService instances never conflict at all?
I was wondering about this because of a "neostore.id" file in the datastore folder.
In general this will work. It depends on what you want to do with the databases later: e.g. if they should join an HA cluster, there might be cause for conflict; otherwise they are independent, as long as you don't have concurrent access to the datastores.
Also, one thing to watch out for is running multiple Neo4j databases concurrently on a single machine as long-term stores: each should be configured to use an appropriate amount of memory for mmio (memory-mapped I/O) and Java heap.
The store-id is mostly there for identification, e.g. in logfiles.
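As a rough sketch of the memory point: if several databases share one machine, one way to keep them from overcommitting is to split a fixed mmio budget between them and write the result into each store's config file. The budget figure, the even split across three store files, and the helper name write_mmio_config are illustrative assumptions; the mapped_memory keys are the memory-mapping settings as I know them from Neo4j 1.x, so check them against your version.

```shell
#!/bin/sh
# Sketch: split a total memory-mapped I/O budget (in MB) evenly across
# N databases and print a neo4j.properties fragment for one of them.
# write_mmio_config is a hypothetical helper; the even split is an assumption.
write_mmio_config() {
  total_mb=$1   # mmio budget for the whole machine, in MB
  db_count=$2   # number of databases running concurrently
  per_db=$((total_mb / db_count))
  # Give each of the three main store files an equal share of the
  # per-database budget; real workloads usually want an uneven split.
  per_store=$((per_db / 3))
  echo "neostore.nodestore.db.mapped_memory=${per_store}M"
  echo "neostore.relationshipstore.db.mapped_memory=${per_store}M"
  echo "neostore.propertystore.db.mapped_memory=${per_store}M"
}

# Example: 1200 MB shared by 4 customer databases.
# write_mmio_config 1200 4 > customer-a/neo4j.properties
```

The Java heap is a separate budget set on the JVM itself (for example with -Xmx), so embedded databases opened in the same JVM share it.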
We are providing a SaaS solution for scientific modeling and computation of environmental impacts. We decided to have one database per customer for many reasons (security, scalability, easier custom development and customer-specific versions).
We already have a multi-datastore setup using Neo4j in pre-production, so care has been taken about memory and it should not be an issue for now :)
For now we are creating new databases and populating them using JAXB and XML files. Here comes the specific use case related to the question: for legacy reasons, we will have to migrate customers from a plain SQL version to this new one (SQL/Neo4j hybrid). The migration code needs a new Neo4j database populated with initial data.
We don't want to maintain the migration code, but the initial populating part will certainly evolve (the science part is pretty new and we're changing data models more often than in typical software). As we already have an internal Java mechanism for database schema upgrades, we would like to freeze an initially populated database, use it as the starting point for the migration, then apply upgraders to it from this point on. It's much easier than maintaining an old Java init-populating mechanism.
And to duplicate a whole Neo4j database, "cp -r" sounds pretty easy :)
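A minimal sketch of that copy step, assuming the template store was shut down cleanly before being copied and that no JVM has it open. The directory layout and the function name clone_template are made up for illustration.

```shell
#!/bin/sh
# Sketch: clone a frozen template store into a per-customer directory.
# clone_template and all paths are hypothetical names.
clone_template() {
  template=$1   # cleanly shut down template store directory
  target=$2     # new per-customer store directory
  if [ ! -d "$template" ]; then
    echo "template not found: $template" >&2
    return 1
  fi
  if [ -e "$target" ]; then
    # Refuse to overwrite an existing customer store.
    echo "target already exists: $target" >&2
    return 1
  fi
  mkdir -p "$(dirname "$target")"
  cp -r "$template" "$target"
}

# Example:
# clone_template stores/template.db stores/customer-42.db
```

Refusing to overwrite an existing target is a design choice here, so that a customer store that already holds data is never replaced by accident.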
Thanks for your answer! What did you mean by "concurrent access to the datastores"?
Christophe
On Monday, 8 October 2012 at 18:49:45 UTC+2, Michael Hunger wrote:
> May I ask what the use-case for this approach is?
> HTH
> Michael
> On 08.10.2012 at 17:53, Christophe Porté wrote:
> > Hi,
> > Just a quick question to be fully sure that we're not doing dangerous things :)
Wow, that sounds great.
Looking forward to some public blog posts from you describing what you do :)
#1 You can also look into Cypher for pre-populating your Neo4j database (as of 1.8 you can create and update data with it). You can treat Cypher scripts much like SQL scripts that you version-control and apply to migrate.
It is as easy as piping a Cypher script wrapped in begin/commit to the neo4j-shell.
#2 By concurrent access I meant multiple JVMs accessing the same Neo4j database.
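E.g. for #1:
# import.sh
# Generate the Cypher import script once, if it does not exist yet.
if [ ! -f init.cql ]; then
  echo "Creating init.cql"
  ruby import.rb > init.cql
fi
# Start from an empty store, then feed the script to neo4j-shell.
rm -rf cineasts.db
cat init.cql | neo4j/bin/neo4j-shell -path cineasts.db -config neo4j.properties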
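The "wrapped in begin/commit" part could be sketched like this. wrap_in_tx is a hypothetical helper name, not part of Neo4j; it only assumes that neo4j-shell accepts begin and commit as commands on stdin, as described in #1, and that the script file ends with a newline.

```shell
#!/bin/sh
# Sketch: wrap a Cypher script in a single transaction for neo4j-shell.
# wrap_in_tx is a hypothetical helper, not part of Neo4j.
wrap_in_tx() {
  echo "begin"
  cat "$1"      # the Cypher statements from the script file
  echo "commit"
}

# Example (same paths as in import.sh):
# wrap_in_tx init.cql | neo4j/bin/neo4j-shell -path cineasts.db -config neo4j.properties
```

Keeping begin/commit out of the .cql file itself means the same script can be version-controlled and replayed without editing it.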
These .cql files are good news :)
I imagine the capability to dump a database into such a file is planned?
OK for the concurrent access: that should not happen in our case.
Thanks for all your answers!
Christophe
On Monday, 8 October 2012 at 22:55:57 UTC+2, Michael Hunger wrote:
> Wow, that sounds great.
> On 08.10.2012 at 22:44, Christophe Porté wrote:
> > Sure!
Christophe,
yes, that is planned. You can already dump the contents of the DB
into CQL files; you can even construct them from a Cypher statement,
much as in SQL, for example:
start n=node(1,2,3) match n-[r?]-other where ID(n)>ID(other) return
"CREATE (n" + ID(n) + " {name:'" + n.name? + "'})"
Note, however, that this is BAD behaviour and very hacky.
The main missing piece is that there is no good way to express the
contents of the indexes, since they can be filled manually in any way.
Export-import works if you enable the same autoindexes in both
databases.
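For the autoindex point, a small check like the following could confirm that source and target use the same settings before an export/import. It assumes both databases are configured through neo4j.properties files and that auto-indexing is controlled by the node_auto_indexing, node_keys_indexable, relationship_auto_indexing and relationship_keys_indexable keys (the names as I know them from Neo4j 1.x); same_autoindex_config and autoindex_settings are hypothetical helpers.

```shell
#!/bin/sh
# Sketch: compare the auto-index settings of two neo4j.properties files.
# Both function names are hypothetical helpers, not part of Neo4j.
autoindex_settings() {
  # Keep only auto-index keys, drop spaces, sort for a stable comparison.
  grep -E '^ *(node|relationship)_(auto_indexing|keys_indexable) *=' "$1" \
    | tr -d ' ' | sort
}

same_autoindex_config() {
  [ "$(autoindex_settings "$1")" = "$(autoindex_settings "$2")" ]
}

# Example:
# same_autoindex_config source/neo4j.properties target/neo4j.properties \
#   || echo "auto-index settings differ" >&2
```

This only compares configuration; manually filled indexes are still not covered, as noted above.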
On Tue, Oct 9, 2012 at 10:04 PM, Christophe Porté <h...@anthologique.net> wrote:
> I imagine the capability to dump a database into such a file is planned?