Clean-up within specs


dnagir

Dec 9, 2011, 3:54:35 AM
to neo4jrb
Hi,

I wonder how I can clear the database before each spec.

At first I thought that I could wrap each spec in a transaction, like so:
https://github.com/andreasronge/neo4j/blob/master/spec/spec_helper.rb#L64

and then roll it back.

But the problem is that models saved within that transaction are not
traversable.

So this assertion fails:

User.create
User.count.should > 0

This is pretty unexpected to me.

So my question is: how do I clean up the database before each test?
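The rollback idea above, sketched as an RSpec around hook (illustrative only; the transaction method names follow the underlying Java transaction API and may differ slightly in neo4j.rb):

```ruby
# Sketch of the pattern that does NOT behave as expected:
# nodes created inside the still-open transaction are not
# traversable, so counts/queries inside the example see nothing.
RSpec.configure do |c|
  c.around(:each) do |example|
    tx = Neo4j::Transaction.new
    example.run
    tx.failure   # mark for rollback instead of commit
    tx.finish
  end
end
```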

Peter Neubauer

Dec 9, 2011, 4:00:54 AM
to neo...@googlegroups.com
Guys, there is the new ImpermanentGraphDatabase, which is fully
functional but operates on non-persistent, RAM-backed FileChannels at
the lowest level. We use it for testing in Java land; I think it would
be very applicable here, too!

Cheers,

/peter neubauer

GTalk:      neubauer.peter
Skype       peter.neubauer
Phone       +46 704 106975
LinkedIn   http://www.linkedin.com/in/neubauer
Twitter      http://twitter.com/peterneubauer

brew install neo4j && neo4j start
heroku addons:add neo4j


Andreas Ronge

Dec 9, 2011, 4:11:33 AM
to neo...@googlegroups.com
Does the ImpermanentGraphDatabase support transactions and Lucene?
Check this issue: https://github.com/neo4j/community/issues/44
and the following workaround using multitenancy:

c.after(:each) do
  finish_tx
  Neo4j::Rails::Model.close_lucene_connections
  Neo4j::Transaction.run do
    Neo4j::Index::IndexerRegistry.delete_all_indexes
  end
  Neo4j::Transaction.run do
    Neo4j.threadlocal_ref_node = Neo4j::Node.new :name => "ref_#{$name_counter}"
    $name_counter += 1
  end
end

from https://github.com/andreasronge/neo4j/blob/master/spec/spec_helper.rb

Vivek Prahlad

Dec 9, 2011, 4:14:37 AM
to neo...@googlegroups.com
Hi,

You have a few choices for doing this. Unfortunately, Neo4j does not yet support a 'clear' for deleting all the nodes in a database.

The options are:
- Manually delete all nodes in your after(:each) block:

      Neo4j::Transaction.run do
        Neo4j._all_nodes.each { |n| n.del unless n.neo_id == 0 }
      end

- Exploit the multitenancy feature to create a new reference node before each test, and then delete the entire database after(:all). We use this approach in the neo4j.rb tests; they run in approximately a minute on a MacBook Pro. Please take a look at https://github.com/andreasronge/neo4j/blob/master/spec/spec_helper.rb in case you'd like more info. Here's a snippet that shows this approach:

  RSpec.configure do |c|
    $name_counter = 0
    c.after(:each) do
      Neo4j::Rails::Model.close_lucene_connections
      Neo4j::Transaction.run do
        Neo4j::Index::IndexerRegistry.delete_all_indexes
      end
      Neo4j::Transaction.run do
        Neo4j.threadlocal_ref_node = Neo4j::Node.new :name => "ref_#{$name_counter}"
        $name_counter += 1
      end
    end

    c.before(:all) do
      rm_db_storage unless Neo4j.running?
    end
    c.after(:all) do
      Neo4j.shutdown
      rm_db_storage
    end
  end

  def rm_db_storage
    FileUtils.rm_rf Neo4j::Config[:storage_path]
    raise "Can't delete db" if File.exist?(Neo4j::Config[:storage_path])
  end

A gotcha with this approach is that multiple Lucene files will be opened (the multitenancy feature creates a fresh set of Lucene indices per tenant). The delete_all_indexes call has the side effect of closing all open Lucene files. There's a patch that is now part of Neo4j (it should hopefully be released as part of 1.6) that takes care of this issue without requiring this hack. Info about the patch here: https://github.com/neo4j/community/pull/51#issuecomment-2442894

Peter, is the ImpermanentGraphDatabase part of Neo4j 1.5? If so, we can try and add support for it so that it can be used for writing tests.

Hope this helps,

Vivek

Vivek Prahlad

Dec 9, 2011, 4:16:02 AM
to neo...@googlegroups.com
Just saw that Andreas beat me to it :) - the second approach below is the same as what he's suggested.

Vivek

Peter Neubauer

Dec 9, 2011, 4:41:18 AM
to neo...@googlegroups.com
Yes,
the ImpermanentGDB supports Lucene and everything - the memory backing
is implemented at the java.nio.FileChannel level. VERY convenient :)

Cheers,

/peter neubauer


Dmytrii Nagirniak

Dec 9, 2011, 4:45:06 AM
to neo...@googlegroups.com
On 09/12/2011, at 8:00 PM, Peter Neubauer wrote:

> Guys, there is the new ImpermanentGraphDatabase, that is fully
> functional but operating on non-persistent RAM-backed FileChannels at
> the lowest level. We use it for testing in Java land, I think it would
> be very applicable here, too!
>

That is exactly the solution I'm after.
In RDBMS world I use in-memory SQLite for that.

So we can just drop the database and create a new one before each test, and it should be cheap and fast.
Sounds awesome.

But I'm a little bit lost as to how I can make use of it (sorry, just starting with all this), especially given the issues outlined by Andreas.

Or maybe for now I need to clean up as Vivek explained?

BTW, I would call the class TransientDatabase or MemoryDatabase instead of Impermanent :-)

Cheers.

Andreas Ronge

Dec 9, 2011, 5:00:22 AM
to neo...@googlegroups.com
Created an issue for that.

http://neo4j.lighthouseapp.com/projects/15548-neo4j/tickets/208-better-cleanup-after-rspecs-using-impermanentgraphdatabase

@Dmytrii - I'm thinking of moving the lighthouse issues to GitHub,
since it looks like GitHub now has support for crosslinking between
commits and issues, and for hash tags in commit messages to
open/close issues, etc.

Dmytrii Nagirniak

Dec 9, 2011, 5:53:06 AM
to neo...@googlegroups.com
On 09/12/2011, at 9:00 PM, Andreas Ronge wrote:

Thanks for that.
Hope it won't take too much time to implement it.

Unfortunately I won't be able to contribute PRs.
Really have to switch the current app over. And that's the last chance I'm giving neo4j :)


> @Dmytrii - I'm thinking of moving the lighthouse issues to github,
> since it looks like github now has support for crosslinking between
> commits and issues and support for hash tags in commit messages to
> open close issues etc..

Way to go! But I still can't see the "Issues" section on the repo.
Also, with that in mind, you could have created the "cleanup specs" issue on GitHub.

Cheers.

Peter Neubauer

Dec 9, 2011, 6:03:15 AM
to neo...@googlegroups.com

The only thing I am missing now is commit logs referring to other repos issues, and voting on issues. Otherwise, github is great.

/peter

Sent from my phone, please excuse typos and autocorrection.

Dmytrii Nagirniak

Dec 9, 2011, 6:20:18 AM
to neo...@googlegroups.com
On 09/12/2011, at 10:03 PM, Peter Neubauer wrote:

The only thing I am missing now is commit logs referring to other repos issues, and voting on issues. Otherwise, github is great.

The voting is a bit controversial, since GitHub REMOVED it :)
I guess it could be approximated with "+1" comments and the number of currently open issues.
Tags/milestones should also help with that.

As for referencing issues in other repos, I think you can do it: https://github.com/blog/967-github-secrets

Quote:
You can reference issues between repositories by mentioning user/repository#number in an issue.

Anyway, so what do I do with the spec cleanups for now?

Peter Neubauer

Dec 9, 2011, 6:32:20 AM
to neo...@googlegroups.com

Wow thanks,
That is a life saver!

/peter

Sent from my phone, please excuse typos and autocorrection.


Vivek Prahlad

Dec 9, 2011, 6:33:05 AM
to neo...@googlegroups.com
For now, I'd suggest the approach both Andreas and I described. Getting ImpermanentGraphDatabase in will need a small amount of work; I think it should be fairly easy to switch to it once it's there.

Thanks,
Vivek


Dmytrii Nagirniak

Dec 9, 2011, 6:57:53 AM
to neo...@googlegroups.com
Thanks,

Done that. It works fairly well.
Passed my first set of specs :)

The first little win :)

Cheers.

Dmytrii Nagirniak

Dec 13, 2011, 8:43:02 PM
to neo...@googlegroups.com
Another option would be to use memory disk for testing.
That should significantly increase the speed without any additional changes.

Vivek Prahlad

Dec 13, 2011, 11:09:47 PM
to neo...@googlegroups.com
Yes, you're right. I actually forgot to mention that - we're using memory disks on both the Linux and Mac platforms on my project, and it does significantly speed up the tests.

Vivek

Dmytrii Nagirniak

Dec 14, 2011, 12:06:26 AM
to neo...@googlegroups.com
On 14/12/2011, at 3:09 PM, Vivek Prahlad wrote:

> Yes, you're right. I actually forgot to mention that - we're using memory disks on both the linux and mac platforms on my project and it does significantly speed up the tests.

Well done!

I just haven't been able to get to that. Also, I want to avoid any huge setup: ideally the memory disk is created before all specs and dropped at the end.

Do you mind sharing your setup?

Vivek Prahlad

Dec 14, 2011, 12:45:50 AM
to neo...@googlegroups.com
Sure:

Linux
You can create a RAM disk with 500MB like this: (the mount command needs to be issued as root)
mkdir -p /tmp/neo4j_testing
mount -t tmpfs -o size=500M tmpfs /tmp/neo4j_testing

Mac
You can create a ~596MB RAM disk like this:
diskutil erasevolume HFS+ "ramdisk" `hdiutil attach -nomount ram://1165430`

This will create a RAM disk under /Volumes/ramdisk

For both of these, in your spec_helper, you'll have to change the Neo4j config to use the RAM disk.

Neo4j::Config[:storage_path] = "/path/to/ram/disk"

You can use umount to unmount the ram disk on both platforms.
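To avoid hard-coding the mount point, the spec_helper could pick the RAM disk only when it is actually mounted. This is a hypothetical helper: `/Volumes/ramdisk` matches the Mac command above, and the tmpdir fallback is an assumption, not part of the setup Vivek described.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: use the RAM disk when mounted, otherwise fall
# back to an ordinary temp directory so specs still run everywhere.
def test_storage_path(ramdisk = '/Volumes/ramdisk')
  base = File.directory?(ramdisk) ? ramdisk : Dir.tmpdir
  File.join(base, 'neo4j_testing')
end

# In spec_helper.rb:
# Neo4j::Config[:storage_path] = test_storage_path
```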

Cheers,
Vivek

Dmytrii Nagirniak

Dec 14, 2011, 1:47:16 AM
to neo...@googlegroups.com
Thanks for that.

I guess I was curious whether you mount the disk before each run and unmount at the end,
or whether you rely on the system already being configured?


Dmytrii Nagirniak

Dec 14, 2011, 11:41:06 PM
to neo...@googlegroups.com
Unfortunately this approach doesn't work for Cucumber testing :(

It preserves indexes between the scenarios.
So every time I do "User.all" it returns the same objects that were created in the very first scenario!


How can I really make sure the DB is clear?

Here is my cleanup in Cucumber:

require 'fileutils'

# TODO: Remove dup: copy paste from the spec/support/neo4j.rb
def rm_db_storage!
  FileUtils.rm_rf Neo4j::Config[:storage_path]
  raise "Can't delete db" if File.exist?(Neo4j::Config[:storage_path])
end

rm_db_storage! unless Neo4j.running?

$spec_counter = Time.now.to_f
After do |s|
  Neo4j::Rails::Model.close_lucene_connections
  Neo4j::Transaction.run do
    Neo4j::Index::IndexerRegistry.delete_all_indexes
  end
  Neo4j::Transaction.run do
    Neo4j.threadlocal_ref_node = Neo4j::Node.new :name => "ref_#{$spec_counter}"
    $spec_counter += 1
  end
end






Dmytrii Nagirniak

Dec 15, 2011, 10:13:10 PM
to neo...@googlegroups.com
Anybody??


Vivek Prahlad

Dec 15, 2011, 11:06:14 PM
to neo...@googlegroups.com
You may need to do a User.destroy_all after each test as a workaround. Otherwise, you'll have to delete all the nodes in your database as I suggested earlier in the thread. The 'all' call doesn't use the Lucene index AFAIK; it uses traversals.

Is your user model shared across tenants (i.e., does it have a

ref_node { Neo4j.default_ref_node }

declaration anywhere)?
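The destroy_all workaround could be wired into a Cucumber hook roughly like this (a sketch; User is just an example, list whatever models your app defines):

```ruby
# Sketch: wipe known Rails models after every scenario.
After do
  [User].each(&:destroy_all)
end
```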

Vivek

Dmytrii Nagirniak

Dec 15, 2011, 11:21:18 PM
to neo...@googlegroups.com
On 16/12/2011, at 3:06 PM, Vivek Prahlad wrote:

You may need to do a User.destroy_all after each test as a workaround. Otherwise, you'll have to delete all the nodes in your database like I'd suggested in an earlier thread. The 'all' call doesn't use the lucene index AFAIK, it uses traversals.

I am kind of lost here.
At the end of each spec I swap out the ref_node, which loses all the references to existing nodes.
So "User.all" should not return anything, especially if it's using traversals.
I don't understand why it still does.



Is your user model shared across tenants (ie, does it have a

ref_node {Neo4j.default_ref_node}


declaration anywhere?

Nope.

Vivek Prahlad

Dec 15, 2011, 11:46:34 PM
to neo...@googlegroups.com
Hard to say what exactly is going on without examining what's happening at runtime. I've seen this kind of behaviour when top level transactions are inadvertently created in tests / migrations. Is any of your before / after stuff creating Neo4j transactions anywhere?

In your test, you could try asserting that the Neo4j threadlocal ref node is the same as what was assigned in the before / after block? Are your user objects created by the test, or as part of setup?

Could you try looking at the database using Neoclipse? You should not see any user nodes attached to the home node.

Hope this helps,

Vivek

Dmytrii Nagirniak

Dec 16, 2011, 2:09:27 AM
to neo...@googlegroups.com

On 16/12/2011, at 3:46 PM, Vivek Prahlad wrote:

> Hard to say what exactly is going on without examining what's happening at runtime. I've seen this kind of behaviour when top level transactions are inadvertently created in tests / migrations. Is any of your before / after stuff creating Neo4j transactions anywhere?

Yes, it's basically done the same way neo4j.rb does it.

The problem is that it works fine for RSpec, but not for Cucumber!

Anyway, here is my "features/support/neo4j.rb" file that is supposed to take care of the clean-up:

require 'fileutils'

# TODO: Remove dup: copy paste from the spec/support/neo4j.rb
def rm_db_storage!
  FileUtils.rm_rf Neo4j::Config[:storage_path]
  raise "Can't delete db" if File.exist?(Neo4j::Config[:storage_path])
end

rm_db_storage! unless Neo4j.running?

$spec_counter = Time.now.to_f
After do |s|
  Neo4j::Rails::Model.close_lucene_connections
  Neo4j::Transaction.run do
    Neo4j::Index::IndexerRegistry.delete_all_indexes
  end
  Neo4j::Transaction.run do
    Neo4j.threadlocal_ref_node = Neo4j::Node.new :name => "ref_#{$spec_counter}"
    $spec_counter += 1
  end
end

> In your test, you could try asserting that the Neo4j threadlocal ref node is the same as what was assigned in the before / after block?

The ref_node is the same before, during, and after the tests. It only changes in the block above.


> Are your user objects created by the test, or as part of setup?

As part of the test. In Cucumber there is really no setup as such.


> Could you try looking at the database using Neoclipse? You should not see any user nodes attached to the home node.


That is not correct. The setup I showed you deletes the database at the beginning. At the end of each test the ref_node is replaced, so the data will be left in the DB (and I can see it with Neoclipse).


Dmytrii Nagirniak

Dec 20, 2011, 11:34:17 PM
to neo...@googlegroups.com
Anybody?


Vivek Prahlad

Dec 20, 2011, 11:57:37 PM
to neo...@googlegroups.com
Not sure what Cucumber is doing differently from RSpec. In the meantime, could you try using this technique after each test:

       Neo4j::Transaction.run do
         Neo4j._all_nodes.each { |n| n.del unless n.neo_id == 0 }
       end

Thanks,
Vivek

Dmytrii Nagirniak

Jan 2, 2012, 5:19:35 PM
to neo...@googlegroups.com
Hi Vivek,

Sorry for the late reply. Holidays :)

Yes, the manual deletion seems to work properly.
Not sure why the threadlocal_ref_node approach doesn't.

Any ideas? Better (and faster) ways to clean up?

Cheers.

Dmytrii Nagirniak

Jan 6, 2012, 9:21:59 PM
to neo...@googlegroups.com
I have tried to run the neo4j on a ramdisk.

I was very surprised.

On a HDD the specs took 57s; on a RAM disk, 42s.
That's only a ~30% improvement.

I suspect that most of the time is spent cleaning up the database, deleting all the nodes.

Not sure how to profile well in JRuby.

What are your thoughts on it, guys?

Cheers.

Michael Hunger

Jan 7, 2012, 12:05:07 PM
to neo...@googlegroups.com
How big is your datastore?

Also, if you manually delete all nodes, it still has to write tx logs and clean out all the memory-mapped files (as well as the Lucene index updates (deletions) behind the scenes).

It is not much faster because the disk-based memory-mapped file implementation is already pretty fast.

Better to use ImpermanentGraphDatabase for testing, which is no longer using disk filesystems as of 1.6.M02.

Perhaps adding support for that in neo4j.rb would be great!

You can profile JRuby with any Java profiler (like VisualVM or YourKit); they can connect to the running Java process and record timing information for all method calls. (Then you can look at the "hotspot" methods that take the most time.)

Cheers

Michael

Andreas Ronge

Jan 7, 2012, 2:03:57 PM
to neo...@googlegroups.com
Regarding slow RSpecs for Neo4j.rb - I've been thinking of splitting
neo4j into several gems.
Since yesterday, Neo4j.rb consists of two gems: neo4j and neo4j-jars.
Splitting neo4j.rb into another gem (neo4j-rails) would mean fewer
RSpecs have to be run. (But we can already do that now by just running
bundle exec spec spec/rails.) Also, I know there are some RSpecs that
are very similar and could be removed.

Another thing that is slow is loading the neo4j gem. Currently
everything is loaded (except some JAR files, like for HA support).
It would be nice if we could instead use the Ruby autoload feature and
only load modules that are required.

Using ImpermanentGraphDatabase sounds like a good idea. I'm just a bit
worried that the real database and the ImpermanentGraphDatabase
behave differently, which would cause us problems (e.g. bugs only found
using the real database). One possible approach is to let the build
server use the real database and the developer machines use the
ImpermanentGraphDatabase, maybe?


Cheers
Andreas

Peter Neubauer

Jan 7, 2012, 2:30:56 PM
to neo...@googlegroups.com

Guys,
the ImpermanentGDB is safe to use; the abstraction is way down at the file channel layer, and it's used everywhere except the kernel tests.

Cheers

/peter

Sent from a device with crappy keyboard and autocorrect

Dmytrii Nagirniak

Jan 7, 2012, 8:31:39 PM
to neo...@googlegroups.com
On 08/01/2012, at 4:05 AM, Michael Hunger wrote:

How big is your datastore?

The RAM disk size is 596 MB.
The amount of data created during specs will never ever reach that size.
Why?


Better to use ImpermanentGraphDatabase for testing, which is no longer using disk filesystems as of 1.6.M02.

Perhaps adding support for that in neo4j.rb would be great!

I'm not sure that it'll improve much since I'm already using memory (implicitly though).

You can profile JRuby with any java profiler like (visualvm or yourkit) they can connect to the running java process and record timing information for all method calls. (Then you can have a look at the "hotspot" methods that take the most time).

Unfortunately I can't make any sense of all that:

(also see the full graph profile in the comments).

Not sure if there are any tools to maybe visualise that.

Dmytrii Nagirniak

Jan 7, 2012, 8:41:00 PM
to neo...@googlegroups.com
On 08/01/2012, at 6:03 AM, Andreas Ronge wrote:

Regarding slow RSpecs for Neo4j.rb - I've been thinking of splitting
neo4j into several gems.
Since yesterday Neo4j.rb now consists of two gems neo4j and neo4j-jars.
Splitting neo4j.rb into another gem (neo4j-rails) would mean less
RSpecs has to be run. (But we still can do that now by just bundle
exec spec spec/rails.) Also, I know there are some RSpecs that are
very similar which can be removed.

That makes sense, but for a Rails application that uses neo4j.rb (my case) it doesn't matter; all of that still has to be loaded.


Another thing that is slow is loading the neo4j gem. Currently
everything is loaded (except some JAR files like for HA support).
Would be nice if we instead could use the Ruby autoload feature and
only load modules that are required.

I think Matz said that Ruby's autoload is going to be deprecated far in the future, and he recommended against using it.
Apart from that, I'm not exactly sure where the bottleneck is. I just suspect that it's the clean-up time.
See https://gist.github.com/2bea805664bfb6491d1d and the link to the full profile in the comments.


Using ImpermanentGraphDatabase sounds like a good idea. I'm just a bit
worried that the real database and the ImpermanentGraphDatabase
behaves differently and would cause us problems (e.g. bugs only found
using the real database). One possible approach is to let the build
server use the real database and on the developer machine use the
ImpermanentGraphDatabase, maybe ?

I 100% agree with you here.
Locally it's much better to use the impermanent GDB; on CI, just add both build configs.
It's extremely easy with Travis CI.
We could just add an ENV option that defaults to the impermanent GDB.

But honestly, if the impermanent GDB's performance is on par with a ramdisk, it's not worth it.
Needs benchmarking first.

Dmytrii Nagirniak

Jan 8, 2012, 2:59:11 AM
to neo...@googlegroups.com
On 08/01/2012, at 6:30 AM, Peter Neubauer wrote:


the impermanent gdb is save to use, the abstraction is way down at the file channel layer and its used everywhere except the kernel tests.


So with the impermanent GDB, to clean up the data before each test I would just drop the DB and fire up a new one.
Would it be faster than removing <= 100 nodes and all indexes?

Michael Hunger

Jan 8, 2012, 7:00:37 AM
to neo...@googlegroups.com
Looking at your profiling, it seems that it's not the cleanup that takes the time but rather the startup process of your RSpecs?

Starting a new clean DB on an in-memory FS is probably faster, because it doesn't have to do: tx, tx-log files, db files, shutdown.

After all, measuring is the only way to be certain.

Michael

Dmytrii Nagirniak

Jan 8, 2012, 7:24:15 PM
to neo...@googlegroups.com

On 08/01/2012, at 11:00 PM, Michael Hunger wrote:

> Looking at your profiling, it seems that not the cleanup takes the time but rather the startup process of your specs?

Yeah, it looks like a lot of time is spent in Bundler loading things.
It doesn't look like neo4j is causing the slowdown at all.

I guess that's just the way JRuby works :(

But still, ~2 tests per second is just way too slow.

Michael Hunger

Jan 8, 2012, 8:32:54 PM
to neo...@googlegroups.com
Perhaps you'd want to look into the stuff Corey Haines did about speeding up RSpec tests?

Don't know if that is relevant to your case: http://moretea.posterous.com/corey-haines-fast-rails-tests

Otherwise, what about setting up JRuby/Neo4j just once before all tests run and then cleaning up in between, so you don't pay the startup penalty for each spec?
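The once-per-suite setup being suggested might look like this in RSpec (a sketch, assuming the rm_db_storage helper and the per-test node deletion shown earlier in the thread):

```ruby
RSpec.configure do |c|
  # Pay the JRuby/Neo4j startup cost once for the whole run...
  c.before(:suite) do
    rm_db_storage unless Neo4j.running?
    Neo4j.start
  end

  # ...and do only cheap cleanup between examples.
  c.after(:each) do
    Neo4j::Transaction.run do
      Neo4j._all_nodes.each { |n| n.del unless n.neo_id == 0 }
    end
  end

  c.after(:suite) do
    Neo4j.shutdown
    rm_db_storage
  end
end
```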

Cheers

Michael

Dmytrii Nagirniak

Jan 8, 2012, 8:52:31 PM
to neo...@googlegroups.com

On 09/01/2012, at 12:32 PM, Michael Hunger wrote:

> Perhaps you'd want to look into the stuff Corey Haines did about speeding up rspec tests?
>
> Don't know if that is relevant to your case: http://moretea.posterous.com/corey-haines-fast-rails-tests

It is possible, but too hard, as Rails has lots of dependencies.
I can consider it, but it will only speed up the model tests. I still need the full Rails stack for controller, view, and helper specs.

> Otherwise what about setting up jruby/neo4j just once before all tests run and then cleanup in between, so you don't get the startup penalty for each spec?

That's exactly what I do. The database is set up before all tests start. Then all nodes and indexes are removed in between.
