How best to clean out a database for testing when using py2neo?

Alan Robertson
Jun 23, 2012, 6:23:24 PM
to Nigel Small, Neo4J
Hi,

I have a set of regression tests that I'd like to run. Each test needs
to start with a clean database, so the results make sense.

So, in py2neo, what's the best way to clean out all the nodes,
relationships and indexes?

It is possible that some of the tests might have more of any of these
than would comfortably fit into RAM on a given test machine... Or the
test might have removed them all as part of running ;-)

But that's not the norm. Obviously for that case, I can just remove the
database. But for most of them, that's really slow and requires root
privileges.

Suggestions?

--
Alan Robertson<al...@unix.sh> - @OSSAlanR

"Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions." - William Wilberforce

Michael Hunger
Jun 23, 2012, 10:24:28 PM
to ne...@googlegroups.com, Nigel Small
Alan,

you might just run a cypher delete all command?

start n=node(*) match n-[r?]-() where id(n) <> 0 delete n,r

Michael
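For a py2neo-based test suite, the same query can be sent directly to the server from each test's setup. The following is a minimal sketch that uses only the Python standard library. It assumes a Neo4j 1.8-era server at the default `http://localhost:7474/db/data` that exposes the REST Cypher endpoint at `/cypher`. It deliberately bypasses py2neo's own API rather than guessing at its method names. `cypher_payload` and `run_cypher` are hypothetical helper names.

```python
import json
import urllib.request

# Assumption: a Neo4j 1.8-era server on the default port, exposing the
# REST Cypher endpoint at <BASE_URL>/cypher. Adjust for your test machine.
BASE_URL = "http://localhost:7474/db/data"

# Michael's query, unchanged (Cypher 1.8 syntax). `r?` marks an optional
# relationship, so nodes with no relationships are still matched and deleted;
# `id(n) <> 0` skips node 0, the reference node in Neo4j 1.x.
DELETE_ALL = "start n=node(*) match n-[r?]-() where id(n) <> 0 delete n,r"


def cypher_payload(query, params=None):
    """Build the JSON body the REST Cypher endpoint expects."""
    return {"query": query, "params": params or {}}


def run_cypher(query, params=None, base_url=BASE_URL):
    """POST a Cypher query to the server and return the decoded JSON reply."""
    body = json.dumps(cypher_payload(query, params)).encode("utf-8")
    request = urllib.request.Request(
        base_url + "/cypher",
        data=body,
        headers={"Content-Type": "application/json",
                 "Accept": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

A test's `setUp` would then call `run_cypher(DELETE_ALL)`. The payload builder is kept separate from the network call so that it can be checked without a running server.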

Alan Robertson
Jun 23, 2012, 10:43:06 PM
to ne...@googlegroups.com
On 06/23/2012 08:24 PM, Michael Hunger wrote:
> Alan,
>
> you might just run a cypher delete all command?
>
> start n=node(*) match n-[r?]-() where id(n)<> 0 delete n,r

Very succinct. What about the indexes?

Alan Robertson
Jun 24, 2012, 12:11:57 AM
to ne...@googlegroups.com
On 06/23/2012 08:43 PM, Alan Robertson wrote:
> On 06/23/2012 08:24 PM, Michael Hunger wrote:
>> Alan,
>>
>> you might just run a cypher delete all command?
>>
>> start n=node(*) match n-[r?]-() where id(n)<> 0 delete n,r
>
> Very succinct. What about the indexes?
>
Also, I see that it doesn't reuse ids. Is that expected? [I was
surprised given what I remember of earlier discussions].

[Life would be simpler if the tests were completely repeatable -
including ids].

Michael Hunger
Jun 24, 2012, 1:55:11 AM
to ne...@googlegroups.com
Ids are reused after a restart.
Indexes are deleted lazily.

Sent from mobile device
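Instead of relying on lazy cleanup, a test suite can drop its indexes explicitly between runs. This is a sketch, assuming the legacy Neo4j 1.x REST index endpoints on a local server. In that API, `GET .../index/node` lists node indexes as a map from name to configuration and `DELETE .../index/node/{name}` drops one index. The `relationship` endpoints work the same way. The helper names are hypothetical, and this code does not use py2neo's API.

```python
import json
import urllib.parse
import urllib.request

# Assumption: legacy (Neo4j 1.x) REST index endpoints on a local server.
BASE_URL = "http://localhost:7474/db/data"


def index_delete_urls(kind, index_map, base_url=BASE_URL):
    """Turn the {name: config} map from GET /index/<kind> into DELETE URLs."""
    return [
        "%s/index/%s/%s" % (base_url, kind, urllib.parse.quote(name, safe=""))
        for name in sorted(index_map)
    ]


def list_indexes(kind, base_url=BASE_URL):
    """Return {index_name: config} for kind 'node' or 'relationship'."""
    request = urllib.request.Request(
        "%s/index/%s" % (base_url, kind),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        raw = response.read().decode("utf-8")
    # Treat an empty body (no indexes defined) as an empty map.
    return json.loads(raw) if raw.strip() else {}


def drop_all_indexes(base_url=BASE_URL):
    """Delete every node and relationship index; return the URLs deleted."""
    deleted = []
    for kind in ("node", "relationship"):
        for url in index_delete_urls(kind, list_indexes(kind, base_url), base_url):
            request = urllib.request.Request(url, method="DELETE")
            with urllib.request.urlopen(request):
                pass
            deleted.append(url)
    return deleted
```

Index names are URL-quoted because nothing stops a test from creating an index whose name contains spaces or slashes.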

Andres Taylor
Jun 24, 2012, 3:10:53 AM
to ne...@googlegroups.com
On Sun, Jun 24, 2012 at 6:11 AM, Alan Robertson <al...@unix.sh> wrote:

> [Life would be simpler if the tests were completely repeatable - including ids].

Node and relationship ids are an implementation detail. It's unfortunate that they are exposed at all, and you should not depend on them.

Andrés

Alan Robertson
Jun 24, 2012, 8:53:22 AM
to ne...@googlegroups.com
But for running a test again and again manually to fix a bug, it's nice to have them consistent.  I just said life would be simpler - which is true...

Michael Hunger
Jun 24, 2012, 8:59:18 AM
to ne...@googlegroups.com
You can index your nodes and then pull them from the index to be checked against?

Michael

Alan Robertson
Jun 24, 2012, 9:14:34 AM
to ne...@googlegroups.com, Michael Hunger

On 6/24/2012 6:59 AM, Michael Hunger wrote:
> You can index your nodes and then pull them from the index to be
> checked against?
>
> Michael

Do you have anything as straightforward as the script you gave me?

> start n=node(*) match n-[r?]-() where id(n) <> 0 delete n,r

By the way, as I understand it, this doesn't delete relationships for node zero. So I ran this script first:

start n=node(0) match n-[r?]-() delete r

Does that look right to you? [and was it necessary?]

Alan Robertson
Jun 24, 2012, 9:26:35 AM
to ne...@googlegroups.com

On 6/23/2012 11:55 PM, Michael Hunger wrote:
> Id reuse after restart
> Indexes are deleted lazily

A thought about that behavior from a long-term perspective...

If you're in a high-availability configuration, the goal would be to never restart - ever. Practically speaking, at least not until an upgrade of Neo4j - and upgrades might be 5 or 10 years apart. "If it ain't broke, don't fix it."

I've certainly seen HA servers that were up for many years without interruption, running happily along. [I know this for sure because several different time representations overflowed and caused problems - so it will happen.] FWIW, industry numbers seem to indicate an MTBF of about 5 years for Intel-class servers, and 10% of them won't fail for much longer than that. The way it works, you replace the broken ones, and your HA service migrates to the ones that have been running the longest - and stays there... That means it ends up on the best servers, the ones least likely to break again. So this is a good thing...

I'm currently working on a project that, once it is turned over to the customer, cannot be upgraded for the remaining 9 years of its planned life.

It could happen to you too ;-)

Not a problem for a few years yet, but something to think about...  [I suspect someone has thought about it]

    -- Alan Robertson
        al...@unix.sh

Michael Hunger
Jun 24, 2012, 10:22:29 AM
to Alan Robertson, ne...@googlegroups.com
Unfortunately it's not as simple. You would use the REST API to index your nodes with a certain domain id and value. Then, in the test assertion, you would look up the node by index, key and value instead of by id.

The lookup works with Cypher too: start n=node:index(key=value) ...
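This suggestion can be sketched against the legacy REST interface. When the fixture is created, add each node the test cares about to an index under a domain key. The assertion then fetches the node by index, key and value instead of by node id. This sketch makes several assumptions: a local server using the Neo4j 1.x legacy index endpoints, where `POST .../index/node/{index}` takes a `uri`/`key`/`value` body and `GET .../index/node/{index}/{key}/{value}` does an exact lookup. The `tests` index, the `test_id` key and all helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:7474/db/data"  # assumption: local Neo4j 1.x server


def _request(url, payload=None, method="GET"):
    """Send a JSON request; return the decoded reply, or None for an empty body."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, method=method,
        headers={"Content-Type": "application/json",
                 "Accept": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        raw = response.read().decode("utf-8")
    return json.loads(raw) if raw.strip() else None


def index_entry(node_uri, key, value):
    """Body for POST /index/node/<index>: file an existing node under key=value."""
    return {"uri": node_uri, "key": key, "value": value}


def lookup_url(index, key, value, base_url=BASE_URL):
    """URL for GET /index/node/<index>/<key>/<value> (exact-match lookup)."""
    parts = [urllib.parse.quote(str(p), safe="") for p in (index, key, value)]
    return "%s/index/node/%s" % (base_url, "/".join(parts))


def add_to_index(index, node_uri, key, value, base_url=BASE_URL):
    """Index a node under a domain id when the test fixture is built."""
    url = "%s/index/node/%s" % (base_url, urllib.parse.quote(index, safe=""))
    return _request(url, index_entry(node_uri, key, value), method="POST")


def find_by_domain_id(index, key, value, base_url=BASE_URL):
    """Return the node representations filed under key=value (possibly [])."""
    return _request(lookup_url(index, key, value, base_url)) or []
```

In setup, call `add_to_index("tests", node_uri, "test_id", "alpha")`. In the assertion, call `find_by_domain_id("tests", "test_id", "alpha")`. The Cypher form would be `start n=node:tests(test_id="alpha") return n`. Either way, the test never depends on which numeric id the server handed out.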

As your relationship points to some other node (except for self-relationships), the query should reach the relationship from that other node and delete it there. This holds as long as you omit the directional arrow.

Michael
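The answer to the node-zero question can be checked with a small model that never touches Neo4j. Treat each relationship as a `(start_id, end_id)` pair. Then ask which pairs the pattern reaches when every node except node 0 can serve as the starting point `n`. This is a hypothetical illustration of the matching rule described above, not a reimplementation of Cypher.

```python
def reached_relationships(relationships, excluded_node=0, directed=False):
    """Relationships bound to r by a delete-all that skips excluded_node.

    Each relationship is modelled as a (start_id, end_id) pair.
    Undirected pattern n-[r]-():  r is reached from either endpoint.
    Directed pattern  n-[r]->(): r is reached only from its start node.
    """
    reached = []
    for start, end in relationships:
        if directed:
            from_allowed_node = start != excluded_node
        else:
            from_allowed_node = start != excluded_node or end != excluded_node
        if from_allowed_node:
            reached.append((start, end))
    return reached


def left_behind(relationships, excluded_node=0, directed=False):
    """Relationships the delete-all query would not reach, hence not delete."""
    reached = set(reached_relationships(relationships, excluded_node, directed))
    return [r for r in relationships if r not in reached]
```

With the undirected pattern, only a self-relationship on node 0 survives. That is exactly the case Alan's extra node-0 query covers. Adding an arrow would also leave behind every relationship that starts at node 0.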