Delete not working on Sandbox


finke...@gmail.com

May 16, 2013, 1:01:24 PM
to learnin...@googlegroups.com
I've already mentioned this to Jim, but I also wanted to start a thread to see whether anyone else is having the issue, whether we are missing something, and whether a fix is in progress.

When trying to delete a document, instead of deleting the document the node just creates a new document with the replaces property present, "payload_placement": "none", and a new document id. So during a slice, instead of getting 0 records back, which is what I expected, I get two records.

Here is the slice after creating a document and then trying to delete it
http://sandbox.learningregistry.org/slice?identity=DaveFinkeTest3

This is happening for all of our NSDL records when we test delete and update, but the test above was done using the test script found at https://gist.github.com/jimklo/5484651 to keep the record simple.

Is anyone else seeing this issue?

thanks
Dave Finke - NSDL Developer



Jim Klo

May 16, 2013, 1:30:38 PM
to <learningreg-dev@googlegroups.com>
Hi Dave… I'm beginning to wonder if this is connected to the LR Signature issue you brought up earlier… as oddly, I'm not able to reproduce this… and the only thing I can think of that would cause this to happen is possibly a bad signature… 

Also… last week I saw a very strange problem with the built-in node signing, which seemed to be tied to specific secret keys/tokens in the OAuth ceremony. For some reason some secrets cause the signing to fail, with no known reason at this point. But changing the secrets seemed to solve it.

Can you provide more details on the issue? I've created an issue here: https://github.com/LearningRegistry/LearningRegistry/issues/254

Things that would help are sample data, the type of OpenPGP key you are using, etc. Basically… anything that you are changing in the sample script https://gist.github.com/jimklo/5484651. If there is sensitive information you wish to provide, please contact me directly.

- Jim

Jim Klo
Senior Software Engineer
Center for Software Engineering
SRI International
t. @nsomnac


finke...@gmail.com

May 16, 2013, 6:22:47 PM
to learnin...@googlegroups.com
The script I'm using to create these documents is your exact script. The only things I had to change were the location of gpgbin and the publicKeyLocations.

Just now I created a new key, deployed the public key to one of our servers instead of https://keyserver2.pgp.com/ just to be sure, then ran your script and JSON, changing only the identities so I could search for them, and the same thing happened.

I didn't even realize that the issue might be the signing of it. I just assumed that if the signature wasn't correct it would throw an error on publishing. I created my key via cygwin using

gpg (GnuPG) 1.4.13

For the new key that I created and tried the script with, listing the keys shows:
pub   2048R/4425F43C 2013-05-16
uid                  Dave Finke <whirly...@gmail.com>
sub   2048R/B1B56EA0 2013-05-16

public key hosted at
http://ns.nsdldev.org/configs/harvest_ingest/public-key.txt

Using your LRSignature module version 0.1.12

I can send the private key if you think it will help. These are our testing keys and will never be used outside of sandbox.

If you think it's something to do with using cygwin, I can try to deploy our code out to a unix box and try it out there. Especially if no one else is having these issues, maybe something is wrong with the system I'm running it on.

thanks
Dave

Jim Klo

May 16, 2013, 6:57:50 PM
to <learningreg-dev@googlegroups.com>

On May 16, 2013, at 3:22 PM, <finke...@gmail.com> wrote:

> The script I'm using to create these documents is your exact script. The only things I had to change were the location of gpgbin and the publicKeyLocations.
>
> Just now I created a new key, deployed the public key to one of our servers instead of https://keyserver2.pgp.com/ just to be sure, then ran your script and JSON, changing only the identities so I could search for them, and the same thing happened.

DSA or RSA key and size?… I suppose I can fetch your key and find out…

> I didn't even realize that the issue might be the signing of it. I just assumed that if the signature wasn't correct it would throw an error on publishing. I created my key via cygwin using

Yes, there's some weirdness that goes on with the signing: I'm not 100% sure when it chooses to sign and when not to sign. Excellent feature request for LR Signature!

> gpg (GnuPG) 1.4.13
>
> For the new key that I created and tried the script with, listing the keys shows:
> pub   2048R/4425F43C 2013-05-16
> uid                  Dave Finke <whirly...@gmail.com>
> sub   2048R/B1B56EA0 2013-05-16
>
> public key hosted at
> http://ns.nsdldev.org/configs/harvest_ingest/public-key.txt
>
> Using your LRSignature module version 0.1.12

Okay… latest is 0.1.13, but really not much difference; especially for this.

> I can send the private key if you think it will help. These are our testing keys and will never be used outside of sandbox.
>
> If you think it's something to do with using cygwin, I can try to deploy our code out to a unix box and try it out there. Especially if no one else is having these issues, maybe something is wrong with the system I'm running it on.

Hmm… I've not tried this on cygwin… theoretically it should be the same… If it's easy enough for you to try this on *nix, please try and let us know the verdict… I can try on a cygwin environment as well.

- JK

finke...@gmail.com

May 16, 2013, 7:05:19 PM
to learnin...@googlegroups.com
Key type - RSA and RSA (it's the default)
Size - 2048 (it's the default)
Expiration - Never expires (it's the default)

No problem at all trying it on a unix box, I'll start deploying the script out to our Dev unix box right now.

thanks for all your help.
-Dave

finke...@gmail.com

Jun 24, 2013, 1:43:12 PM
to learnin...@googlegroups.com
After playing around with this for a while, I finally got update and delete to work on sandbox. I just wanted to post what I had to do to make it work, for others.

I ended up having to use http://pool.sks-keyservers.net as the key server to serve my public key. Serving it using https://keyserver2.pgp.com just wasn't working. It's hard to tell what was wrong, since it seems like sandbox accepts all documents regardless of what the public key URL is. It only matters when you try to delete or update.
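
For anyone debugging the same problem, one quick check is whether the public key URL returns an ASCII-armored key over plain HTTP without authentication. Below is a minimal Python sketch of that check. The function names are hypothetical, and it does not reproduce whatever retrieval logic the node itself uses; it only confirms that the URL serves something that looks like a public key block.

```python
from urllib.request import urlopen

# Standard ASCII-armor markers for an OpenPGP public key.
BEGIN_MARKER = "-----BEGIN PGP PUBLIC KEY BLOCK-----"
END_MARKER = "-----END PGP PUBLIC KEY BLOCK-----"


def looks_like_armored_public_key(text):
    """Return True if text contains a BEGIN marker followed by an END marker."""
    start = text.find(BEGIN_MARKER)
    if start == -1:
        return False
    return text.find(END_MARKER, start + len(BEGIN_MARKER)) != -1


def fetch_public_key(url, timeout=10):
    """Fetch url anonymously; urlopen raises HTTPError on 401/403 and similar."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("ascii", errors="replace")
```

Calling looks_like_armored_public_key(fetch_public_key(url)) against each publicKeyLocations entry would show whether an anonymous client can see the key at all.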

Also, Jim:

Update works perfectly.
Delete tombstones the old documents correctly but then creates a new document that is blank and contains the replaces. This happens using your test_delete script as well as our code. It's not a huge deal, because I just check for "payload_placement": "none". The only way to cheat it is to leave the original doc_id in there when you submit the delete. This tombstones the original document and then causes a couchdb error when it tries to create the blank new document, thus not creating one.


Dave - NSDL

Jim Klo

Jun 24, 2013, 9:50:50 PM
to <learningreg-dev@googlegroups.com>, learnin...@googlegroups.com


On Jun 24, 2013, at 12:43 PM, "finke...@gmail.com" <finke...@gmail.com> wrote:

> After playing around with this for a while, I finally got update and delete to work on sandbox. I just wanted to post what I had to do to make it work, for others.
>
> I ended up having to use http://pool.sks-keyservers.net as the key server to serve my public key. Serving it using https://keyserver2.pgp.com just wasn't working. It's hard to tell what was wrong, since it seems like sandbox accepts all documents regardless of what the public key URL is. It only matters when you try to delete or update.

I'll have a look at the code to see what might be wrong. Is your key retrievable via http without authentication when using Symantec's PGP server (as opposed to Canonical's SKS)?

> Also, Jim:
>
> Update works perfectly.
> Delete tombstones the old documents correctly but then creates a new document that is blank and contains the replaces. This happens using your test_delete script as well as our code. It's not a huge deal, because I just check for "payload_placement": "none". The only way to cheat it is to leave the original doc_id in there when you submit the delete. This tombstones the original document and then causes a couchdb error when it tries to create the blank new document, thus not creating one.

I'm not sure I'm totally following. The only difference between Update and Delete is that Delete allows payload_placement=none with a relaxation of some of the normally required properties.

So delete and update are really the same operation, just with a different payload. Both are supposed to be the operation of a new document replacing an old document. With delete, the new document is mostly empty (an envelope without a payload). This is because it's a federated operation. If you publish a delete on node a, the distribute process needs to push the empty replacement doc to nodes b, c, and d so the delete gets processed on those nodes too.

So it seems like the error is actually the opposite of what you think should be happening. The fact that the old doc is getting tombstoned and a new delete document is not being created is the actual error. I believe this is a bit of a race condition. You're publishing a valid replacement document which generates a tombstone but using an invalid doc_ID with the replacement.
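
To make the "delete is a replacement with an empty payload" idea concrete, here is a rough Python sketch of turning an existing envelope into a delete envelope. It is only an illustration of the description above, not the node's or the gist's actual code. The helper name make_delete_envelope is hypothetical, replaces is assumed to be a list of document ids, digital_signature is an assumed field name, and signing the new envelope is left out entirely.

```python
import copy


def make_delete_envelope(original):
    """Build a mostly empty replacement envelope that deletes `original`."""
    delete_doc = copy.deepcopy(original)

    # Do not reuse the old id: the delete is a NEW document that replaces
    # the old one. Reusing the id is the "cheat" discussed in this thread.
    old_id = delete_doc.pop("doc_ID", None)
    if old_id is None:
        raise ValueError("original envelope has no doc_ID to replace")

    # Assumption: any signature on the original no longer applies and the
    # new envelope must be signed again before publishing (not shown).
    delete_doc.pop("digital_signature", None)

    # A delete carries no payload.
    delete_doc.pop("resource_data", None)
    delete_doc["payload_placement"] = "none"

    # Assumption: `replaces` is a list of the document ids being replaced.
    delete_doc["replaces"] = [old_id]
    return delete_doc
```

Optional fields such as keys are left as they were, so the delete envelope can still be found by the same queries as the original.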



Steve Midgley

Jun 24, 2013, 10:47:39 PM
to learnin...@googlegroups.com
Thanks Dave as always. To echo Jim's comment, I think it's really important that we create not just a tombstone for the original but we really want that second (newer) delete doc to exist (so your erroring out with the second doc is a bug and needs to get fixed - thanks for finding it). That is the doc that gets replicated, not the tombstone, so federated delete only works when the new doc is created (and federated).

Delete, as Jim says, is just an update without a payload.. Function is otherwise the same. Kind of wacky, but a federated datastore requires it I think.

If you think this doesn't sound right, please let us know. Jim, Walt and I examined this system pretty intensively, but we're just three guys with some ideas, so there always could be an undiscovered use case.

Steve

finke...@gmail.com

Jun 25, 2013, 6:10:36 PM
to learnin...@googlegroups.com
Steve,
   When you guys explain it that way, it makes sense why it's there. I'm a convert.

Since we delete the documents after we retrieve them from the LR, I'm just going to make sure that I blank out the keys and the resource_data before I publish the delete, so the resource data will never be seen again. This will also keep the empty document from coming back in a slice if one queries on NSDL.

I'll also stop forcing CouchDB errors to make it work the way I expected it to.

thanks for explaining
Dave

Steve Midgley

Jun 25, 2013, 6:24:17 PM
to learnin...@googlegroups.com
Slice shouldn't return deleted documents. If it does, that's a bug. Can you send an example? It also shouldn't return documents which have been updated with a newer version (in short, it should never return tombstone documents, nor should any other similar API like obtain or OAI-PMH).

Thanks,
Steve



finke...@gmail.com

Jun 25, 2013, 6:39:40 PM
to learnin...@googlegroups.com
Sure,
       example - http://sandbox.learningregistry.org/slice?any_tags=NSDL8_COLLECTION_ncs-NSDL-COLLECTION-000-003-112-049

I started with 6 documents, then deleted them, which worked. These six all have "payload_placement": "none".

I forgot to remove their keys before I published the delete, but the resource_data is gone.

Steve Midgley

Jun 25, 2013, 6:55:29 PM
to learnin...@googlegroups.com
Great thanks. Jim will have to clarify - my understanding was that tombstones shouldn't have keys or other envelope metadata either. So I'm either confused (very possibly) or something isn't working right in the delete system. If I had to guess I'm confused, but let's see what Jim and Walt say when they get back (unless this is pressing, in which case we can escalate). Let me know..

Best,
Steve



finke...@gmail.com

Jun 25, 2013, 7:15:48 PM
to learnin...@googlegroups.com
From how I understand it, there are actually 2 documents when a delete happens.

First, the original one becomes a tombstone. I've seen it briefly on slice before, when I used my error-out-couchdb cheat. I noticed that it contained nothing. But without making couchdb error out, I cannot view tombstones on a slice.

Then there is also the replacement document, which was created to distribute the fact that the delete happened. That is what we are seeing in this slice.

Jim says earlier

"So it seems like the error is actually the opposite of what you think should be happening. The fact that the old doc is getting tombstoned and a new delete document is not being created is the actual error"

Which in my mind, after reading your responses, makes sense: if a person is doing a slice every week with a different from=, they might want to know that the resource has been deleted so they can delete theirs too. That then brings up the question of whether the keys should remain, so that any consumer re-harvesting via slice knows that it has been deleted. If we remove the keys on a delete, someone harvesting via slice will not see it as deleted.

This is not pressing for NSDL, since we ignore any record that has "payload_placement": "none" in it. I really just mentioned the slice to make sure I was on the same page as you and Jim.

thanks
Dave - NSDL
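
The "ignore any record that has "payload_placement": "none"" rule mentioned above can be sketched in a few lines of Python. This is an illustration only, not NSDL's actual ingest code. The function names are hypothetical, and each envelope is assumed to be an already parsed dict, so nothing is assumed about the exact shape of the slice response.

```python
def is_delete_document(envelope):
    """A delete/replacement envelope carries no payload."""
    return envelope.get("payload_placement") == "none"


def live_envelopes(envelopes):
    """Drop delete documents, keeping only envelopes that still have a payload."""
    return [env for env in envelopes if not is_delete_document(env)]
```

A consumer that needs to react to deletions, rather than just skip them, would look at the filtered-out envelopes instead of discarding them.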

Steve Midgley

Jun 25, 2013, 7:19:19 PM
to learnin...@googlegroups.com
Yeah - I can see that - good point. It's weird though b/c the new doc, in a true delete, shouldn't have any information in it at all.. So if the new doc has payload placement: none but has some keys in it, then it's kind of an update and kind of a delete. Probably ok to leave it alone, but I would think you would either publish new metadata AND keys and other envelope stuff, or you would just publish a barebones delete doc with no payload and the minimal info to delete the old envelope?

But your point about needing to get a feed somehow to see the deletions makes some sense since most people are indexing outside of LR (which is a good thing and should be encouraged!).. So any API that LR provides should include *new* delete/update envelopes, but not the tombstones..

Steve



Jim Klo

Jun 25, 2013, 9:49:59 PM
to <learningreg-dev@googlegroups.com>, learnin...@googlegroups.com
I don't prohibit keys for a delete. They just don't necessarily make a lot of sense. Basically any field that was optional is still optional. It's just those fields that were required are now optional when payload is none. 

What would make better sense? A feed with only tombstones? Or a mixed feed with both?  We could pass a parameter to existing services such that it modifies what is returned. 

I'll look at the signing more closely later this week once I'm back in the office. 

- JK


Steve Midgley

Jun 25, 2013, 11:34:37 PM
to learnin...@googlegroups.com
I don't think it's a problem to permit them, just that it's hard to see what sense there is in a zero payload with descriptive keys about the payload. Doesn't seem worth messing with that.

I don't know if we should have a changes feed and a content feed? Changes feed would include deletions and updates, content feed would include updates and adds?

Dave, do you have an opinion on this? Would that make things easier?

Steve
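
As a thought experiment for the changes feed / content feed idea above, here is a small Python sketch of how envelopes might be sorted into the two feeds. It is not an existing LR service. The function names are hypothetical, and it assumes a delete is any envelope with "payload_placement": "none" and an update is any other envelope with a non-empty replaces.

```python
def classify(envelope):
    """Label an envelope as 'delete', 'update', or 'add'."""
    if envelope.get("payload_placement") == "none":
        return "delete"
    if envelope.get("replaces"):
        return "update"
    return "add"


def split_feeds(envelopes):
    """Return (changes, content): changes = deletes + updates, content = updates + adds."""
    changes, content = [], []
    for env in envelopes:
        kind = classify(env)
        if kind in ("delete", "update"):
            changes.append(env)
        if kind in ("update", "add"):
            content.append(env)
    return changes, content
```

Updates land in both feeds, which matches the split described above: someone following only the content feed still sees new versions, and someone following only the changes feed still sees what was replaced.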

finke...@gmail.com

Jun 26, 2013, 11:53:15 AM
to learnin...@googlegroups.com
Because of other requirements in our system when we ingest, we do slice without dates and bring everything back and thus only use the latest date for a particular resource_locator. So to us it really doesn't matter since we don't really harvest by date instead we just wipe our index out and use fresh files.
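
The "only use the latest date for a particular resource_locator" approach could look roughly like the Python sketch below. It is not NSDL's ingest code. The function name is hypothetical, the timestamp field name node_timestamp is an assumption, and timestamps are assumed to be strings in one consistent ISO 8601 format so that plain string comparison orders them correctly.

```python
def latest_by_locator(envelopes, timestamp_field="node_timestamp"):
    """Keep only the most recent envelope for each resource_locator."""
    latest = {}
    for env in envelopes:
        locator = env["resource_locator"]
        current = latest.get(locator)
        # Assumption: same-format ISO 8601 strings sort chronologically.
        if current is None or env[timestamp_field] > current[timestamp_field]:
            latest[locator] = env
    return latest
```

Because the whole index is rebuilt from a full slice each time, a delete document simply becomes the latest entry for its locator and can then be skipped.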

That being said, I think that people who harvest from the LR themselves by date should give their input, like Walt, who I think wrote the code that ingests records for http://newfree.ed.gov/. I took a look at their code and it looks like they are ingesting and indexing the data themselves by date ranges. So we should go with whichever method makes it easier for them to make sure that, if a delete document is found while harvesting, they delete that document from the index, and the same with an update. Hopefully other groups will follow suit and deal with updates and deletes.

This is why I'm thinking that anyone who does a delete should leave the keys and payload schema as they were before the delete.

For example:

Let's say I create a document on 2013-6-1 and then delete it on 2013-6-30, and a group harvests every Monday for new documents.

on June 3rd they run
/slice?any_tags=nsdl&from=2013-5-27 <- they will see the live document and add it to their index

then on July 1st they run
/slice?any_tags=nsdl&from=2013-6-24

If I don't leave the key nsdl in the delete document they will not see the deleted document, to them it will look like nothing has changed for that particular document. Not sure if other groups will be indexing using slice, but we do.

Dave
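
The weekly harvest scenario above can be sketched in Python as follows. This is an illustration under stated assumptions, not code from any existing harvester. The function names and the in-memory index (a dict keyed by document id) are hypothetical, replaces is assumed to be a list of document ids, and dates are written zero-padded (2013-06-24) rather than as typed in the example above.

```python
from datetime import date
from urllib.parse import urlencode


def slice_url(base, tag, since):
    """Build a slice URL for one tag, starting at the given date."""
    query = urlencode({"any_tags": tag, "from": since.isoformat()})
    return f"{base}/slice?{query}"


def apply_harvest(index, envelopes):
    """Apply one harvested batch to a local index keyed by document id."""
    for env in envelopes:
        # Both updates and deletes replace older documents.
        for old_id in env.get("replaces", []):
            index.pop(old_id, None)
        # Only envelopes that still carry a payload are indexed.
        if env.get("payload_placement") != "none":
            index[env["doc_ID"]] = env
    return index


# June 3rd run: the live document is added.
index = apply_harvest({}, [
    {"doc_ID": "doc-1", "keys": ["nsdl"], "payload_placement": "inline"},
])
# July 1st run: the delete document is only returned by the nsdl query
# if it still carries the nsdl key, which is the point made above.
index = apply_harvest(index, [
    {"doc_ID": "doc-2", "keys": ["nsdl"], "payload_placement": "none",
     "replaces": ["doc-1"]},
])
```

If the delete envelope were published without the nsdl key, the second batch would be empty and doc-1 would stay in the harvester's index.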





Steve Midgley

Jun 26, 2013, 1:51:32 PM
to learnin...@googlegroups.com, Jerome Grimmer
That seems reasonable - thanks for the input. If anyone else has input hopefully we'll hear from them. Cc'ing Jerome specifically in case he can take a look and let us know how they're handling this..

And your point about leaving the envelope identical for the delete and original document makes a lot of sense in this regard..

Steve


