Document IDs with encoded Url break Studio


Johannes Rudolph

Jan 22, 2013, 12:24:05 PM
to rav...@googlegroups.com
I'm currently using RavenDb for an experimental project that does some social network crawling. All crawled resources end up as documents in the database and have an Id of the form Collection/HttpUtility.UrlEncode(URL).

It seems the RavenDb Studio has issues displaying these documents. For a sample, please see the attached database export. The studio displays the error "Could not find 'document'" without any further info. 
In the logs, I can see two queries being issued:

Debug 22.01.2013 17:20:41 Document with key 'XingAccessTokens/https://api.xing.com/v1users/13885575_04ba70' was not found Raven.Storage.Esent.StorageActions.DocumentStorageActions
Debug 22.01.2013 17:20:41 Document with key 'xingaccesstokens/https%3a%2f%2fapi.xing.com%2fv1users%2f13885575_04ba70' was found Raven.Storage.Esent.StorageActions.DocumentStorageActions

The application code has no issues with these Ids; it seems that only the Studio can't deal with them.

Regards,
Johannes

Matt Johnson

Jan 22, 2013, 12:57:38 PM
to rav...@googlegroups.com
IMHO, URLs are a bad choice of document keys.

Johannes Rudolph

Jan 22, 2013, 1:34:21 PM
to rav...@googlegroups.com
In this case they're quite handy since they define the identity of a resource, and they allow efficient querying and replacing of documents by ID.

And it's not an excuse for a bug in the studio. 

Matt Johnson

Jan 22, 2013, 1:41:02 PM
to rav...@googlegroups.com, jojo.r...@googlemail.com
Perhaps, but do you really consider these separate keys?


Also, any key submitted to Raven that ends in / will have a sequential integer appended, so when you write http://foo.com/bar/ it's going to get stored as http://foo.com/bar/1, and the next time you save the same item you'll get http://foo.com/bar/2, and so on. This behavior is very useful in bundles, and is documented (although quite buried) here: http://ravendb.net/docs/client-api/basic-operations/saving-new-document under "Document ID Generation Strategies".

I do agree that the studio shouldn't blow up.

casperOne

Jan 22, 2013, 3:19:59 PM
to rav...@googlegroups.com, jojo.r...@googlemail.com
I had this problem before as well.  I used URL encoded strings (so I wouldn't have issues with the slashes) but the studio still gives you issues.

That said, RESTful URLs are *perfect* for IDs in a general sense; they just aren't perfect from a RavenDB perspective because there's an assumption about the structure of the ID property.

I ended up creating a separate identifier constructed on a per-domain basis.  A PITA, but it works.

Johannes Rudolph

Feb 2, 2013, 5:12:06 AM
to rav...@googlegroups.com, jojo.r...@googlemail.com
Sorry for dropping out of the loop; I got stuck with something else.

There are very valid scenarios where you want to store a URL in the ID field. Consider the case where I'm storing OpenID claimed IDs in my database and want to enforce uniqueness among them (so no ID gets associated with multiple accounts). I therefore MUST store the OpenID URLs as the ID for RavenDb, because there's no other way to guarantee uniqueness than the ID index (that's what the docs say).

I tried encoded as well as unencoded URLs, but both fail. What's _really_ problematic is that it's not only the Studio but also the RavenDb client that fails, though only when used over HTTP. When I ran the tests with the InMemoryStore I did not see these errors. Here's a repro, which requires running against an HTTP DocumentStore:

using Raven.Client.Document; // DocumentStore
using Xunit;                 // [Fact], Assert

public class Repro
{
    public class Login
    {
        public const string DocumentTypePrefix = "logins/";

        public string Id { get; private set; }

        private Login() {}

        public static Login Create( string claimedId )
        {
            // Both of these variants fail. Make sure to clear the database before running this test, though!
            //string encodedIdentifier = System.Web.HttpUtility.UrlEncode( claimedId );
            string encodedIdentifier = claimedId;
            
            return new Login()
            {
                Id = Login.DocumentTypePrefix + encodedIdentifier
            };
        }
    }

    [Fact]
    public void CanSaveAndRetrieveTestOpenId()
    {
        DocumentStore store = new DocumentStore() { Url = "http://localhost:8080" };
        store.Initialize();

        string id = default( string );
        using (var session = store.OpenSession())
        {
            var login = Login.Create( "https://me.yahoo.com/a/rlvVJyIHwuykvNWYWOrE_Uv3Jt_d#c2458" );
            session.Store( login );
            session.SaveChanges();

            id = login.Id;
        }

        using (var session = store.OpenSession())
        {
            var doc = session.Load<Login>( id );
            Assert.NotNull( doc );
        }
    }
}

Since I can't store URLs as Ids (either encoded or unencoded), the question now is how to properly store my OpenIds and still get a proper uniqueness constraint:
a) I could store a secure hash of the URL as the ID (although this has potential problems, see http://stackoverflow.com/questions/5371076/should-i-store-openid-claimed-id-encrypted)
b) Store the ID in some other encoding (I'm thinking of hex, since Base64 also contains /)
c) Any other good suggestions?
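
For option b), a minimal sketch of what hex encoding could look like. The names `LoginIds`, `ToHexId` and `FromHexId` are hypothetical helpers made up for illustration, not part of RavenDb, and this has not been run against a server:

```csharp
using System;
using System.Text;

public static class LoginIds
{
    // Same prefix as Login.DocumentTypePrefix in the repro above.
    public const string Prefix = "logins/";

    // Hypothetical helper: hex-encodes the UTF-8 bytes of the claimed ID,
    // so the part after the prefix contains only the characters 0-9 and a-f.
    public static string ToHexId(string claimedId)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(claimedId);
        // BitConverter.ToString yields "68-74-74-70..."; strip the dashes.
        string hex = BitConverter.ToString(bytes).Replace("-", "").ToLowerInvariant();
        return Prefix + hex;
    }

    // Inverse of ToHexId, so the original URL can be recovered from the key.
    public static string FromHexId(string id)
    {
        string hex = id.Substring(Prefix.Length);
        byte[] bytes = new byte[hex.Length / 2];
        for (int i = 0; i < bytes.Length; i++)
            bytes[i] = Convert.ToByte(hex.Substring(2 * i, 2), 16);
        return Encoding.UTF8.GetString(bytes);
    }
}
```

The key is twice as long as the URL, but it is reversible, contains no slashes or percent signs, and upper- and lowercase letters in the URL map to different hex digits.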

Regards,
Johannes

Oren Eini (Ayende Rahien)

Feb 2, 2013, 9:44:09 AM
to ravendb, Johannes Rudolph
Thanks for the repro, fixed and will be in the next build.



Matt Johnson

Feb 12, 2013, 10:27:12 AM
to rav...@googlegroups.com, jojo.r...@googlemail.com
Johannes - Please see also this post: http://stackoverflow.com/a/14835856/634824

Uri.EscapeDataString may be appropriate for URIs where you don't risk collisions due to case insensitivity or other parameters - such as your OpenID claim.
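
A minimal sketch of what that could look like for the claimed ID from the repro. `EscapeDemo` and `ToKey` are hypothetical names used only for illustration, and the "logins/" prefix is taken from the repro's `DocumentTypePrefix`:

```csharp
using System;

public static class EscapeDemo
{
    // Hypothetical helper: percent-encodes every reserved character
    // (':', '/', '#', ...) so the URL becomes a single key segment.
    public static string ToKey(string claimedId)
    {
        return "logins/" + Uri.EscapeDataString(claimedId);
    }
}

// EscapeDemo.ToKey("https://me.yahoo.com/a/rlvVJyIHwuykvNWYWOrE_Uv3Jt_d#c2458")
// returns "logins/https%3A%2F%2Fme.yahoo.com%2Fa%2FrlvVJyIHwuykvNWYWOrE_Uv3Jt_d%23c2458"
```

The escaped key keeps the mixed case of the original path, so the caveat about case insensitivity still applies to it.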