I'm attempting to implement an OpenID (plus Facebook and perhaps Twitter or other OAuth providers) login structure with RavenDB, very much like the Stack Exchange network.
For this I'm using a user document that boils down to:
But, I started to grow concerned about enforcing uniqueness on the logins.
I read this article about enforcing uniqueness using another document, which is also the basic framework behind how the unique constraint bundle works: http://old.ravendb.net/faq/unique-constraints
However, if Raven document ids are limited to 127 characters until 1.2, then that leaves me with a potential problem - with a key format of "UniqueLogins/{OpenIDUrl}" any sufficiently long OpenID URL wouldn't fit within 127 characters.
We do have two apps using OpenID and the largest identifier URLs currently top out at 97 characters - that's a Yahoo one. Google and Yahoo are the worst offenders. 97 leaves me 30 characters of prefix information in my document id, but I'm wary about counting on that. Even just for Yahoo, the URL length varies widely in my existing data, some as short as 35 characters. I would hate to bank on an assumed maximum URL length and then find my application unable to support OpenIDs from XYZ because their OpenID identifiers are too long.
So the way I see it I have 3 options:
1) Bury my head in the sand and don't worry too much about uniqueness. Use an index on the URLs and use .Customize(x => x.WaitForNonStaleResultsAsOfLastWrite()) when I query it to try to minimize any problems. Take solace in the assumption that only one person should ever be able to use a given OpenID URL, so it would be pretty difficult to get into a concurrency problem where they're trying to create the same login in two places at once.
2) Use a unique constraint document, and check the length of the ID, truncating it to 127 characters if it is too long. This runs the risk of a collision, especially on those Google and Yahoo identifiers, since I have no idea how their pseudorandom characters are generated.
3) A hash would be shorter than the full URL, but obviously it is risky to rely on that alone due to the possibility of hash collisions. Instead, I could use the full URL if it fits within 127 characters, and if it doesn't, use "UniqueLogins/{HashOfLongUrl}/{AsMuchOfTheUrlAsWillFit}" - does this provide enough guarantee of uniqueness?
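A minimal sketch of option 3, assuming SHA-256 as the hash, the 127-character limit, and the "UniqueLogins/" prefix mentioned above. The class and method names are hypothetical and nothing here touches the RavenDB API; it only builds the key string.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

static class UniqueLoginKey
{
    const int MaxIdLength = 127;            // limit discussed above (assumed to apply to the whole id)
    const string Prefix = "UniqueLogins/";  // 13 characters

    // Hypothetical helper: the full URL when it fits, otherwise hash + truncated URL.
    public static string For(string openIdUrl)
    {
        var plain = Prefix + openIdUrl;
        if (plain.Length <= MaxIdLength)
            return plain;

        string hash;
        using (var sha = SHA256.Create())
        {
            // 32 bytes rendered as 64 hex characters
            hash = BitConverter.ToString(sha.ComputeHash(Encoding.UTF8.GetBytes(openIdUrl)))
                               .Replace("-", "");
        }

        var head = Prefix + hash + "/";        // 13 + 64 + 1 = 78 characters
        var room = MaxIdLength - head.Length;  // 49 characters left for the URL
        return head + openIdUrl.Substring(0, room);
    }
}

static class Program
{
    static void Main()
    {
        var longUrl = "https://example.org/openid?id=" + new string('x', 200);
        Console.WriteLine(UniqueLoginKey.For("https://example.org/short"));
        Console.WriteLine(UniqueLoginKey.For(longUrl).Length); // 127
    }
}
```

In the long-URL branch the uniqueness rests almost entirely on the hash; the truncated URL mostly buys back some human readability.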
I'm not sure which path to take. It seems like I have to look into a crystal ball to determine which of 3 very unlikely scenarios is the most unlikely and I don't have the expertise to decide.
Has anyone else implemented OpenID with Raven and can add something to my deliberations?
For reducing potential clashes with the 127 character limit, what if you strip out the authority information from the URL? (You could always store that part, or the full URL, in a separate property.)
I see basically 0 chance of fidelity loss if you use "accounts/o8/id?id=xxxxxxxxxx" (from https://www.google.com/accounts/o8/id?id=xxxxxxxxxx) as the Id. Even supposing that multiple providers use the exact same URL structure accounts/o8/id?id=, the query string value id should still have a unique key at the end of it.
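A small sketch of that idea, assuming System.Uri is an acceptable way to split the URL. The helper name is hypothetical. Note that PathAndQuery drops any #fragment and Uri may normalize some characters, so keeping the full URL in a separate property is probably still wise.

```csharp
using System;

static class OpenIdKey
{
    // Hypothetical helper: drop scheme and host, keep path and query string.
    public static string WithoutAuthority(string openIdUrl)
    {
        var uri = new Uri(openIdUrl);
        return uri.PathAndQuery.TrimStart('/');
    }
}

static class Program
{
    static void Main()
    {
        Console.WriteLine(OpenIdKey.WithoutAuthority(
            "https://www.google.com/accounts/o8/id?id=xxxxxxxxxx"));
        // prints accounts/o8/id?id=xxxxxxxxxx
    }
}
```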
On Friday, May 4, 2012 10:17:41 AM UTC-4, David Boike wrote:
> I'm attempting to implement an OpenID (plus Facebook and perhaps Twitter or other OAuth providers) login structure with RavenDB, very much like the Stack Exchange network.
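On Friday, May 4, 2012 10:43:06 AM UTC-4, David Boike wrote:
> That's an interesting idea. That would bring the worst Yahoo ID in my currently known data set down to 76 characters.
> So then would you only strip out the authority if the overall URL length was sufficiently large?
You could do something like that.
Another option, probably the easiest: always take the rightmost 127 characters of the URL, e.g. new string(url.Reverse().Take(127).Reverse().ToArray()) (you could do string math to avoid the reversing and just use Substring, e.g. url.Substring(Math.Max(0, url.Length - 127))).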
Wouldn't it be a bit more robust to hash the string, using something like SHA256? In the code below, the hashed string is 64 chars long; this is the hex representation of the hashed data.
// Requires: using System; using System.Linq; using System.Security.Cryptography; using System.Text;
// Create a long input string
var input = String.Join("", Enumerable.Range(1, 1000).Select(x => x.ToString()));
Byte[] inputBytes = Encoding.UTF8.GetBytes(input);
// Hash it with SHA-256 (always 32 bytes, whatever the input length)
var algorithm = new SHA256CryptoServiceProvider();
Byte[] hashedBytes = algorithm.ComputeHash(inputBytes);
// Hex-encode the hash: two characters per byte
var result = BitConverter.ToString(hashedBytes).Replace("-", "");
var length = result.Length; // 64
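Applied to the login case, this could look roughly like the sketch below. It assumes the "UniqueLogins/" prefix from the original question and uses SHA256.Create() instead of the provider class above; the class and method names are hypothetical.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

static class HashedLoginKey
{
    const string Prefix = "UniqueLogins/"; // prefix taken from the original question

    // Hypothetical helper: the key length is fixed regardless of URL length.
    public static string For(string openIdUrl)
    {
        using (var sha = SHA256.Create())
        {
            var hash = sha.ComputeHash(Encoding.UTF8.GetBytes(openIdUrl));
            return Prefix + BitConverter.ToString(hash).Replace("-", "");
        }
    }
}

static class Program
{
    static void Main()
    {
        var key = HashedLoginKey.For("https://www.google.com/accounts/o8/id?id=xxxxxxxxxx");
        Console.WriteLine(key.Length); // 77 (13-character prefix + 64 hex characters)
    }
}
```

At 77 characters the key stays well under the 127-character limit however long the identifier is, at the cost of readability.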
On Friday, 4 May 2012 16:39:45 UTC+1, Chris Marisic wrote:
> You could do something like that.
RavenOverflow has a quick implementation of Facebook authentication, if what you're trying to do is authenticate users of an ordinary, off-the-shelf public website that uses RavenDB as its database.
I added OpenID/OAuth to my RavenDB app this week and went for a very similar document structure.
I actually went with option 1, as an OpenID claim identifier should be unique. Since we always look for a user with a matching login, we know whether we need to register them or not. For OAuth we check both the provider and the provider user id.
Itamar, don't worry, I'm only storing the OpenID logins as application user data, not using OpenID for DB login.
Matt, I strongly considered SHA256, especially after reading that it would be far more likely for me to be struck by lightning several times than to ever have a hash collision. But the thing that kept me from going that route was the loss of any meaning or human readability in those keys.
Raven 1.2 will fix the problem by raising the document id length limit to 1024, and I found a post where Ayende mentioned that (although he didn't want to commit to a date yet) 1.2 will likely be out in the two-month ballpark. I know this app is going to take a lot longer than that, so I'm comfortable using a unique key document containing the entire OpenID identifier, knowing that I'll be running RavenDB 1.2 before the 127 limit tries to bite me.
We're planning on doing Facebook login too, as well as "authenticated email" login (via trusted providers like Google, Yahoo, Facebook, StackID) so our login document key will have a format of "login/{Type}/{Identifier}" and contain properties for the Type, Identifier, and UserId.
Then I should be able to do the "manage my logins" view by indexing the UserId of the UserLogin model.
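Roughly, the login document described above might be shaped like this. It is a sketch only: the property names come from the post, while the Create helper, the sample values, and the in-memory filter standing in for an index on UserId are assumptions, and no RavenDB API is used.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical login document; Type, Identifier and UserId follow the post.
class UserLogin
{
    public string Id { get; set; }          // "login/{Type}/{Identifier}"
    public string Type { get; set; }
    public string Identifier { get; set; }
    public string UserId { get; set; }

    public static UserLogin Create(string type, string identifier, string userId)
    {
        return new UserLogin
        {
            Id = "login/" + type + "/" + identifier,
            Type = type,
            Identifier = identifier,
            UserId = userId
        };
    }
}

static class Program
{
    static void Main()
    {
        var logins = new List<UserLogin>
        {
            UserLogin.Create("openid", "https://example.org/id/alice", "users/1"),
            UserLogin.Create("facebook", "1234567890", "users/1"),
            UserLogin.Create("openid", "https://example.org/id/bob", "users/2")
        };

        // In-memory stand-in for a "manage my logins" query against an index on UserId.
        foreach (var login in logins.Where(l => l.UserId == "users/1"))
            Console.WriteLine(login.Id);
    }
}
```

Because the document key embeds the type and identifier, storing a second login with the same pair would target the same key, which is what makes the key document usable as a uniqueness check.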