WEP 110: Site Auto-registration


Anant Narayanan

Oct 28, 2009, 2:07:53 PM
to Weave Dev List
Hi All,

I was hacking on site auto-registration some time ago and have written
up a WEP describing the basic JSON structure I was using to describe
the mechanics of registration, login and password changes:

https://wiki.mozilla.org/Labs/Weave/WEP/110

The WEP has not tackled the Captcha issue yet, but it is important
for us to address it, because virtually every site asks for a
Captcha on registration.

I welcome your feedback, comments, and criticism!

Regards,
Anant

Dan Mills

Oct 28, 2009, 2:51:16 PM
to mozilla-lab...@googlegroups.com
Hey,

Good work. We should sync this up with the Account Manager doc:

https://wiki.mozilla.org/Labs/Weave/Identity/Account_Manager

So that WEP110 (or something that comes after it) can become a
technical document for the "Formal protocol/format definitions"
requirement.

Dan

Mike Hanson

Oct 28, 2009, 3:06:57 PM
to mozilla-lab...@googlegroups.com
Hi, Anant -

I'll toss in a couple ideas that I've been percolating on as well.  I've added these to the WEP page's Discussion section as well.

1. When we get around to writing up the Account Management Control Document specification (or whatever we're going to call it), we should think about whether to support an XML binding as well as the JSON one you propose - we could easily do both.  The structure of the document is certainly quite regular, so far.

2. I would add logout, account-status, account-change and account-revoke to the list of methods you specify there.  (definitions TBD!)  And I would suggest "account-management" for the rel attribute.

3. Formalizing something that somebody (thunder? rags?) said to me the other day: we should reference profile data by the combination of a schema identifier (URL) and a list of schema elements (strings).  This means that we can be a bit agnostic about schema battles, and future-proof ourselves... while still recommending the schema that we think is best.

So, for example, I would suggest that the register schema you describe in the example be written as:

   "register":
    {
     "path": "/register",
     "tos": "/register/tos.html",
     "method": "POST",
     "params":
      {
       "userName": "name_field",
       "name/givenName": "first_name_field",
       "name/familyName": "last_name_field",
       "emailHome": "email_field",
       "password": ["pw_field", "pw_field2"]
      }
    }

There are a couple of wrinkles with that... notice that in order to really indicate the PoCo "givenName" field I had to use a hierarchical syntax, which means I have a '/' in an attribute name. That's fine inside a quoted JSON string, but it isn't a legal bare identifier in JS, so those keys always have to be quoted and can't be reached with dot syntax. If we want to keep it out of the key position, I'd have to go to the uglier syntax of
	"params": [ ["userName", "name_field"], ["name/givenName", "first_name_field"] ] ... etc.
... which is a bummer... or adopt a delimiter other than '/', like perhaps underscore, which would be better.

But in general I think we shouldn't hard-code PoCo 1.0 into our implementation, especially since there are so many competing standards out there already.

We could also support a very terse style for the cases where field-specific name overrides are not required... then you could just do:
	"params": [ "userName", "name/givenName", "name/familyName" ]
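
As a rough illustration of how a client might consume a params map like the one above, here is a minimal Python sketch. It is not part of the WEP: the `build_form` helper, the sample profile, and the reading of a list value as "send the same value in every listed field" are all assumptions made for the example.

```python
import json
from urllib.parse import urlencode

# Quoted keys may contain '/', so a stock JSON parser accepts this document.
REGISTER = json.loads("""
{
  "path": "/register",
  "method": "POST",
  "params": {
    "userName": "name_field",
    "name/givenName": "first_name_field",
    "name/familyName": "last_name_field",
    "emailHome": "email_field",
    "password": ["pw_field", "pw_field2"]
  }
}
""")


def build_form(params, profile):
    """Map profile values onto the site's own form field names (hypothetical helper)."""
    form = []
    for element, fields in params.items():
        value = profile[element]
        # Assumption: a list of field names means the value is repeated,
        # e.g. a "password" and "confirm password" pair.
        for field in ([fields] if isinstance(fields, str) else fields):
            form.append((field, value))
    return form


# Hypothetical profile, keyed by the same element names the site asked for.
profile = {
    "userName": "jdoe",
    "name/givenName": "Jane",
    "name/familyName": "Doe",
    "emailHome": "jdoe@example.com",
    "password": "s3cret",
}

form = build_form(REGISTER["params"], profile)
body = urlencode(form)  # body for the POST to REGISTER["path"]
```

The element names stay opaque strings here, so swapping '/' for another delimiter would not change the client logic.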

4. We should add the notion of "required" and "optional" to the list of register params.

5. We may want to add ToS tagging to each param -- especially email -- otherwise the general ToS will have to encompass it, which could be less useful to users.

If the ToS _must_ be read before account registration can continue, we could include metadata to that effect (and do the usual "next-button-at-the-bottom" thing).

6. We could include a "challenge" parameter in "register", which could follow an OpenID-challenge-like message exchange pattern for CAPTCHAs, e.g.

   "challenge":
    {
     "src": "/captcha_start.html",
     "final": "/captcha_success.html",
     "token-param": "captcha_ok"
    }

i.e. send the user to captcha_start, and if we get redirected to captcha_success, read the captcha_ok parameter and pass it along with the register request.  Or something.  That's pretty half-baked.  But we have the usual distributed trust problems of nonces, man-in-the-middle attacks, replays, etc.

The other alternative would be to centralize captcha testing (e.g. the register just says "captcha this"), but that means we have the distributed trust problem in reverse -- how does the site know that browser X does a good job captchaing?  And then we get into reputation systems, etc. etc.
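
To make the redirect-based idea a little more concrete, here is a small Python sketch of the client side. It only covers "did we land on the success page, and what token came with it"; the `challenge_token` helper and the example URLs are hypothetical, and nothing here addresses the nonce or replay problems mentioned above.

```python
from urllib.parse import parse_qs, urljoin, urlsplit

CHALLENGE = {
    "src": "/captcha_start.html",
    "final": "/captcha_success.html",
    "token-param": "captcha_ok",
}


def challenge_token(site, challenge, landed_url):
    """Return the token if the user ended up on the 'final' page, else None."""
    expected = urlsplit(urljoin(site, challenge["final"]))
    landed = urlsplit(landed_url)
    same_page = (landed.scheme, landed.netloc, landed.path) == (
        expected.scheme, expected.netloc, expected.path)
    if not same_page:
        return None
    values = parse_qs(landed.query).get(challenge["token-param"])
    return values[0] if values else None


site = "https://example.com/"
start_url = urljoin(site, CHALLENGE["src"])  # where the user is sent first
token = challenge_token(
    site, CHALLENGE,
    "https://example.com/captcha_success.html?captcha_ok=abc123")
```

The token would then be passed along with the register request under whatever parameter name the site expects.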

7. Using a link element has some nice properties: it's easy for content administrators, it's self-evident from a view source, and it clearly defines administrative domains on a per-page basis.  But it has some bad properties too: it requires that we parse some of the page before we know what to do, and it precludes the use of the HEAD method to optimize the exchange.

Dan had suggested that we actually specify a HTTP header as the account-management link URI, and then use a META HTTP-EQUIV to include it in the body.  That's a more browser-ish way to do it.  A caveat is that the user community is quite open to crazy new LINK tags, and not so much to new HTTP headers.

One other thought I had is that each of these link URIs effectively defines an administrative domain -- they act quite a lot like the Realm attribute of HTTP Basic Auth in that regard.  If we included a "name" attribute on the AMCD (or whatever we call that document!) we could expose that to the user in a not-hostile way.

-M

Dan Mills

Oct 28, 2009, 5:36:47 PM
to mozilla-lab...@googlegroups.com
Hey Mike,

Thanks a lot for the detailed comments! Mine are inline-

On Wed, Oct 28, 2009 at 3:06 PM, Mike Hanson <mha...@mozilla.com> wrote:
> Hi, Anant -
> I'll toss in a couple ideas that I've been percolating on as well.  I've
> added these to the WEP page's Discussion section as well.
> 1. When we get around to writing up the Account Management Control Document
> specification (or whatever we're going to call it), we should think about
> whether to support an XML binding as well as the JSON one you propose - we
> could easily do both.  The structure of the document is certainly quite
> regular, so far.

Maybe. How would we know which format it's in?

I don't see a big benefit in supporting XML, TBH, but maybe I'm
missing something.

> 2. I would add logout, account-status, account-change and account-revoke to
> the list of methods you specify there.  (definitions TBD!)  And I would
> suggest "account-management" for the rel attribute.

+1

> 3. Formalizing something that somebody (thunder? rags?) said to me the other
> day: we should reference profile data by the combination of a schema
> identifier (URL) and a list of schema elements (strings).  This means that
> we can be a bit agnostic about schema battles, and future-proof ourselves...
> while still recommending the schema that we think is best.

Yes, we should try hard to separate the schema of our document (the
AMCD) from the schema of the data the site wants (the Account
Registration Document?). So I like the schema attribute in the
register section of the AMCD, but I think we need to think a bit
harder about the params piece. For example, PoCo allows for multiple
email accounts, see spec here:

http://portablecontacts.net/draft-spec.html#anchor18

It seems likely some website will want one or more of those, we should
figure out how to describe them.
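
One possible way to describe them, sketched in Python. The path convention is purely an assumption for illustration (it is not in the WEP or in PoCo): a '/'-separated path walks singular fields, and when it reaches a plural field the next segment selects an entry by its "type". The sample contact follows the PoCo plural-field shape of "value", "type" and "primary".

```python
def resolve(contact, path):
    """Resolve a '/'-separated element path against a PoCo-style contact."""
    node = contact
    for segment in path.split("/"):
        if isinstance(node, list):
            # Plural field: treat the segment as a "type" selector.
            node = next(
                (e.get("value") for e in node if e.get("type") == segment), None)
        elif isinstance(node, dict):
            node = node.get(segment)
        else:
            return None
        if node is None:
            return None
    if isinstance(node, list):
        # No selector given: prefer the entry marked primary, else the first.
        entry = next((e for e in node if e.get("primary")),
                     node[0] if node else None)
        return entry.get("value") if entry else None
    return node


contact = {
    "name": {"givenName": "Jane", "familyName": "Doe"},
    "emails": [
        {"value": "jane@work.example", "type": "work"},
        {"value": "jane@home.example", "type": "home", "primary": True},
    ],
}

given = resolve(contact, "name/givenName")
work_email = resolve(contact, "emails/work")
default_email = resolve(contact, "emails")
```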

> 4. We should add the notion of "required" and "optional" to the list of
> register params.

Maybe as an additional hash, so everything in 'params' is required? We
could then rename params to requiredFields or something.
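
A quick Python sketch of what the two-hash layout could look like to a client. "requiredFields" is the name floated above; "optionalFields" and both helper functions are made up for the example.

```python
REGISTER = {
    "path": "/register",
    "method": "POST",
    "requiredFields": {
        "userName": "name_field",
        "password": ["pw_field", "pw_field2"],
    },
    # Hypothetical companion hash for fields the site would like but can live without.
    "optionalFields": {
        "emailHome": "email_field",
        "name/givenName": "first_name_field",
    },
}


def missing_required(register, profile):
    """Return the required elements the profile cannot supply."""
    return [name for name in register.get("requiredFields", {})
            if not profile.get(name)]


def fields_to_send(register, profile):
    """All required fields, plus optional ones only when we have a value."""
    merged = dict(register.get("requiredFields", {}))
    for name, field in register.get("optionalFields", {}).items():
        if profile.get(name):
            merged[name] = field
    return merged


profile = {"userName": "jdoe", "emailHome": "jdoe@example.com"}
missing = missing_required(REGISTER, profile)  # ask the user for these first
sendable = fields_to_send(REGISTER, profile)
```

A client would check `missing` before submitting anything, and prompt the user (or generate a password) for whatever is listed there.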

> 6. We could include a "challenge" parameter in "register", which could
> follow an OpenID-challenge-like message exchange pattern for CAPTCHAs, e.g.

So...yeah. We should maybe chat with some captcha providers (e.g.
Recaptcha, now Google) and see if we can come up with some open api
spec for captchas. Then we can support any captcha server that
supports that spec.

We could draft something basic that we think will work and propose it.
The simpler the better.

The idea of "ask the browser to handle it and report success" just
won't work, since that means it would be trivial to construct a
browser that just always reports success (spammers:1, us:0).

> 7. Using a link element has some nice properties: it's easy for content
> administrators, it's self-evident from a view source, and it clearly defines
> administrative domains on a per-page basis.  But it has some bad properties
> too: it requires that we parse some of the page before we know what to do,
> and it precludes the use of the HEAD method to optimize the exchange.
>
> Dan had suggested that we actually specify a HTTP header as the
> account-management link URI, and then use a META HTTP-EQUIV to include it in
> the body.  That's a more browser-ish way to do it.  A caveat is that the
> user community is quite open to crazy new LINK tags, and not so much to new
> HTTP headers.

I think the HTTP header is a must, though, and anyone seriously
implementing this will need to use it. e.g., I should be able to set
things up so that visiting http://example.com/image.png will trigger a
login for the correct realm.
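
A minimal Python sketch of that lookup order, header first and markup second. The header name "X-Account-Management" is a placeholder invented for this example (no header has been specified anywhere), and `discover_amcd` is a hypothetical helper.

```python
from html.parser import HTMLParser

HEADER = "X-Account-Management"  # placeholder name, not defined by the WEP


class _LinkFinder(HTMLParser):
    """Find the first META HTTP-EQUIV or LINK element that points at the AMCD."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        if self.href is not None:
            return
        a = dict(attrs)
        if tag == "link" and a.get("rel") == "account-management":
            self.href = a.get("href")
        elif tag == "meta" and (a.get("http-equiv") or "").lower() == HEADER.lower():
            self.href = a.get("content")


def discover_amcd(headers, body=""):
    """Prefer the HTTP header; fall back to markup in the page body."""
    for name, value in headers.items():
        if name.lower() == HEADER.lower():
            return value
    finder = _LinkFinder()
    finder.feed(body)
    return finder.href


# A non-HTML resource can only advertise its AMCD through the header.
from_image = discover_amcd(
    {"Content-Type": "image/png", "x-account-management": "/amcd.json"})
from_page = discover_amcd(
    {}, '<html><head><link rel="account-management" href="/amcd.json"></head></html>')
```

Because the header check never touches the body, it also works with a HEAD request.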

> One other thought I had is that each of these link URIs effectively defines
> an administrative domain -- they act quite a lot like the Realm attribute of
> HTTP Basic Auth in that regard.  If we included a "name" attribute on the
> AMCD (or whatever we call that document!) we could expose that to the user
> in a not-hostile way.

Yes, I was thinking about this too. We could specify that the AMCD
should map 1:1 with the realm, which solves the realm problem quite
nicely (each independent set of resources that needs a unique login
gets its own AMCD - no need for us to guess based on url structure or
whatnot).
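
In client terms, the 1:1 mapping could be as simple as keying stored logins by the absolute AMCD URL. A small Python sketch, where the `RealmStore` class and the example URLs are hypothetical:

```python
from urllib.parse import urljoin


class RealmStore:
    """Credentials keyed by absolute AMCD URL, which stands in for the realm."""

    def __init__(self):
        self._logins = {}

    @staticmethod
    def realm_for(page_url, amcd_ref):
        # The AMCD reference may be relative to the page that advertised it.
        return urljoin(page_url, amcd_ref)

    def save(self, page_url, amcd_ref, username, password):
        self._logins[self.realm_for(page_url, amcd_ref)] = (username, password)

    def lookup(self, page_url, amcd_ref):
        return self._logins.get(self.realm_for(page_url, amcd_ref))


store = RealmStore()
store.save("https://example.com/blog/post/1", "/blog/amcd.json", "jdoe", "pw1")
store.save("https://example.com/shop/cart", "/shop/amcd.json", "jane", "pw2")

# A different page under the same AMCD shares the login; no URL guessing needed.
blog_login = store.lookup("https://example.com/blog/archive", "/blog/amcd.json")
shop_login = store.lookup("https://example.com/shop/item/7", "/shop/amcd.json")
```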

Dan

Mike Hanson

Oct 28, 2009, 7:26:23 PM
to mozilla-lab...@googlegroups.com
Comments inline again...


On Oct 28, 2009, at 2:36 PM, Dan Mills wrote:

> Hey Mike,
>
> Thanks a lot for the detailed comments!  Mine are inline-
>
> On Wed, Oct 28, 2009 at 3:06 PM, Mike Hanson <mha...@mozilla.com> wrote:
> > Hi, Anant -
> > I'll toss in a couple ideas that I've been percolating on as well.  I've
> > added these to the WEP page's Discussion section as well.
> > 1. When we get around to writing up the Account Management Control Document
> > specification (or whatever we're going to call it), we should think about
> > whether to support an XML binding as well as the JSON one you propose - we
> > could easily do both.  The structure of the document is certainly quite
> > regular, so far.
>
> Maybe.  How would we know which format it's in?
>
> I don't see a big benefit in supporting XML, TBH, but maybe I'm
> missing something.


Lots of ways to figure the format - the obvious way is to send an Accept: header and to inspect the Content-Type: of the response.

The reason I mention XML is that we are definitely getting close to the kind of discovery data that website developers are being asked to think about more and more these days, and all the formats that they deal with in this domain are XMLish.  XRD is the closest (and in fact I could imagine a profile for this protocol that puts it into XRD); the Google Site Map document format is another.

I know our tendency is to jump to JSON, because we're a JS-ish organization, but the people who would be producing the account-management manifest are more likely to be XML-ish.
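
A short Python sketch of the Accept / Content-Type approach. The XML binding shown (one child element per method, attributes as fields) is invented for the example, since no XML form of the document has been defined, and the Accept value is just one plausible preference order.

```python
import json
import urllib.request
import xml.etree.ElementTree as ET

# Ask for JSON first, XML as a lower-preference alternative.
ACCEPT = "application/json, application/xml;q=0.8"


def build_request(url):
    """Request object that advertises which formats the client can read."""
    return urllib.request.Request(url, headers={"Accept": ACCEPT})


def parse_amcd(content_type, body):
    """Pick a parser based on the response's Content-Type."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "application/json":
        return json.loads(body)
    if media_type in ("application/xml", "text/xml"):
        root = ET.fromstring(body)
        # Hypothetical XML binding: one child per method, attributes as fields.
        return {child.tag: dict(child.attrib) for child in root}
    raise ValueError("unsupported AMCD format: " + media_type)


request = build_request("https://example.com/amcd")
as_json = parse_amcd(
    "application/json; charset=utf-8",
    '{"register": {"path": "/register", "method": "POST"}}')
as_xml = parse_amcd(
    "application/xml",
    '<amcd><register path="/register" method="POST"/></amcd>')
```

Both calls produce the same dictionary, which is the sense in which a regular document structure makes a second binding cheap.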

> > 6. We could include a "challenge" parameter in "register", which could
> > follow an OpenID-challenge-like message exchange pattern for CAPTCHAs, e.g.
>
> So...yeah.  We should maybe chat with some captcha providers (e.g.
> Recaptcha, now Google) and see if we can come up with some open api
> spec for captchas.  Then we can support any captcha server that
> supports that spec.
>
> We could draft something basic that we think will work and propose it.
> The simpler the better.
>
> The idea of "ask the browser to handle it and report success" just
> won't work, since that means it would be trivial to construct a
> browser that just always reports success (spammers:1, us:0).

Trivial if there's no crypto involved, yes.  But I could imagine a hosted service integrated with a browser that used some private keying information on the server.  captcha.services.mozilla.com?  That said, I'd rather use one of the existing hosted solutions.


Dan Mills

Oct 29, 2009, 12:37:03 PM
to mozilla-lab...@googlegroups.com
On Wed, Oct 28, 2009 at 7:26 PM, Mike Hanson <mha...@mozilla.com> wrote:

> Trivial if there's no crypto involved, yes.  But I could imagine a hosted
> service integrated with a browser that used some private keying information
> on the server.  captcha.services.mozilla.com?  That said, I'd rather use one
> of the existing hosted solutions.

Interesting. So it would look like:

1) Browser gets challenge from captcha service
2) User responds to challenge
3) Browser sends response to captcha service and gets back a signed response
4) Browser sends signed response to website
5) Website verifies response is signed by an authority they trust
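
The five steps above can be sketched in Python roughly as follows. Big caveat: the standard library has no public-key signing, so an HMAC with a key shared between the captcha service and the website stands in for a real signature here; an actual deployment would presumably use a public-key scheme so that websites never hold the signing key. The class, the function, and the stubbed puzzle answer are all hypothetical.

```python
import hashlib
import hmac
import secrets


class CaptchaService:
    """Stand-in for the hosted captcha service."""

    def __init__(self, key):
        self._key = key
        self._answers = {}

    def challenge(self):
        # Step 1: hand out a challenge id; the puzzle itself is stubbed out.
        cid = secrets.token_hex(8)
        self._answers[cid] = "tr4ffic"
        return cid

    def respond(self, cid, answer, site):
        # Step 3: a correct answer earns a signed statement bound to one site.
        # The challenge is consumed either way, so it cannot be retried.
        if self._answers.pop(cid, None) != answer:
            return None
        message = f"{site}|{cid}".encode()
        return message, hmac.new(self._key, message, hashlib.sha256).hexdigest()


def website_verifies(key, site, message, signature):
    # Step 5: check the signature, and that the statement names this site.
    expected = hmac.new(key, message, hashlib.sha256).hexdigest()
    return (hmac.compare_digest(expected, signature)
            and message.startswith(f"{site}|".encode()))


key = secrets.token_bytes(32)
service = CaptchaService(key)
cid = service.challenge()                                            # step 1
message, signature = service.respond(cid, "tr4ffic", "example.com")  # steps 2-3
ok = website_verifies(key, "example.com", message, signature)        # steps 4-5
```

The website would still need to remember which challenge ids it has already accepted, otherwise a valid signed response could be replayed.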

As opposed to:

1) Browser gets captcha provider from AMCD, then gets a challenge from
that provider
2) User responds to challenge
3) Browser sends response to provider, gets back a token (no crypto)
4) Browser sends token to website
5) Website sends token to provider for verification

I'm somewhat ambivalent between the two, but I think I still prefer
the second scenario.

It allows the website to define what captcha service they wish to use,
bypassing the "reputation system" problem you mentioned earlier. And
we could still host a captcha service that websites could point to, if
we wanted (though I suspect that won't be necessary).

If the website and captcha service are the same, then the extra HTTP
roundtrip can be eliminated, so it has the potential to be a cheaper
solution overall.
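
For comparison, a Python sketch of the second (token) scenario, with in-memory objects standing in for the HTTP exchanges. `CaptchaProvider`, `Website`, and the stubbed puzzle answer are hypothetical; the point is only the shape of the exchange.

```python
import secrets


class CaptchaProvider:
    """Stand-in for whichever captcha provider the AMCD names."""

    def __init__(self):
        self._answers = {}
        self._tokens = set()

    def challenge(self):
        # Step 1: issue a challenge; the puzzle itself is stubbed out.
        cid = secrets.token_hex(8)
        self._answers[cid] = "tr4ffic"
        return cid

    def respond(self, cid, answer):
        # Step 3: a correct answer earns an opaque token (no crypto).
        if self._answers.pop(cid, None) != answer:
            return None
        token = secrets.token_urlsafe(16)
        self._tokens.add(token)
        return token

    def verify(self, token):
        # Step 5: tokens are single-use, so a replay fails.
        if token in self._tokens:
            self._tokens.remove(token)
            return True
        return False


class Website:
    def __init__(self, provider):
        # If the site runs its own captcha service, this is a local call
        # and the extra HTTP round trip disappears.
        self.provider = provider

    def register(self, form, token):
        return self.provider.verify(token)


provider = CaptchaProvider()
site = Website(provider)
cid = provider.challenge()                              # step 1
token = provider.respond(cid, "tr4ffic")                # steps 2-3
first = site.register({"name_field": "jdoe"}, token)    # steps 4-5
replay = site.register({"name_field": "jdoe"}, token)
```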

Dan
