Getting consensus on the short link situation...

Sam Johnston

Apr 13, 2009, 5:40:38 PM
to Short Link Group
Evening all,

As you're all no doubt aware, we have a bit of a mess on our hands both with URL shorteners and the proposed solutions. I'll try to summarise where we've come from quickly and where I think we need to get to:
  • URL shorteners (now running into the triple digits) create a bunch of problems relating to linkrot, analytics, security, opaque links, etc.
  • Google independently introduced rel="canonical", which allows webmasters to specify the primary/canonical URL for a given page (remembering any page typically has one main URL and any number of alternatives - e.g. from categories, search results, etc.)
  • The reverse (rev="canonical") is proposed as a way for a page to indicate short URL(s), but there are a number of problems with this:
    • It only works for the canonical URL (if you use it on other pages you're saying "I am the canonical URL and these other things point at me")
    • It is easily confused with rel="canonical" (which is saying "I am the canonical URL"), the result being that the short URL is used for SEO and sites can be permanently knocked off their perches
    • It's effectively offering a list of all URLs linking to a page, but providing only one. That in itself may not be a problem, but it's not clean.
    • It says nothing about the URL being "short" - that is, there's nothing stopping someone listing one or more long[er] URLs here.
    • It's not clear what one might do if the link appears multiple times - a short URL should really be unique.
  • A rel="alternate shorter" option was proposed, but per RFC 4287, "alternate" signifies an "alternate version of the resource". Normally this would be used to advertise e.g. a PDF or text version of the same document. Furthermore, HTML 4 and 5 both specify that the link types are a space-separated list, so this technically means "alternate" and "shorter". Thus this suggestion is another non-starter.
  • I added rel="short" to the mix and this was getting traction until someone observed that this too could be referring to the resource rather than the link to it. That is, are we talking about a short version of a video or an abstract of a document, or are we talking about a shorter link to it? I figured this passed the obviousness test but apparently not, so while it's still an option, it's probably not the best. Typically link relations are nouns too, not adjectives, but that's getting right down to semantics.
  • Meanwhile short_url was suggested but there was confusion with underscore vs dash vs space and of course the usual uri/url issues. The underscore was dropped but there's still the unnecessary uri/url confusion - you can be sure people will get it wrong and clients will end up looking for both, which is messy.
  • Wanting to improve on the rel="short" option I (and a number of other people) suggested rel="shortcut". This appeared to fit nicely with rel="shortcut icon" (which Microsoft et al have been pushing to advertise favicons), right up until someone noticed that it's actually a space-separated list... so compliant HTML parsers will see "shortcut" and "icon" separately. Unfortunately I was basing my assumptions on the Atom APIs (which don't provide for a space-separated list) but I'm happy to concede that this wasn't such a good idea and could well cause some amount of breakage down the road.
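
To make the space-separated-list point concrete, here is a minimal sketch of how a parser that follows the HTML rule would split a rel value. The helper name rel_tokens is hypothetical, chosen only for this illustration.

```python
def rel_tokens(rel_value):
    """Split a rel attribute value into lowercase link-type tokens.

    HTML treats rel as a whitespace-separated list, so every word is a
    separate link type rather than part of one compound name.
    """
    return {token.lower() for token in rel_value.split()}


# "shortcut icon" is two link types, not one compound name.
print(rel_tokens("shortcut icon"))
# The same applies to "alternate shorter".
print(rel_tokens("alternate shorter"))
# A single word such as "shortlink" stays a single token.
print(rel_tokens("shortlink"))
```

A consumer looking for a short link would therefore also match any unrelated link that merely happens to include the same word in its list, which is the breakage described above.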
So that brings us to where we are now. The only alternative that I see being free of the issues mentioned above is rel="shortlink". I haven't thus far been able to come up with a single reason why this approach isn't a good idea so I've created this group (http://groups.google.com/group/shortlink) and a Google Code project (http://code.google.com/p/shortlink/ - which I've added a few of you to already). I've already modified the shorter links wordpress plugin to churn out rel="shortlink" links and HTTP headers and it appears to be working nicely on my test installation. I'd like to whip up something similar for Drupal next.
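
As a rough illustration of what a client might do with such markup, the sketch below uses Python's standard html.parser to collect the href of every link element whose rel list includes "shortlink". The class name, the function name and the sample page are assumptions made for this example, and the HTTP header variant is not covered.

```python
from html.parser import HTMLParser


class ShortlinkFinder(HTMLParser):
    """Collect href values of <link> elements whose rel includes "shortlink"."""

    def __init__(self):
        super().__init__()
        self.shortlinks = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel is a whitespace-separated list of link types.
        rel_types = (attributes.get("rel") or "").lower().split()
        href = attributes.get("href")
        if "shortlink" in rel_types and href:
            self.shortlinks.append(href)


def find_shortlinks(html_text):
    """Return all shortlink hrefs found in html_text, in document order."""
    finder = ShortlinkFinder()
    finder.feed(html_text)
    return finder.shortlinks


# Hypothetical page markup; the short URL is a placeholder.
PAGE = """
<html><head>
  <link rel="canonical" href="http://www.example.com/product.php?item=swedish-fish">
  <link rel="shortlink" href="http://example.com/abc">
</head><body></body></html>
"""

print(find_shortlinks(PAGE))
```

Returning a list rather than a single value leaves open the question of whether a page may advertise more than one short link.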

So, existing installations aside (if they were so easily set up they'll be just as easily updated), does anyone have any better suggestions?

Sam

l.m.orchard

Apr 13, 2009, 8:47:13 PM
to Short Link - URL shortening that really doesn't hurt the Internet
On Apr 13, 5:40 pm, Sam Johnston <s...@samj.net> wrote:

>    - The reverse (rev="canonical") is proposed as a way for a page to
>    indicate short URL(s), but there are a number of problems with this:

I still like rev="canonical", and have yet to be convinced otherwise,
so I'll stick up for it again and give it a rest here:

>       - It only works for *the* canonical URL (if you use it on other pages
>       you're saying "I am the canonical URL and these other things point at me)

This sounds like a feature, not a bug. Can you point out where this
is undesirable?

If the page using rev="canonical" or rel="short{*}" isn't the real
deal, what is and how can I find it?

>       - It is easily confused with rel="canonical" (which is saying "I
> *am* the canonical URL"), the result being that the short URL is used
> for SEO and sites can be permanently knocked off their perches

Honestly, I've never heard of this before. Is there a story of this
happening to someone?

On the other hand, just make sure to use rel="canonical" and
rev="canonical" for their respective proper purposes. In fact, use
them both, because they complement each other.

>       - It's effectively offering a list of all URLs linking to a page, but
>       providing only one. That in itself may not be a problem, but it's not clean.

Being able to offer multiple alternative URLs for a page is a feature,
not a bug.

I could offer several rev="canonical" URLs - each varying in length,
3rd party service, or by features not yet known to be useful.

The consumer (eg. Twitter) of my rev="canonical" URLs could pick and
choose among my options by whatever criteria they like. Their criteria
could be length, or it could be to avoid certain 3rd party services
blacklisted by domain.

Granted, most users will stick with one rev="canonical" URL, but this
choice would remain a possibility.

>       - It says nothing about the URL being "short" - that is, there's
>       nothing stopping someone listing one or more long[er] URLs here.

Again, as above, this is a feature and not a bug. rev="canonical"
offers no guarantee of shorter URLs - but then, some people don't want
their URLs shortened at all.

The whole point of this concept, in my mind, is to give publishers
control over their own URL spaces. By convention, most people will
offer a shorter URL, but some people want to opt out of this URL
munging scheme altogether.

>       - It's not clear what one might do if the link appears multiple times

Pick the URL you like best, for whatever reasons suit you.
rev="canonical" asserts they all go to the same place, so it'll be an
evaluation of the features of the URLs themselves that guides your
choice.

>       - a short URL should really be unique.

Why? It's not canonical, and it shouldn't be indexed or even relied
upon to be stable IMO. Find the canonical and throw away the
intermediate link, who cares what it was.

> So that brings us to where we are now. The only alternative that I see being
> free of the issues mentioned above is rel="shortlink". I haven't thus far
> been able to come up with a single reason why this approach isn't a good
> idea

rel="shortlink" technically suggests that the URL to a document
describing the "shortlink" resource for the current page, so it's
still a little shoehorned if we're splitting hairs. I'm also assuming
that rel="shortlink" is meant to be one unique URL per page, too,
right? As I said above, I actually prefer multiple links possible per
page to provide for choice on both sides.

For what it's worth, I'll be swayed when the rel="shortlink" story (is
that the final answer now?) obviates all the above features that I
actually like about rev="canonical". Either that, or when no one else
is using rev="canonical", whichever comes first.

And beyond that, have fun deploying code and trying out the ideas.

Sam Johnston

Apr 13, 2009, 9:33:15 PM
to shor...@googlegroups.com
On Tue, Apr 14, 2009 at 2:47 AM, l.m.orchard <l.m.o...@gmail.com> wrote:

>    - The reverse (rev="canonical") is proposed as a way for a page to
>    indicate short URL(s), but there are a number of problems with this:

I still like rev="canonical", and have yet to be convinced otherwise,
so I'll stick up for it again and give it a rest here:

rev="canonical" is not only fatally flawed but flat out dangerous. There's already been at least one case of confusion between rel and rev (quite apart from the fact that rev takes a long time for even smart people to grok, and that it's been deprecated) and if you get it wrong you're virtually guaranteed to land yourself in a world of (potentially permanent) hurt with the search engines. Promoting rev="canonical" knowing all that is borderline negligent.
 
>       - It only works for *the* canonical URL (if you use it on other pages
>       you're saying "I am the canonical URL and these other things point at me)

This sounds like a feature, not a bug.  Can you point out where this
is undesirable?

Pages are routinely referenced by any number of URLs - for example search results, category listings, etc. - as in the following example (the first one being the canonical URL):

http://www.example.com/product.php?item=swedish-fish
http://www.example.com/product.php?item=swedish-fish&category=gummy-candy
http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678
 
Given that rev=canonical implies that the page that contains it is the canonical URL, it can only be used on the canonical URL itself. That is, if the user rocks up at any one of an infinite number of other pages then you must not include the rev=canonical link.
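
To illustrate the difference, here is a minimal sketch of the link tags a site might emit depending on which URL the page was requested at. The function name head_links and the short URL are hypothetical; the product URLs are the ones listed above. It assumes, per the argument above, that rel="shortlink" carries no claim about the current page being canonical and so can appear on every variant.

```python
from html import escape


def head_links(request_url, canonical_url, short_url):
    """Build the <link> tags for a page served at request_url."""
    tags = []
    if request_url != canonical_url:
        # Variant URLs (categories, tracking IDs) point at the canonical one.
        tags.append(f'<link rel="canonical" href="{escape(canonical_url)}">')
    # Assumption: rel="shortlink" is emitted on every variant.
    # rev="canonical" would only be correct when request_url == canonical_url.
    tags.append(f'<link rel="shortlink" href="{escape(short_url)}">')
    return tags


CANONICAL = "http://www.example.com/product.php?item=swedish-fish"
VARIANT = CANONICAL + "&category=gummy-candy"
SHORT = "http://example.com/abc"  # placeholder short URL

for tag in head_links(VARIANT, CANONICAL, SHORT):
    print(tag)
```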

If the page using rev="canonical" or rel="short{*}" isn't the real
deal, what is and how can I find it?

Hopefully the examples above illustrate well what I'm talking about.
 
>       - It is easily confused with rel="canonical" (which is saying "I
> *am* the canonical URL"), the result being that the short URL is used
> for SEO and sites can be permanently knocked off their perches

Honestly, I've never heard of this before.  Is there a story of this
happening to someone?

Well the idea is only days old so it's quite scary that some large sites are already tinkering with it - sounds like a very efficient way to lose your job to me (many sites live and die by their SEO). Anyway I already spotted one early adopter referring to rev=canonical as rel=canonical earlier today. It's an easy mistake to make, one that would take a while to manifest itself (e.g. until the next google dance), one that would be very difficult to diagnose and one that could cause untold, potentially permanent damage. Again, promoting this solution knowing that it could easily fail catastrophically brings up the negligence question.
 
On the other hand, just make sure to use rel="canonical" and
rev="canonical" for their respective proper purposes.  In fact, use
them both, because they complement each other.

rel="canonical" is a powerful and useful tool that should be used extensively. rev="anything" is deprecated and that's not going to change, least of all for this non-requirement.
 
>       - It's effectively offering a list of all URLs linking to a page, but
>       providing only one. That in itself may not be a problem, but it's not clean.

Being able to offer multiple alternative URLs for a page is a feature,
not a bug.

Really? Being able to *resolve* multiple alternative URLs (e.g. decimal identifier, compressed/base32 identifier and human-friendly slug) sounds useful, but giving the user a choice sounds unuseful and is in any case supported by having multiple instances of "shortlink" urls.
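
As a sketch of the "compressed/base32 identifier" idea, the functions below convert a decimal identifier to a shorter base32 string and back, so both forms can resolve to the same page. The alphabet (digits plus lowercase letters, omitting i, l, o and u) and the function names are assumptions for this example, not something the thread specifies.

```python
# 32 symbols: digits plus lowercase letters without i, l, o and u (assumed).
ALPHABET = "0123456789abcdefghjkmnpqrstvwxyz"


def encode_id(number):
    """Encode a non-negative integer identifier as a base32 string."""
    if number < 0:
        raise ValueError("identifier must be non-negative")
    if number == 0:
        return ALPHABET[0]
    digits = []
    while number:
        number, remainder = divmod(number, 32)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def decode_id(text):
    """Decode a base32 string produced by encode_id back to an integer."""
    number = 0
    for char in text.lower():
        # str.index raises ValueError for characters outside the alphabet.
        number = number * 32 + ALPHABET.index(char)
    return number


identifier = 1234567  # arbitrary example identifier
short_form = encode_id(identifier)
print(short_form, decode_id(short_form) == identifier)
```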
 
I could offer several rev="canonical" URLs - each varying in length,
3rd party service, or by features not yet known to be useful.

Yes, and you could do exactly the same with rel="shortlink".
 
The consumer (eg. Twitter) of my rev="canonical" URLs could pick and
choose among my options by whatever criteria they like.  Their criteria
could be length, or it could be to avoid certain 3rd party services
blacklisted by domain.

Isn't the whole point of this exercise to avoid 3rd party services? So that's a non-requirement, and again one that is supported anyway.
 
Granted, most users will stick with one rev="canonical" URL, but this
choice would remain a possibility.

Which is not unique to rev="canonical", as explained above.
 
>       - It says nothing about the URL being "short" - that is, there's
>       nothing stopping someone listing one or more long[er] URLs here.

Again, as above, this is a feature and not a bug.  rev="canonical"
offers no guarantee of shorter URLs - but then, some people don't want
their URLs shortened at all.

There's no point offering an alternative to the canonical URL if it doesn't have some feature that the canonical one does not (e.g. length, readability, etc.). We're talking about shortening URLs here so we do actually want to know the result is, in fact, relatively short.
 
The whole point of this concept, in my mind, is to give publishers
control over their own URL spaces.  By convention, most people will
offer a shorter URL, but some people want to opt out of this URL
munging scheme altogether.

In which case they simply do nothing...
 
>       - It's not clear what one might do if the link appears multiple times

Pick the URL you like best, for whatever reasons suit you.
rev="canonical" asserts they all go to the same place, so it'll be an
evaluation of the features of the URLs themselves that guides your
choice.

Still not seeing why the user would want to choose between http://example.com/123 and http://example.com/abc. This additional complexity would piss off most users - the services we're competing with just work.
 
>       - a short URL should really be unique.

Why?  It's not canonical, and it shouldn't be indexed or even relied
upon to be stable IMO.  Find the canonical and throw away the
intermediate link, who cares what it was.

What? Short URLs need not be stable? You're kidding, right? Short URLs have all sorts of applications outside of twitter, like printing into advertisements or reading over the radio. Stability is going to be an absolute must in most instances.
 
> So that brings us to where we are now. The only alternative that I see being
> free of the issues mentioned above is rel="shortlink". I haven't thus far
> been able to come up with a single reason why this approach isn't a good
> idea

rel="shortlink" technically suggests that the URL to a document
describing the "shortlink" resource for the current page, so it's
still a little shoehorned if we're splitting hairs.  I'm also assuming
that rel="shortlink" is meant to be one unique URL per page, too,
right?  As I said above, I actually prefer multiple links possible per
page to provide for choice on both sides.

Sorry, I was unable to parse 'technically suggests that the URL to a document describing the "shortlink" resource for the current page'. It sounds a lot like you're being pedantic - you can't tell me that rel="shortlink" makes any less sense than rev="canonical" when it takes even someone like me more than 10 seconds to work out what is going on.
 
For what it's worth, I'll be swayed when the rel="shortlink" story (is
that the final answer now?) obviates all the above features that I
actually like about rev="canonical".  Either that, or when no one else
is using rev="canonical", whichever comes first.

So far as I can tell it already has... still waiting for you to identify even one useful thing that rel="shortlink" can't do that rev="canonical" does.
 
And beyond that, have fun deploying code and trying out the ideas.

If your implication is that rev="canonical" has an installed base then I can assure you that this install base can be switched over to whatever we decide on in a heartbeat so that's another non-issue. I've already written a wordpress plugin and a Drupal version isn't far behind.

Sam

l.m.orchard

Apr 13, 2009, 11:15:43 PM
to Short Link - URL shortening that really doesn't hurt the Internet
On Apr 13, 9:33 pm, Sam Johnston <s...@samj.net> wrote:
> On Tue, Apr 14, 2009 at 2:47 AM, l.m.orchard <l.m.orch...@gmail.com> wrote:
>
> rev="canonical" is not only fatally flawed but flat out dangerous.
...
> Promoting rev="canonical" knowing all that is borderline negligent.

And that's the other thing that bugs me about the alternative proposal
to rev="canonical" - it sounds like FUD.

That's no fun, so I'm inclined to resist it.

> rev="anything" is deprecated and that's not going to change, least of all
> for this non-requirement.

The deprecation thing is circular: It's deprecated in HTML 5 because
no one was using it - but if someone starts using it from HTML 4, why
should it remain deprecated? It's still in HTML 4, and HTML 5 is
still a ways off.

> Isn't the whole point of this exercise to *avoid* 3rd party services? So
> that's a non-requirement, and again one that is supported anyway.

The whole point (for me) is publisher choice. Some people might pick
a 3rd party service run by their friends, rather than whomever Twitter
picks. Or they might self-host, again by their own choice. You say
it's supported, and I say it's a good idea without any need to
begrudge it.

> There's no point offering an alternative to the canonical URL if it doesn't
> have some feature that the canonical one does not (e.g. length, readability,
> etc.). We're talking about shortening URLs here so we do actually want to
> know the result is, in fact, relatively short.

You're talking about shortening URLs, I'm talking about publisher
choice. This includes expressing an explicit preference not to have
one's URLs shortened. It's a subtle distinction, reflected in both
the rev="canonical" and rel="shortlink" terminologies.

Leaving out rel="shortlink" or rev="canonical" implies that there's no
short URL for the page, but including one of these links that restates
the canonical makes an explicit assertion that shortening is
unwelcome. It's another subtle feature that I like, but will admit is
quixotic.

> Still not seeing why the user would want to choose between http://example.com/123
> and http://example.com/abc.

Consider a canonical URL of:
* http://brand.com/index.php?mode=view&content_id=8675309

Consider choices such as:
* http://bit.ly/f12fw
* http://brand.com/billboard-friendly-phrase

Each of these offer different features. Twitter can pick the
shortest. Another consumer could offer a user-visible selector that
includes the second. Again, you're talking about URL shortening, and
I'm talking about choices (including shortening).
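
A consumer's selection logic could look roughly like the sketch below, which picks the shortest advertised URL whose host is not on a blocked list. The function name choose_link and the blocked-host parameter are hypothetical; the candidate URLs are the two given above.

```python
from urllib.parse import urlsplit


def choose_link(candidates, blocked_hosts=()):
    """Return the shortest candidate URL whose host is not blocked, or None."""
    blocked = {host.lower() for host in blocked_hosts}
    allowed = [
        url for url in candidates
        if (urlsplit(url).hostname or "").lower() not in blocked
    ]
    return min(allowed, key=len) if allowed else None


candidates = [
    "http://bit.ly/f12fw",
    "http://brand.com/billboard-friendly-phrase",
]

# With no restrictions the shortest link is chosen.
print(choose_link(candidates))
# A consumer that distrusts a third-party host falls back to the other link.
print(choose_link(candidates, blocked_hosts={"bit.ly"}))
```

The same function works whichever attribute the links were advertised under, since it only looks at the URLs themselves.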

> What? Short URLs need not be stable? You're kidding, right? Short URLs have
> all sorts of applications outside of twitter, like printing into
> advertisements or reading over the radio. Stability is going to be an
> absolute must in most instances.

No, they don't *need* to be stable, but they *can* be relatively
speaking. But you should look to the canonical URL if you *really*
want stability - those other URLs are just disposable pointers whose
hosts are not necessarily as reliable as the canonical.

> So far as I can tell it already has... still waiting for you to identify
> even one useful thing that rel="shortlink" can't do that rev="canonical"
> does.

The name "shortlink" implies a limit of scope to URL shortening,
whereas I'm hoping for a slightly expanded scope. It seems that some
ideas I consider useful are bugs in the rel="shortlink" sphere. So,
I'm pinning the ideas to rev="canonical" if only for the sake of
argument.

I also happen to think the rel="canonical" / rev="canonical"
relationship itself is complementary and elegant, but I can let go of
that as too-clever.

The circular deprecation argument, "rev is hard", and the fear of
misspelled attributes don't do rel="shortlink" any real favors against
rev="canonical" - so I'd like to see the story distilled to the
positives to solidify consensus.

> > And beyond that, have fun deploying code and trying out the ideas.
>
> If your implication is that rev="canonical" has an installed base then I can
> assure you that this install base can be switched over to whatever we decide
> on in a heartbeat so that's another non-issue. I've already written a
> wordpress plugin and a Drupal version isn't far behind.

What I said was: Have fun. Deploy code. Try out ideas.

What I implied was: Lighten up.

No one's paying you if you "win". You're asking for consensus, but
you're speaking the language of Fear/Uncertainty/Doubt.

Honestly, there's a certain sliver of my resistance to rel="shortlink"
based on not wanting to be scared into using it.

And with that, I'm wandering off to lurk. Have fun. Seriously. No
sarcasm quotes.

Sam Johnston

Apr 14, 2009, 7:03:14 AM
to shor...@googlegroups.com
On Tue, Apr 14, 2009 at 5:15 AM, l.m.orchard <l.m.o...@gmail.com> wrote:

On Apr 13, 9:33 pm, Sam Johnston <s...@samj.net> wrote:
> On Tue, Apr 14, 2009 at 2:47 AM, l.m.orchard <l.m.orch...@gmail.com> wrote:
>
> rev="canonical" is not only fatally flawed but flat out dangerous.
...
> Promoting rev="canonical" knowing all that is borderline negligent.

And that's the other thing that bugs me about the alternative proposal
to rev="canonical" - it sounds like FUD.

That's no fun, so I'm inclined to resist it.

Do you dispute that using "rel" in place of "rev" could potentially be extremely harmful? Right, so it's not FUD and the (significant) risk is trivially mitigated by moving to a sensible alternative.
 
> rev="anything" is deprecated and that's not going to change, least of all
> for this non-requirement.

The deprecation thing is circular: It's deprecated in HTML 5 because
no one was using it - but if someone starts using it from HTML 4, why
should it remain deprecated?  It's still in HTML 4, and HTML 5 is
still a ways off.

No, it was deprecated with good reason (nearly nobody was using it, and almost all those that were were using it incorrectly) and, as is the case here, "for every rev="" value you can find or define an equivalent rel="" value".
 
> Isn't the whole point of this exercise to *avoid* 3rd party services? So
> that's a non-requirement, and again one that is supported anyway.

The whole point (for me) is publisher choice.  Some people might pick
a 3rd party service run by their friends, rather than whomever Twitter
picks.  Or they might self-host, again by their own choice.  You say
it's supported, and I say it's a good idea without any need to
begrudge it.

Ok so we don't need to talk any more about this - both rev and rel alternatives support it.
 
> There's no point offering an alternative to the canonical URL if it doesn't
> have some feature that the canonical one does not (e.g. length, readability,
> etc.). We're talking about shortening URLs here so we do actually want to
> know the result is, in fact, relatively short.

You're talking about shortening URLs, I'm talking about publisher
choice.  This includes expressing an explicit preference not to have
one's URLs shortened.  It's a subtle distinction, reflected in both
the rev="canonical" and rel="shortlink" terminologies.

I'm not sure there is a use case for specifying urls that are longer than the canonical URL, nor for "forbidding" short URLs (people will just use third parties, as they do today). In any case you can do both by specifying a blank href or a long href, so this is another non-issue.
 
Leaving out rel="shortlink" or rev="canonical" implies that there's no
short URL for the page, but including one of these links that restates
the canonical makes an explicit assertion that shortening is
unwelcome.  It's another subtle feature that I like, but will admit is
quixotic.

No, it just means that the specification isn't implemented and you'll have to solve the problem another way (e.g. by making do with a long url or using a third party server).
 
> Still not seeing why the user would want to choose between http://example.com/123
> and http://example.com/abc.

Consider a canonical URL of:
* http://brand.com/index.php?mode=view&content_id=8675309

Consider choices such as:
* http://bit.ly/f12fw
* http://brand.com/billboard-friendly-phrase

Each of these offer different features.  Twitter can pick the
shortest.  Another consumer could offer a user-visible selector that
includes the second.  Again, you're talking about URL shortening, and
I'm talking about choices (including shortening).

Right, I admit that a human-friendly slug is useful as well as a short-as-possible alternative... though I had assumed this would be a publisher choice (per above) rather than an end user choice.
 
> What? Short URLs need not be stable? You're kidding, right? Short URLs have
> all sorts of applications outside of twitter, like printing into
> advertisements or reading over the radio. Stability is going to be an
> absolute must in most instances.

No, they don't *need* to be stable, but they *can* be relatively
speaking.  But you should look to the canonical URL if you *really*
want stability - those other URLs are just disposable pointers whose
hosts are not necessarily as reliable as the canonical.

I think you've got this the wrong way around. Short URLs (especially those based on unique identifiers) are very unlikely to change, whereas canonical URLs can and do change all the time. I don't see that there is any need to set a canonical URL in stone...
 
> So far as I can tell it already has... still waiting for you to identify
> even one useful thing that rel="shortlink" can't do that rev="canonical"
> does.

The name "shortlink" implies a limit of scope to URL shortening,
whereas I'm hoping for a slightly expanded scope.  It seems that some
ideas I consider useful are bugs in the rel="shortlink" sphere.  So,
I'm pinning the ideas to rev="canonical" if only for the sake of
argument.

Still can't imagine a use case for URLs that are *longer* than canonical, but it's supported anyway. Remember shortcuts (e.g. in Windows) are just a way to get to a destination - the handle can be as long as you like (I've regularly seen long "shortcuts" used to add meaning to short filenames).
 
I also happen to think the rel="canonical" / rev="canonical"
relationship itself is complementary and elegant, but I can let go of
that as too-clever.

+1 elegance, but like you say, too clever and too confusing.
 
The circular deprecation argument, "rev is hard", and the fear of
misspelled attributes don't do rel="shortlink" any real favors against
rev="canonical" - so I'd like to see the story distilled to the
positives to solidify consensus.

Yes they do:
  1. rev is deprecated and you may as well assume that you can't change that (try, and you'll be asked why there's no sensible rel alternative, which there is)
  2. rel is well understood. rev takes some amount of explaining, even for advanced users. Many will never grok it and will get it wrong as a result.
  3. short_url vs shorturl vs shorturi vs short_uri vs "short url" vs "short uri" is hellishly confusing, and besides - shorturl is covered by a trademark so there's potential legal hot water there too (not to mention the unfair advantage shorturl.com gets from our popularising the term and the confusion that their existence creates given we're essentially trying to get rid of them!)
> > And beyond that, have fun deploying code and trying out the ideas.
>
> If your implication is that rev="canonical" has an installed base then I can
> assure you that this install base can be switched over to whatever we decide
> on in a heartbeat so that's another non-issue. I've already written a
> wordpress plugin and a Drupal version isn't far behind.

What I said was: Have fun. Deploy code. Try out ideas.

What I implied was: Lighten up.

No one's paying you if you "win".  You're asking for consensus, but
you're speaking the language of Fear/Uncertainty/Doubt.

Honestly, there's a certain sliver of my resistance to rel="shortlink"
based on not wanting to be scared into using it.

And with that, I'm wandering off to lurk.  Have fun.  Seriously.  No
sarcasm quotes.

Look, we're all trying to achieve the same goal here and it doesn't matter who came up with the idea (I'm already maintaining a list of acknowledgements) so long as whatever takes hold is the best possible alternative. There are serious problems with both rev="canonical" and rel="shorturl" and none have been identified for rel="shortlink" so that's what I'm pushing. It's unfortunate that others are still pushing broken alternatives but that is the purpose of this group - I was happy to admit that I was wrong with "shortcut" (and less so with "short") so I've changed my view accordingly... here's hoping that others do the same.

Sam
