[twitter-api-announce] Early look at Annotations

Marcel Molina

Apr 16, 2010, 1:54:32 PM
to twitter-deve...@googlegroups.com, twitter-ap...@googlegroups.com
Hey everyone. One of the things we talked about at Chirp is the new Annotations feature we're working on. In short, it allows you to annotate a tweet with structured metadata. We're still working on Annotations, but I wanted to share how we're thinking of doing it with a wider audience than those I was able to talk to in person at Chirp.

* What is an annotation, more exactly?

First off let's be clearer about what an annotation is. An annotation is a namespace, key, value triple. A tweet can have one or more annotations. Namespaces can have one or more key/value pairs.
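
To make the triple idea concrete, here is a minimal sketch of one way a client might model annotations. The class name, the namespaces and the keys are made up for illustration, and it assumes a key appears at most once per namespace, which the description above does not settle.

```python
from collections import defaultdict
from typing import NamedTuple


class Annotation(NamedTuple):
    """One (namespace, key, value) triple. The class name is illustrative."""
    namespace: str
    key: str
    value: str


def group_by_namespace(annotations):
    """Collect triples into {namespace: {key: value}}."""
    grouped = defaultdict(dict)
    for ann in annotations:
        grouped[ann.namespace][ann.key] = ann.value
    return dict(grouped)


# Hypothetical namespaces and keys, for illustration only.
annotations = [
    Annotation("book", "isbn", "030759243X"),
    Annotation("book", "format", "paperback"),
    Annotation("mood", "hopeful", ""),
]
grouped = group_by_namespace(annotations)
```

Here one tweet carries three annotations spread over two namespaces, and the "book" namespace holds two key/value pairs.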

* How do I specify what annotations a tweet should have?

Annotations are specified for a tweet when the tweet is created. When submitting a POST to /statuses/update, you'll include an "annotations" parameter with your annotations. We're thinking we'll provide two mechanisms for specifying what a tweet's annotations are:

  1. JSON
  2. form encoded parameters
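
As a rough sketch of what the two mechanisms might look like from a client, the snippet below builds both request bodies. The "annotations" parameter is as described above; the bracketed field naming for the form encoded variant and the "book" namespace are assumptions, since neither is defined yet.

```python
import json
from urllib.parse import parse_qs, urlencode

status = "Just got a new book in the mail."
annotations = {"book": {"isbn": "030759243X"}}  # hypothetical namespace and key

# Mechanism 1: the whole annotation set as one JSON string in "annotations".
json_body = urlencode({"status": status, "annotations": json.dumps(annotations)})

# Mechanism 2: one form field per triple.
# Assumption: the bracketed field names are a guess, not a defined format.
form_fields = {"status": status}
for namespace, pairs in annotations.items():
    for key, value in pairs.items():
        form_fields[f"annotations[{namespace}][{key}]"] = value
form_body = urlencode(form_fields)

# What a server would recover after decoding the JSON variant.
decoded = json.loads(parse_qs(json_body)["annotations"][0])
```

Either body would be sent as the content of the POST to /statuses/update.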

* How big can an annotation be and how many annotations can I attach to a tweet?

There is no limit on the size of any given namespace, key or value, but the entire set of all annotations for a given tweet cannot exceed some fixed byte size. That size isn't set in stone yet. We will be starting small (probably 512 bytes) and growing it gradually as we incrementally roll out the feature so we can gauge its scalability at various sizes. We'd like to (no promises) have it end up around 2K. How you use that 2K is up to you. You can attach one honking annotation, or a thousand+ tiny ones. You can attach one namespace with hundreds of key/value pairs, or hundreds of namespaces with just one key/value pair. We want to keep things as flexible and open ended as possible.
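
A client could check the budget before posting. The sketch below assumes the size is the sum of the UTF-8 byte lengths of every namespace, key and value, counting the namespace once per triple. How the size will actually be measured has not been stated, so treat the accounting as a guess.

```python
MAX_ANNOTATION_BYTES = 512  # the "probably 512 bytes" starting point


def annotation_bytes(triples):
    """Sum the UTF-8 byte length of every namespace, key and value.

    Assumption: the real accounting may differ (for example, it may
    count serialization overhead or count each namespace only once).
    """
    return sum(len(part.encode("utf-8")) for triple in triples for part in triple)


def fits_budget(triples, limit=MAX_ANNOTATION_BYTES):
    """True if the whole annotation set is within the byte limit."""
    return annotation_bytes(triples) <= limit


# One honking annotation versus many tiny ones (hypothetical data).
one_big = [("book", "notes", "x" * 600)]
many_small = [("n", f"k{i}", "v") for i in range(50)]
```

With these inputs the single large annotation is over a 512 byte budget while the fifty small ones fit.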

* What kind of data can go into an annotation?

We'd like to allow for any arbitrary data to be stored in an annotation. Arbitrary Unicode? Sure. MIDI? Go for it. Emoji? Yes please! There might be some tricky edge cases though. Skip the rest of this paragraph if you don't care about the details of edge cases... For one, since these annotations will be serialized to, among other formats, XML, and we'd like to keep the XML succinct, the namespace and key components of an annotation triple would each likely become an XML tag, with the annotation's value as, well, the tag's value. If that's the case then the namespace and the key must each be a valid XML tag name. This greatly limits what they can contain (not even spaces, for example). If allowing all three elements of the triple to contain any arbitrary data is more important than a succinct XML payload then we'll design a more verbose XML payload. Up to you all really. I've included examples of both options below. Make a case for another proposal if you have strong opinions.
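
To get a feel for how restrictive the succinct XML option would be, here is a deliberately conservative check for whether a namespace or key could serve as a tag name. It accepts only an ASCII subset of what XML allows (real XML names permit many more Unicode characters), and the function name is made up for this sketch.

```python
import re

# Conservative ASCII subset of XML names: a letter or underscore first,
# then letters, digits, underscores, dots or hyphens.
_TAG_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*")


def usable_as_xml_tag(name):
    """Rough test for whether a namespace or key could be an XML tag name."""
    # Names beginning with "xml" (in any case) are reserved by the XML spec.
    if name.lower().startswith("xml"):
        return False
    return _TAG_NAME.fullmatch(name) is not None
```

Under this check "isbn" passes, while "book title" (contains a space) and "2010" (starts with a digit) do not.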

* What constitutes a valid annotation?

Aside from the size and data type restrictions listed above, another requirement is that namespaces and keys be non-empty values. Values, on the other hand, may be empty. In this way the namespace/key pair can be treated like a flag of sorts. It should be noted: I'd encourage everyone to always think of a namespace as a namespace, to think of a key as a key and to think of a value as a value. Don't take the fact that a value can be empty to mean that you can skip out on the whole namespace thing and morph the namespace into a key and the key into a value. While open endedness and flexibility are qualities of the Annotations feature that I'm most excited about for the developer community, this kind of approach seems prone to causing confusion by undermining namespaces.
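
The non-empty rule is simple enough to sketch. The function name is hypothetical, and the sketch does not decide whether a whitespace-only namespace or key counts as empty, since that hasn't been specified.

```python
def is_valid_annotation(triple):
    """Namespace and key must be non-empty; the value may be empty (a flag)."""
    namespace, key, _value = triple
    return bool(namespace) and bool(key)


# A flag-style annotation: the namespace and key carry the meaning.
flag = ("mood", "hopeful", "")
# Missing key: the namespace has been misused as the key.
missing_key = ("hopeful", "", "yes")
```

The first triple is valid even though its value is empty; the second is not, because its key is empty.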

* What namespaces can I write to? What namespaces can I read from?

Anyone can write to or read from any namespace. We aren't planning on enforcing any policy that stops someone else from adding an annotation with "your" namespace, or that lets people see annotations only if they are logged in with a certain account. In the absence of some really compelling reason to do that, we want to err on the side of making this feature as flexible and open ended as possible. Namespaces aren't intended as a way for people to claim their little slice of the tweet space. Rather they are intended to dramatically increase the possible significance of a given key/value pair. If you want a given key to mean one thing and someone else wants that same key to mean something else, and someone else still wants another meaning, consumers of your annotations are put in a tricky spot trying to figure out how to interpret a given annotation without the disambiguation of a namespace.

* How do we consume annotations?

For convenience, we plan on including annotations for a tweet directly embedded into that tweet's payload. The XML payload of a tweet I just inspected at random came out to about 2K in size. The "worst case" annotation would a little more than double that payload, to probably about 5K. We're erring on the side of thinking that the moderate increase in payload size for tweets with annotations, even on slow connections, is both more convenient and faster than the latency and inconvenience incurred by adding another HTTP round trip. Though we'd like to provide both an embedded and a non-embedded option, the maintenance cost and fragment cache space increase make supporting both likely unrealistic, so we're going with what we think satisfies the 80% case. Push back as appropriate.
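
With annotations embedded in the tweet, a consumer needs no extra request. The sketch below walks the triples of a made-up JSON tweet; the "annotations" field name and its nested shape are assumptions, since the payload format isn't final.

```python
import json

# Hypothetical tweet payload; the "annotations" field and its shape are assumed.
raw = (
    '{"id": 1, "text": "Just got a new book in the mail.", '
    '"annotations": {"book": {"isbn": "030759243X"}}}'
)


def iter_annotations(tweet):
    """Yield (namespace, key, value) for each annotation embedded in a tweet."""
    for namespace, pairs in tweet.get("annotations", {}).items():
        for key, value in pairs.items():
            yield namespace, key, value


tweet = json.loads(raw)
triples = list(iter_annotations(tweet))
```

A tweet without annotations simply yields nothing, so the same loop works for every tweet in a timeline.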

* What will the payloads look like?

This isn't final. The payloads could end up wildly different after we noodle around in things like RDF and the semantic web's literature and all that kind of stuff. You can't see me but my hands are waving vigorously.

Given a hypothetical tweet, "Just got 'Although Of Course You End Up Becoming Yourself' in the mail. Hopeful. Heart broken."


    'isbn': '030759243X'

  XML option #1 which is succinct but restricts the possible values of namespaces and keys


  XML option #2 which is more verbose but allows for namespaces and keys to contain arbitrary data


If we went with XML option #2 it may or may not be a problem that it isn't "symmetrical" with the JSON representation. On the other hand, JSON and XML tend to be culturally at opposite sides of the Pithiness Spectrum.
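
To compare the trade-offs side by side, here is a sketch that renders one annotation set as JSON, as succinct XML and as verbose XML. None of this is the final format: the "book" namespace, the outer "annotations" wrapper and the element names in the verbose form are all assumptions made for illustration.

```python
import json
import xml.etree.ElementTree as ET

annotations = {"book": {"isbn": "030759243X"}}  # the "book" namespace is hypothetical


def to_json(annotations):
    return json.dumps({"annotations": annotations})


def to_xml_succinct(annotations):
    """Option #1 style: namespaces and keys become tag names."""
    root = ET.Element("annotations")
    for namespace, pairs in annotations.items():
        ns_el = ET.SubElement(root, namespace)
        for key, value in pairs.items():
            ET.SubElement(ns_el, key).text = value
    return ET.tostring(root, encoding="unicode")


def to_xml_verbose(annotations):
    """Option #2 style: fixed tags; namespace, key and value are all text."""
    root = ET.Element("annotations")
    for namespace, pairs in annotations.items():
        for key, value in pairs.items():
            ann = ET.SubElement(root, "annotation")
            ET.SubElement(ann, "namespace").text = namespace
            ET.SubElement(ann, "key").text = key
            ET.SubElement(ann, "value").text = value
    return ET.tostring(root, encoding="unicode")
```

The succinct form mirrors the JSON nesting but only works when namespaces and keys are valid tag names (ElementTree does not check this for you). The verbose form escapes everything as text, at the cost of the asymmetry mentioned above.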

* Can I add annotations to a tweet after the tweet has been created?

No. Like the text of a tweet, its annotations are also immutable. They can only be specified when the tweet they are being attached to is created. For talking purposes, though, if you want to add annotations to a tweet after the fact, you could retweet the original tweet and attach annotations to the retweet.

* Ok, great. What should I use annotations for though?

We don't know! That's the cool thing. Annotations are a blank slate that lend themselves to myriad divergent use cases. We want to provide open-ended utility for all the developers to innovate on top of. Some of us have initial ideas of cool potential use cases that I'm sure we'll start to share just to seed the conversation as we get closer to launch. Developers will experiment with annotations. Certain ideas and approaches will catch on. Certain annotations will become standards democratically because everyone agrees. Some might have diverging opinions. It's something that we hope will grow organically and be driven by sociological and cultural forces.

* Ok, great. How are we going to figure out what Joe Random's annotations actually mean?

That's something we need to figure out as a community. But here is an early idea: People could add some agreed upon "meta-annotation" that points to something which *describes* the annotation or annotations that person is using. Think something sort of like XML DTD, though not necessarily machine readable. This meta-annotation could point to a URL that simply has an HTML document that gives a description with some examples of the various annotations you're experimenting with or standardizing on.
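
Purely as an illustration of the meta-annotation idea, the sketch below looks up a description URL on a tweet's annotations. Nothing here is agreed upon: the "meta" namespace, the "describedby" key and the example.com URL are all invented for this example.

```python
# Hypothetical convention: a "meta" namespace whose "describedby" key
# points at a human-readable page documenting the other annotations.
annotations = {
    "book": {"isbn": "030759243X"},
    "meta": {"describedby": "http://example.com/annotations/book.html"},
}


def description_url(annotations, meta_namespace="meta", meta_key="describedby"):
    """Return the URL describing these annotations, or None if there isn't one."""
    return annotations.get(meta_namespace, {}).get(meta_key)
```

A consumer that finds no such pointer just gets None back and has to fall back on guesswork or convention.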

* Will it be in search? Streaming? Mobile? My toaster?

We hope so! When we launch you will at minimum be able to attach annotations to a tweet and consume annotations from a tweet's payload via the REST API. Of course it would be awesome to be able to say to search or the streaming API, "give me all tweets with this namespace", or "give me all tweets with this namespace and key", and so on. We're working with the Search, Streaming and other teams to make all this happen. We can't promise it'll be ready by launch but we know it's killer and a must have and are trying to get it ready soon.
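
Until something like that exists server side, the same two queries can be approximated on the client over tweets already fetched. This sketch assumes each tweet is a dict with an optional nested "annotations" field; the field name, the shape and the sample data are assumptions.

```python
def has_annotation(tweet, namespace, key=None):
    """True if the tweet has the namespace (and the key, when one is given)."""
    pairs = tweet.get("annotations", {}).get(namespace)
    if pairs is None:
        return False
    return key is None or key in pairs


# Hypothetical tweets, already fetched via the REST API.
tweets = [
    {"text": "a", "annotations": {"book": {"isbn": "030759243X"}}},
    {"text": "b", "annotations": {"mood": {"hopeful": ""}}},
    {"text": "c"},
]

# "Give me all tweets with this namespace."
with_book = [t["text"] for t in tweets if has_annotation(t, "book")]
# "Give me all tweets with this namespace and key."
with_hopeful = [t["text"] for t in tweets if has_annotation(t, "mood", "hopeful")]
```

Note that a key with an empty value still matches, which is what you want for flag-style annotations.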

* When is it going to launch?

This is, pretty much, the only thing a couple of us are going to be working on until it's launched. We really can't wait to get it in your hands to see all the cool things you'll do with it, so we're cranking to get it out as soon as possible. If I had to provide a guesstimate, I'd wave my hands in the direction of 2 months for an early, incremental roll out. We not only need to implement all the functionality, but we also need to productionize it in a measured and responsible way to ensure its quality of service is high.

In closing:

We're really excited about Annotations. Annotations mark one of our first of many departures from keeping in lock step with features on the web site. To truly be a platform, we want to expose high-leverage general purpose utility for the developer community to innovate on top of. Annotations is just the first of several high-leverage-general-purpose-utility features we're hoping to get to.

Think big. Blow our minds.

Marcel Molina
Twitter Platform Team

Twitter API documentation and resources: http://apiwiki.twitter.com
API updates via Twitter: http://twitter.com/twitterapi
Change your membership to this group: http://groups.google.com/group/twitter-api-announce?hl=en