A while ago we let you know about the new Tweet ID generation service
we developed called Snowflake and published the source code so you
could get familiar with how it works. Today, we're announcing that at
10am PDT on Tuesday September 21st, 2010 Snowflake will be in use on
our production systems and that status IDs will no longer be
sequential.
Snowflake still uses 64-bit unsigned integers, but instead of being
sequential they will be based on time and composed of a timestamp,
a worker number and a sequence number. For the majority of
you this change will go unnoticed and your applications will continue
to function without the need for any changes. In addition the API is
ready for Snowflake and parameters such as max_id and since_id will
work as expected. Snowflake does mean Tweet IDs will no longer be
useful for data analysis, and things like counting Tweets by
subtracting status IDs will not be possible.
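
To make the composition above concrete, here is a minimal sketch of
packing and unpacking such an ID. The field widths (41 bits of
milliseconds, 10 bits of worker number, 12 bits of sequence) are an
assumption taken from the published Snowflake source rather than from
this announcement, and EPOCH_MS is a placeholder value chosen for
illustration, not the epoch the service uses.

```python
# Assumed layout, high bits to low bits:
#   41 bits: milliseconds since a custom epoch
#   10 bits: worker number
#   12 bits: per-worker sequence number
# These widths are an assumption based on the published Snowflake
# source; they are not stated in this announcement.
TIMESTAMP_BITS = 41
WORKER_BITS = 10
SEQUENCE_BITS = 12

# Placeholder epoch (2010-01-01 00:00:00 UTC in milliseconds).
# Hypothetical value for illustration only.
EPOCH_MS = 1_262_304_000_000


def compose(timestamp_ms, worker, sequence):
    """Pack the three fields into a single integer ID."""
    elapsed = timestamp_ms - EPOCH_MS
    if not 0 <= elapsed < (1 << TIMESTAMP_BITS):
        raise ValueError("timestamp out of range")
    if not 0 <= worker < (1 << WORKER_BITS):
        raise ValueError("worker out of range")
    if not 0 <= sequence < (1 << SEQUENCE_BITS):
        raise ValueError("sequence out of range")
    return (
        (elapsed << (WORKER_BITS + SEQUENCE_BITS))
        | (worker << SEQUENCE_BITS)
        | sequence
    )


def decompose(status_id):
    """Split an ID back into (timestamp_ms, worker, sequence)."""
    sequence = status_id & ((1 << SEQUENCE_BITS) - 1)
    worker = (status_id >> SEQUENCE_BITS) & ((1 << WORKER_BITS) - 1)
    elapsed = status_id >> (WORKER_BITS + SEQUENCE_BITS)
    return elapsed + EPOCH_MS, worker, sequence


first = compose(EPOCH_MS + 1000, worker=3, sequence=7)
second = compose(EPOCH_MS + 1001, worker=3, sequence=7)
print(decompose(first))
print(second - first)
```

Under these assumptions, two IDs issued one millisecond apart by the
same worker differ by 2**22 even if no other Tweet was created in
between, which is why subtracting status IDs no longer counts Tweets.
The three fields add up to 63 bits, so the IDs also fit in a signed
64-bit column.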
We listened when you told us about sorting Tweets by ID and knew that
we needed to keep the ID roughly sortable. With Snowflake if two
Tweets are posted within 1 second of each other they will be within a
second of each other in the ID space too. This means although Tweets
will no longer be sorted, they will be k-sorted to approximately 1
second.
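
As a rough illustration of what k-sorted means in practice, the sketch
below sorts a few Tweets by ID and measures how far the result strays
from true creation order. The bit layout, the worker numbers and the
300 ms clock lag on worker 2 are all assumptions invented for the
example; none of them come from this announcement.

```python
# Assumed layout: timestamp in the high bits, then a 10-bit worker
# number and a 12-bit sequence number (an assumption, see lead-in).
WORKER_BITS = 10
SEQUENCE_BITS = 12

# Hypothetical clock error per worker, in milliseconds.
CLOCK_SKEW_MS = {1: 0, 2: -300}


def make_id(true_ms, worker, sequence=0):
    """Build an ID from the time as seen by the worker's own clock."""
    stamped_ms = true_ms + CLOCK_SKEW_MS[worker]
    return (
        (stamped_ms << (WORKER_BITS + SEQUENCE_BITS))
        | (worker << SEQUENCE_BITS)
        | sequence
    )


def max_inversion_ms(sorted_tweets):
    """Largest gap by which ID order disagrees with true time order."""
    worst = 0
    latest_seen = None
    for tweet in sorted_tweets:
        created = tweet["created_ms"]
        if latest_seen is not None and created < latest_seen:
            worst = max(worst, latest_seen - created)
        if latest_seen is None or created > latest_seen:
            latest_seen = created
    return worst


# (text, true creation time in ms, worker that issued the ID)
raw = [("A", 1000, 1), ("B", 1200, 2), ("C", 1500, 1), ("D", 2600, 2)]
tweets = [
    {"text": text, "created_ms": ms, "id": make_id(ms, worker)}
    for text, ms, worker in raw
]

ordered = sorted(tweets, key=lambda tweet: tweet["id"])
print([tweet["text"] for tweet in ordered])
print(max_inversion_ms(ordered))
```

In this made-up data set B sorts ahead of A although it was created
200 ms later, because worker 2's clock runs behind. Tweets further
apart than the clock disagreement still come out in the right order,
which is the sense in which sorting by ID stays accurate to roughly a
second.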
The key points:
* Status IDs will be unique
* Status IDs will continue to increase - Tweets created later in the
day will have a higher ID than those created in the morning
* Order will be maintained for Tweets allowing you to sort by Status
ID. The accuracy of the sort will be to approximately 1 second,
meaning Tweets created within a second of each other have no order.
* All existing API methods will continue to work the same as before
* Previous status IDs will be unchanged
* There will be a noticeable jump in the numerical value of status IDs
when we change.
You can read more about Snowflake on the Twitter Engineering blog:
http://bit.ly/announcing-snowflake
Best
Matt Harris
Developer Advocate, Twitter
http://twitter.com/themattharris
--
Twitter developer documentation and resources: http://dev.twitter.com/doc
API updates via Twitter: http://twitter.com/twitterapi
Issues/Enhancements Tracker: http://code.google.com/p/twitter-api/issues/list
Change your membership to this group: http://groups.google.com/group/twitter-development-talk?hl=en
Thanks for your questions, I've answered them inline.
> Will the new IDs continue on from the old IDs sequentally? Or will
> they be completely incompatible with the old IDs?
All existing IDs will stay the same. The new IDs will be greater than
the old ones and there is likely to be a gap between the old and new.
I'm not sure what you mean by being incompatible though - an ID is a
unique identifier for an object and the new IDs will continue to be
unique identifiers.
> I have a database of several million tweets that JournoTwit users use.
> I don't want to have to start differentiating between two ID types and/
> or having to completely clear the database out.
You won't have to clear your database out or change the datatype.
The new status IDs are still 64-bit integers, with newer Tweets having
numerically higher IDs.
> What I haven't seen amongst any of this documentation - is an example
> of the new status ids in comparison to the old? That would probably
> answer a few questions :D And I assume - Direct Messages will be
> undergoing the same transformation?
The new status IDs apply to Tweets, ReTweets and Mentions (so
basically anything that can show up in the home timeline).
The code that generates the IDs has been shared and you can read more
about Snowflake from our engineering team on their blog.
http://bit.ly/announcing-snowflake
Hope that answers your questions,
Matt