Tom
>>> ID value. If this happens in your parser you will need to "pre-parse" the
>>> data, removing or replacing ID parameters with their _str versions.
>>
>>> Summary
>>> -------------
>>> 1) If you develop in Javascript, know that you will have to update your code
>>> to read the string version instead of the integer version.
>>
>>> 2) If you use a JSON decoder, validate that the example JSON, above, decodes
>>> without throwing exceptions. If exceptions are thrown, you will need to
>>> pre-parse the data. Please let us know the name, version, and language of
>>> the parser which throws the exception so we can investigate.
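Here is a minimal sketch of the pre-parse step in Python. It assumes ID fields are named `id` or end in `_id` (as in `in_reply_to_status_id`). The regex and the `quote_ids` helper are hypothetical, and a regex is only a rough stand-in for a real JSON tokenizer. Passing `parse_int=float` to `json.loads` imitates a decoder that turns every number into a double.

```python
import json
import re

# Hypothetical pre-parser (names are illustrative): wrap any bare integer that follows
# a key named "id" or ending in "_id" in quotes, so a decoder that cannot cope with
# 64-bit integers only ever sees strings. A regex is a crude stand-in for a real
# JSON tokenizer; this is a sketch, not a robust implementation.
BARE_ID = re.compile(r'("(?:\w+_)?id")\s*:\s*(-?\d+)')


def quote_ids(raw: str) -> str:
    """Return the JSON text with bare integer ID values rewritten as strings."""
    return BARE_ID.sub(r'\1:"\2"', raw)


# Made-up payload; 9007199254740993 is 2**53 + 1, just past the safe range.
raw = ('{"id": 9007199254740993, "id_str": "9007199254740993", '
       '"in_reply_to_status_id": null, "text": "hi"}')

cleaned = quote_ids(raw)
# parse_int=float imitates a decoder that stores every number as a double.
status = json.loads(cleaned, parse_int=float)
print(status["id"], status["in_reply_to_status_id"])
```

Because the rewritten value is a string, it carries the same digits as the matching `_str` field, and null IDs are left untouched.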
>>
>>> Timeline
>>> -----------
>>> by 22nd October 2010 (Friday): String versions of ID numbers will start
>>> appearing in the API responses
>>> 4th November 2010 (Thursday) : Snowflake will be turned on but at ~41bit
>>> length
>>> 26th November 2010 (Friday) : Status IDs will break 53bits in length and
>>> cease being usable as Integers in Javascript based languages
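The 53-bit limit in this timeline comes from JavaScript storing every number as an IEEE 754 double, which has a 53-bit significand. Python floats use the same format, so this small sketch shows where integers stop surviving a round trip. `survives_as_double` is an illustrative helper name.

```python
# Python floats are IEEE 754 doubles, the same representation JavaScript uses for every
# Number, so they show where integer IDs stop surviving a round trip.
SAFE_LIMIT = 2 ** 53


def survives_as_double(n: int) -> bool:
    """True if n comes back unchanged after conversion to a 64-bit float."""
    return int(float(n)) == n


for candidate in (SAFE_LIMIT - 1, SAFE_LIMIT, SAFE_LIMIT + 1):
    # bit_length() is the number of bits needed to write the value in binary.
    print(candidate, candidate.bit_length(), survives_as_double(candidate))
```

The first two values round-trip, but 2**53 + 1, the first odd 54-bit number, comes back as 2**53.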
>>
>>> We understand this isn't as seamless a transition as we had planned and
>>> appreciate that for some of you this change requires an update to your code.
>>> We've tried to give as much time as possible for you to make the migration
--
Twitter developer documentation and resources: http://dev.twitter.com/doc
API updates via Twitter: http://twitter.com/twitterapi
Issues/Enhancements Tracker: http://code.google.com/p/twitter-api/issues/list
Change your membership to this group: http://groups.google.com/group/twitter-development-talk
Java doesn't have unsigned types, so a (signed) long is the only way to transfer the value.
IIRC from peeking at that code, the top bit is unused, which would mean there's no danger of creating an id value that's ambiguous. Storing and comparing ids in signed or unsigned 64-bit longs should be fine.
-- brion vibber (brion @ status.net)
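To illustrate the point about signed longs, here is a sketch that uses Python's `struct` module to pack IDs into the same 8-byte signed layout a Java `long` uses. The helper names are hypothetical, and the range check encodes the assumption that the top bit stays clear.

```python
import struct

# Sketch: keep an ID in a signed 64-bit slot (a Java long, a C int64_t) and read it back.
# Assumes the top (sign) bit is never set, so every ID is below 2**63 and the signed
# and unsigned readings of the same 8 bytes agree.


def to_int64_bytes(id_value: int) -> bytes:
    if not 0 <= id_value < 2 ** 63:
        raise ValueError("ID does not fit in a non-negative signed 64-bit integer")
    return struct.pack(">q", id_value)  # big-endian, signed 64-bit


def from_int64_bytes(raw: bytes) -> int:
    return struct.unpack(">q", raw)[0]


ids = [5, 2 ** 40, 2 ** 62 + 12345]
packed = [to_int64_bytes(i) for i in ids]

# Round trip is exact, and the unsigned reading ("Q") gives the same numbers.
assert [from_int64_bytes(b) for b in packed] == ids
assert [struct.unpack(">Q", b)[0] for b in packed] == ids
# With the sign bit clear, big-endian byte order also sorts like the numbers do.
assert sorted(packed) == packed
```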
As far as I know, this issue will only cause trouble for a few
applications that work with JavaScript and depend on the IDs a lot.
My suggestion to solve this issue would be to introduce an additional
parameter (just like include_rts, just with a different name) that turns
all IDs into strings. No extra fields, just an additional optional
parameter. Won't cause trouble for the applications that can't parse it
and requires minimal implementation effort for developers.
I hope I'm not too late with my suggestion :-)
Tom
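A sketch of how the opt-in parameter suggested above might behave on the server. The flag name `stringify_ids`, the helper functions, and the rule that a field is an ID if it is named `id` or ends in `_id` are all assumptions for illustration. It only handles top-level fields, whereas real payloads may nest IDs inside sub-objects.

```python
import json


def is_id_field(key: str) -> bool:
    # Hypothetical naming rule for ID fields.
    return key == "id" or key.endswith("_id")


def serialize_status(status: dict, stringify_ids: bool = False) -> str:
    """Hypothetical server-side sketch: when the client sets the opt-in flag,
    top-level ID fields are written as JSON strings; otherwise the payload is
    unchanged. No duplicate fields are added."""
    if stringify_ids:
        status = {
            key: str(value)
            if is_id_field(key) and isinstance(value, int) and not isinstance(value, bool)
            else value
            for key, value in status.items()
        }
    return json.dumps(status)


tweet = {"id": 9007199254740993, "in_reply_to_status_id": None, "text": "hi"}
print(serialize_status(tweet))
print(serialize_status(tweet, stringify_ids=True))
```

Clients that never send the flag get exactly the payload they got before, which is the main appeal of the opt-in design.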
On 10/19/10 7:10 PM, Craig Hockenberry wrote:
> This approach feels wrong to me. The red flag is the duplication of
> data within the payload: in 30+ years of professional development,
> I've never seen that work out well.
>
> The root of the problem is that you've chosen to deliver data in a
> format (JSON) that can't support integers with a value greater than
> 2^53. And some of your data uses 64 bits.
>
> The result is that you're working around the problem in a language by
> using a string. Avoiding the root problem will encumber you with
> legacy that you'll regret later.
>
> Look at your proposed solution from a different point-of-view: say you
> have a language that can't handle Unicode well (e.g. BASIC or Ruby.)
> Would you solve this problem by adding another field called
> "text_ascii"?
>
> "text": "@themattharris hey how are things in København?".
So - for Twitter: what is your *realistic* projection for when a
53-bit integer ID will overflow? What are the underlying assumptions
about human population growth, spread of Twitter, revenue models,
competition, etc.? I know this is all highly confidential, so for sake
of argument, assume current tweet rates per user and the goal your
executives have stated of a billion users, with a plateau at that
point. The question I'm asking is whether you *really* need 64-bit
integer IDs for tweets or for users. ;-)
By the way, I ask similar questions of all the "big data" geeks out
there - so many naked emperors, so little time. ;-)
--
M. Edward (Ed) Borasky
http://borasky-research.net http://twitter.com/znmeb
"A mathematician is a device for turning coffee into theorems." - Paul Erdos
Quoting Craig Hockenberry <craig.ho...@gmail.com>:
Chance of it actually reaching 53 bits? I'd say that it happens at the
end of November... Friday the 26th?
Tom
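The "24 days" figure can be checked with a little arithmetic, assuming the layout described elsewhere in this thread: a millisecond timestamp shifted left past 10 machine-ID bits and 12 sequence bits. Under that assumption an ID needs more than 53 bits once the timestamp reaches 2^31 ms since the Snowflake epoch.

```python
from datetime import timedelta

# Assumed layout, taken from the figures quoted in this thread:
# 41-bit millisecond timestamp | 10-bit machine ID | 12-bit sequence number.
LOW_BITS = 10 + 12
JS_SAFE_BITS = 53

# An ID reaches 2**53 once timestamp << 22 does,
# i.e. once the timestamp (ms since the Snowflake epoch) reaches 2**31.
threshold_ms = 2 ** (JS_SAFE_BITS - LOW_BITS)
print(timedelta(milliseconds=threshold_ms))
```

This prints `24 days, 20:31:23.648000`, i.e. just under 25 days after Snowflake starts counting.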
Interesting ... so you have the theoretical capacity to scale to 2**22
(about 4 million) tweets per millisecond? Even 4 million tweets a
second seems unrealistic, as does a single "machine" only being able
to generate 4096 IDs per millisecond. I think if you're really expecting this kind of
volume, the FPGA vendors probably can help you out. We are talking
clocks and counters, here, right, not Javascript interpreters or
robust linear regressions? ;-)
Ah, well, I'll check back on you guys in 69 years to see how you're
holding up. ;-)
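The capacities quoted above follow from the bit layout. Here is a sketch of packing and unpacking such an ID, assuming a 41/10/12-bit split (timestamp, machine ID, sequence) inferred from the figures in this thread; `compose` and `decompose` are illustrative names, not Twitter's code.

```python
from typing import Tuple

# Assumed Snowflake-style layout, inferred from the figures quoted in this thread
# rather than from Twitter's source:
# [1 unused sign bit | 41-bit timestamp (ms) | 10-bit machine ID | 12-bit sequence]
TIMESTAMP_BITS, MACHINE_BITS, SEQUENCE_BITS = 41, 10, 12


def _check(name: str, value: int, bits: int) -> None:
    if not 0 <= value < 1 << bits:
        raise ValueError(f"{name} must fit in {bits} bits")


def compose(timestamp_ms: int, machine_id: int, sequence: int) -> int:
    _check("timestamp_ms", timestamp_ms, TIMESTAMP_BITS)
    _check("machine_id", machine_id, MACHINE_BITS)
    _check("sequence", sequence, SEQUENCE_BITS)
    return ((timestamp_ms << (MACHINE_BITS + SEQUENCE_BITS))
            | (machine_id << SEQUENCE_BITS)
            | sequence)


def decompose(snowflake: int) -> Tuple[int, int, int]:
    sequence = snowflake & ((1 << SEQUENCE_BITS) - 1)
    machine_id = (snowflake >> SEQUENCE_BITS) & ((1 << MACHINE_BITS) - 1)
    timestamp_ms = snowflake >> (MACHINE_BITS + SEQUENCE_BITS)
    return timestamp_ms, machine_id, sequence


# Capacity per millisecond implied by the layout.
ids_per_machine_per_ms = 1 << SEQUENCE_BITS       # 4096
ids_per_ms = 1 << (MACHINE_BITS + SEQUENCE_BITS)  # 4,194,304, i.e. "about 4 million"
```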
I'd say that you could remove a maximum of 2 bits from the time - this
would divide the 69 years by 4, making it a max of 17 years. By then,
I'd assume that we are past the 53-bit limit.
You could remove 3 bits from the machine ID and 5 bits from the sequence
number. It would mean that there could be only 128 ID servers with 128
IDs per millisecond per machine -> 16 million tweets per second.
In total you would have removed 10 bits from a number that had only 63
bits -> 53 bits. The question is: do you want that? I don't think you
do. I really prefer the current solution.
Tom
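Tom's bit budget, worked through in a few lines. The trimmed field widths are his hypothetical proposal applied to the assumed 41/10/12-bit layout, and a year is taken as 365 days.

```python
# Tom's proposed trim of the assumed 41/10/12-bit layout (hypothetical widths).
MS_PER_YEAR = 1000 * 60 * 60 * 24 * 365

original = {"timestamp": 41, "machine": 10, "sequence": 12}
trimmed = {"timestamp": 41 - 2, "machine": 10 - 3, "sequence": 12 - 5}

total_bits = sum(trimmed.values())                          # bits left in the ID
lifetime_years = (1 << trimmed["timestamp"]) / MS_PER_YEAR  # how long the clock lasts
servers = 1 << trimmed["machine"]                           # distinct ID servers
ids_per_ms_per_server = 1 << trimmed["sequence"]
ids_per_second = servers * ids_per_ms_per_server * 1000

print(sum(original.values()), total_bits, round(lifetime_years, 1), servers, ids_per_second)
```

This gives 53 bits, a clock that lasts about 17.4 years, and 16,384,000 IDs per second, in line with the "17 years" and "16 million" estimates.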
(64 * log(2)) / log(62) = 10.7487219
so eleven characters drawn from A-Za-z0-9 are enough for a 64-bit ID,
and, padded to a fixed width with the digits ordered 0-9A-Za-z, they can still be sortable!
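A sketch of the base-62 idea, assuming IDs are treated as unsigned 64-bit values. Sorting only works if every encoding has the same width and the digit values follow ASCII order (0-9, then A-Z, then a-z), so shorter encodings are left-padded with the zero digit. `encode62` and `decode62` are hypothetical helpers.

```python
import math

# Digit values in ASCII order, so comparing equal-length strings matches comparing numbers.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
WIDTH = math.ceil(64 * math.log(2) / math.log(62))  # 11 characters cover any 64-bit value


def encode62(n: int) -> str:
    if not 0 <= n < 1 << 64:
        raise ValueError("expected an unsigned 64-bit value")
    digits = []
    while n:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    # Most significant digit first, left-padded with the zero digit to a fixed width.
    return "".join(reversed(digits)).rjust(WIDTH, ALPHABET[0])


def decode62(text: str) -> int:
    value = 0
    for ch in text:
        value = value * 62 + ALPHABET.index(ch)
    return value
```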
2) No
Tom
>> Users may change to a Snowflake ID scheme in the future but this isn't
>> mean we couldn't scale Twitter, or operate our infrastructure in an
>> uncoordinated, highly available way.
>>
>> 7) When will the 53bit Integer overflow happen?
>> 24 days after Snowflake starts counting.
>>
>> 8) Is it safe to parse and store IDs as signed 64bit Integers?
>> Yes.
>>
>> 9) Why offer both the String and Integer versions of the ID?
>> The String representation is needed to ensure languages which cannot convert
>> the >53bit Integer can still use the ID in other API requests.
>>
>> The Integer value is being retained for languages which can handle numbers
>> larger than 53 bits, and to prevent applications which have not converted
>> from being cut off from Twitter.
>>
>> 10) When ID is null what will the _str representation be?
>> The _str representation will also be null.
>>
>> 11) Did you really mean "unsigned" 64bit Integer?
>> Strictly speaking the Snowflake is a signed 64bit long under the hood. That
>> being said, we will never use the sign bit and won't require the full
>> 64bits for positive numbers for about 69 years:
>>
>> http://www.google.com/search?q=%282**41%29+%2F+%2860*60*24*1000%29+%2...
>>
>> 12) Why not make the strings opt-in?
>> We did consider this as an option but decided against it for a number of
>> reasons. The first reason is that the ID is fundamental to being able to
>> work with the data from the API so receiving the correct ID shouldn't be
>
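The 69-year figure in item 11 comes from dividing 2^41 milliseconds by the length of a day. This sketch reproduces that arithmetic (days, then 365-day years) and checks, under the assumed 41/10/12-bit layout used elsewhere in this thread, that even the largest possible ID leaves the sign bit clear.

```python
# 41-bit millisecond timestamp: how long until it runs out?
MS_PER_DAY = 60 * 60 * 24 * 1000
days = 2 ** 41 / MS_PER_DAY
years = days / 365
print(f"{days:,.0f} days, about {years:.1f} years")

# Assumed layout: 41-bit timestamp above 10 machine bits and 12 sequence bits.
# Even with every field at its maximum, bit 63 (the sign bit) is never set.
largest_id = ((2 ** 41 - 1) << 22) | (2 ** 22 - 1)
assert largest_id == 2 ** 63 - 1
```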