Breaking change: Twitter ids are getting too long

Daniel Winterstein

Oct 27, 2010, 7:11:09 AM
to jt...@googlegroups.com
Twitter have recently changed how they generate status id numbers.
They will use all 64 bits of an *unsigned* long. The Java long is
always signed. This means that if you use a long for status ids, the
id will wrap round into negative numbers. This will happen before the
end of the year, and possibly considerably sooner.
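To make the wrap-round concrete, here is a minimal Java sketch that uses only the standard library. IdOverflowDemo and asSignedLong are illustrative names, not part of JTwitter. BigInteger.longValue() keeps only the low 64 bits, and that is exactly what happens when an id above Long.MAX_VALUE is squeezed into a long.

```java
import java.math.BigInteger;

/** Shows what happens when a value above Long.MAX_VALUE is forced into a signed Java long. */
final class IdOverflowDemo {

    /** Returns the signed long you get if a 64-bit id is stored without any range check. */
    static long asSignedLong(BigInteger id) {
        // longValue() keeps only the low 64 bits, so anything >= 2^63 comes out negative
        return id.longValue();
    }

    public static void main(String[] args) {
        BigInteger largestSafe = BigInteger.valueOf(Long.MAX_VALUE); // 2^63 - 1
        BigInteger firstUnsafe = largestSafe.add(BigInteger.ONE);    // 2^63

        System.out.println(asSignedLong(largestSafe)); // 9223372036854775807
        System.out.println(asSignedLong(firstUnsafe)); // -9223372036854775808

        // Plain long arithmetic wraps round in exactly the same way
        long wrapped = Long.MAX_VALUE + 1;
        System.out.println(wrapped == Long.MIN_VALUE); // true
    }
}
```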

Note: For now, only status ids are expected to overflow, but message and
user ids may be affected next year.

To handle this, I've made the following changes in the latest version
of JTwitter. These may cause some code to break.

1. Ids are now represented by a Number. This will be a BigInteger for
Status.id, and (for now) a Long for User.id and Message.id.
2. Methods which used to take long parameters now take a Number.
   - If you're passing in a long, this will still work fine, since
     the long is autoboxed to a Long, and Long is a subclass of Number.
   - 0 and -1 can no longer be used to indicate an unset value for
     these parameters. Use null for an unset parameter instead.
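Here is a sketch of what change 2 means for calling code. It assumes a hypothetical method getTimelineSince(Number sinceId), which is not an actual JTwitter method, and the query string is made up for illustration. A long argument still compiles thanks to autoboxing, a BigInteger works too, and null replaces the old 0 or -1 sentinels.

```java
import java.math.BigInteger;

/**
 * Hypothetical sketch of a Number-typed id parameter. getTimelineSince and its
 * query format are invented for illustration; they are not JTwitter methods.
 */
final class NumberParamDemo {

    /** Builds a request path; a null sinceId means "no lower bound". */
    static String getTimelineSince(Number sinceId) {
        if (sinceId == null) {
            return "statuses/home_timeline"; // parameter unset: omit it entirely
        }
        // toString() gives the exact decimal form for Long and BigInteger alike
        return "statuses/home_timeline?since_id=" + sinceId;
    }

    public static void main(String[] args) {
        long oldStyleId = 12345L;
        System.out.println(getTimelineSince(oldStyleId)); // long is autoboxed to Long
        System.out.println(getTimelineSince(new BigInteger("10000000000000000000")));
        System.out.println(getTimelineSince(null));       // instead of 0 or -1
    }
}
```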

In your own code, I advise using either Number or BigInteger instead
of Long for ids, as this will make your code future-proof.
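One way to follow this advice when storing ids is to normalise whatever Number you receive into a BigInteger. The helper below, IdStorage.toBigInteger, is a hypothetical sketch and not part of JTwitter.

```java
import java.math.BigInteger;

/** Illustrative helper (not a JTwitter API) for storing ids in one future-proof type. */
final class IdStorage {

    /** Converts whichever Number subclass an id arrives as into a BigInteger, without loss. */
    static BigInteger toBigInteger(Number id) {
        if (id == null) {
            return null; // keep "unset" as null
        }
        if (id instanceof BigInteger) {
            return (BigInteger) id; // already exact, reuse it
        }
        if (id instanceof Long || id instanceof Integer || id instanceof Short || id instanceof Byte) {
            return BigInteger.valueOf(id.longValue()); // integral types fit exactly
        }
        // Other Number types (e.g. AtomicLong) go via their decimal string;
        // a Double such as 1.0 prints as "1.0" and throws NumberFormatException here
        return new BigInteger(id.toString());
    }
}
```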

You can download the latest JTwitter here:
http://www.winterwell.com/software/jtwitter/jtwitter.jar
Sorry for any inconvenience these changes cause.

Best regards,
- Daniel

Anton Zeef

Nov 24, 2010, 10:43:40 AM
to JTwitter
If I read the info from Twitter correctly, Twitter won't be using the
full 64 bits for another 69 years or so.
In the meantime, Java's 64-bit signed longs will be sufficient, since
their non-negative range is the same as an unsigned 63-bit number's, and
the sign bit won't be used.

I read the twitter-development-talk thread about it here:
http://groups.google.com/group/twitter-development-talk/browse_thread/thread/6a16efa375532182
The post is by Matthew Harris, who later in that thread clarifies:
> 11) Did you really mean ‘unsigned’ 64bit Integer?
> Strictly speaking the Snowflake is a signed 64bit long under the hood. That
> being said, we will never use the negative bit and won’t require the full
> 64bits for positive numbers for about 69 year
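The "about 69 years" figure can be checked with a little arithmetic. This sketch assumes the commonly described Snowflake layout: one unused sign bit, a 41-bit millisecond timestamp, a 10-bit worker id and a 12-bit sequence number. Those field widths are an assumption here and are not stated in this thread.

```java
/**
 * Back-of-envelope check of the "about 69 years" figure, ASSUMING the commonly
 * described Snowflake layout: 1 unused sign bit, a 41-bit millisecond timestamp,
 * a 10-bit worker id and a 12-bit sequence number.
 */
final class SnowflakeHeadroom {
    static final int TIMESTAMP_BITS = 41;
    static final int WORKER_BITS = 10;
    static final int SEQUENCE_BITS = 12;

    /** Years until the 41-bit timestamp field runs out (years of 365.25 days). */
    static double yearsOfHeadroom() {
        long maxMillis = 1L << TIMESTAMP_BITS;           // 2^41 ms
        double msPerYear = 365.25 * 24 * 60 * 60 * 1000; // ms in an average year
        return maxMillis / msPerYear;
    }

    public static void main(String[] args) {
        // 1 + 41 + 10 + 12 = 64 bits in total, so the top (sign) bit stays 0
        System.out.println(1 + TIMESTAMP_BITS + WORKER_BITS + SEQUENCE_BITS); // 64
        System.out.printf("%.1f years%n", yearsOfHeadroom());                  // 69.7 years
    }
}
```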

Questions/remarks:
1) Why convert status ids to BigIntegers now, then? It is certainly the
theoretically correct way to do it, but even Twitter encourages Java API
clients to keep using longs (see the above thread). And it's somewhat
inconsistent to use Number for the ids of both Messages and Users, but
BigInteger for Status ids.
2) So I'd like to see more remarks on the getId() methods in
Message, User and Status stating more clearly the precision of the
number, i.e. to which type it can be safely downcast (or converted
with methods like Number.longValue()).
3) On the other hand, I do encourage the use of Number since, as you
say, it's more future-proof.
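For the downcasting question in point 2, the difference between a silent and a checked conversion can be sketched in Java as follows. IdDowncast is an illustrative name. BigInteger.longValueExact() needs Java 8 or later, which postdates this thread.

```java
import java.math.BigInteger;

/** Safe vs unsafe ways to turn a Number id into a long (longValueExact needs Java 8+). */
final class IdDowncast {

    /** Silently truncates: a value above Long.MAX_VALUE comes back as a wrong (negative) long. */
    static long unsafeLong(Number id) {
        return id.longValue();
    }

    /** Throws ArithmeticException instead of returning a wrong value. */
    static long safeLong(Number id) {
        if (id instanceof BigInteger) {
            return ((BigInteger) id).longValueExact(); // throws if it does not fit
        }
        if (id instanceof Long || id instanceof Integer || id instanceof Short || id instanceof Byte) {
            return id.longValue(); // these always fit in a long
        }
        throw new IllegalArgumentException("Unsupported id type: " + id.getClass());
    }
}
```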

All in all, your API is quite handy and easy to use, so even with
BigIntegers we'll still be able to enjoy the benefits, but some more
clarification would be welcome.

One last thing: I see you using the new "id_str" JSON field for
getting the id of the status. Is that to stop the JSON parser used
by JTwitter from bailing out on the "id" JSON field?

Best regards,

Anton Zeef

Daniel Winterstein

Nov 24, 2010, 11:29:59 AM
to jt...@googlegroups.com
Hi Anton,

On 24 November 2010 15:43, Anton Zeef <anton...@gmail.com> wrote:
> If I read the info from twitter correctly, twitter won't be using the
> full 64 bits for another 69 years, or so.
> In the meantime using java's 64-bit signed longs will be sufficient
> since they are equivalent to unsigned 63-bit numbers and the sign bit
> won't be used.

You're quite right. But there are good reasons to represent Twitter's
signed 64 bit ids as they are.
Comments below...

>
> I read the twitter-development-talk about it here:
> http://groups.google.com/group/twitter-development-talk/browse_thread/thread/6a16efa375532182,
> the post is by Matthew Harris who later on in that thread clarifies:
>> 11) Did you really mean ‘unsigned’ 64bit Integer?
>> Strictly speaking the Snowflake is a signed 64bit long under the hood. That
>> being said, we will never use the negative bit and won’t require the full
>> 64bits for positive numbers for about 69 year
>
> Questions/remarks:
> 1) why then convert now to BigIntegers for status-ids? it sure is the
> theoretically correct way to do,

> but even twitter encourages java-api
> clients to keep using longs (see the above thread)...

The thread says overflowed Longs will continue to work. That's not
exactly a recommendation to use them. Or perhaps I missed a comment.

A few reasons for switching to BigInteger:
1) Representing a signed 64 bit number as an unsigned 63 bit number
could easily lead to errors. E.g. where someone stores the Long, then
tries to use it elsewhere without correcting for the format.
Formatting issues are a pain. BigInteger avoids them.
2) Anyone sorting tweets by id will hit problems with the overflow.
BigInteger fixes that.
3) 6 months ago, Twitter didn't anticipate moving to 64-bit
signed ids. They did so in order to encode some extra information
(separating the servers). Who's to say that future growth won't make
them go beyond that? Adding one more bit of info to the id would
definitely break Long, whereas BigInteger is future-proof.
4) You can always downcast into a Long if you want to.
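Reason 2 can be shown with a small sketch. It assumes an id of 2^63 has wrapped round to Long.MIN_VALUE. Signed sorting then puts the newer id first, while Long.compareUnsigned (Java 8+) and BigInteger both give the numeric order. IdSortDemo is a hypothetical name.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Sorting ids that have overflowed a signed long (Long.compareUnsigned needs Java 8+). */
final class IdSortDemo {

    /** Sorts raw long ids with the default signed ordering. */
    static List<Long> signedSort(List<Long> ids) {
        List<Long> copy = new ArrayList<>(ids);
        Collections.sort(copy);
        return copy;
    }

    /** Sorts the same bits, treating them as unsigned 64-bit values. */
    static List<Long> unsignedSort(List<Long> ids) {
        List<Long> copy = new ArrayList<>(ids);
        copy.sort(Long::compareUnsigned);
        return copy;
    }

    /** Sorts BigInteger ids: numerically correct with the natural ordering. */
    static List<BigInteger> bigSort(List<BigInteger> ids) {
        List<BigInteger> copy = new ArrayList<>(ids);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        long older = Long.MAX_VALUE; // 2^63 - 1
        long newer = Long.MIN_VALUE; // the bits of 2^63 (one higher) after wrap-round
        List<Long> ids = Arrays.asList(newer, older);
        System.out.println(signedSort(ids));   // newer id sorts before the older one
        System.out.println(unsignedSort(ids)); // older id first, as it should be
    }
}
```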

Against that, the reasons for sticking with Long are (I think)
uncompelling. Long is faster, and would give more pleasant code if we
were doing arithmetic with these objects, but that's irrelevant here.
The extra memory of a BigInteger is minimal.

> and it's kind of
> inconsistent to use Number for ids of both Messages and Users, but
> BigInteger for Status' ids

The Status ids are definitely BigIntegers, whereas the Message and
User ids are currently Longs, but might become BigIntegers in the
future - hence the use of the common super-class Number. However...

> 2) so I'd like to see more remarks for the 'getId()' functions in
> Message, User and Status stating more clearly the precision of the
> number, i.e. to which type it can be safely downcasted (or converted
> with e.g. functions like Number.longValue() etc)

You raise a good point about clarity.
I will change the getId() methods to return either BigInteger or Long,
making it clear what you're getting.
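Java's covariant return types allow this without giving up the shared super-class. An interface can promise a Number while each class narrows the return type. The interface and classes below (HasId, SketchStatus, SketchUser) are hypothetical stand-ins, not JTwitter's own types.

```java
import java.math.BigInteger;

/**
 * Hypothetical sketch of covariant return types: the common interface promises a
 * Number, and each class narrows it. These are illustrative, not JTwitter's classes.
 */
interface HasId {
    Number getId();
}

final class SketchStatus implements HasId {
    private final BigInteger id;

    SketchStatus(BigInteger id) { this.id = id; }

    @Override
    public BigInteger getId() { return id; } // narrowed to BigInteger
}

final class SketchUser implements HasId {
    private final long id;

    SketchUser(long id) { this.id = id; }

    @Override
    public Long getId() { return id; } // narrowed to Long (autoboxed)
}
```

Callers holding a concrete type get the precise type with no cast, while generic code can still work through HasId and Number.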

> 3) on the other hand, I do encourage the use of Number since, as you
> say, it's more futureproof
>
> All in all your api is quite handy and very well to use, so even with
> bigintegers we'll still be able to enjoy the benefits,

Thank you.

> but some more
> clarification is requested.
>
> One last thing: I see you using the new "id_str" json-field for
> getting the id of the status, is that to prevent the json parser used
> by jtwitter to bail out on the "id" json-field?

Exactly. It throws an exception if you try to get the id field and it
won't fit cleanly in a Long.
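The difference between reading "id" and "id_str" can be sketched without any JSON library. The exact failure depends on the parser. JTwitter's, as described above, throws. Parsers that hold every JSON number as a double (JavaScript, for example) silently lose precision above 2^53. IdStrDemo and the sample id are illustrative only.

```java
import java.math.BigInteger;

/**
 * Why reading the string form is safer. No JSON library is used; the sample id
 * in the tests is made up purely to show the effect.
 */
final class IdStrDemo {

    /** Simulates a parser that stores every JSON number as a double. */
    static long viaDouble(long id) {
        double d = id; // only 53 bits of precision survive this conversion
        return (long) d;
    }

    /** Parses the "id_str" value exactly, whatever its size. */
    static BigInteger fromIdStr(String idStr) {
        return new BigInteger(idStr);
    }
}
```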

I hope that answers your questions. Let me know what you think.

Best regards,
- Daniel

--
--------------------------------------------------
Daniel Winterstein
Edinburgh
http://winterwell.com   http://soda.sh

Dobes Vandermeer

Nov 24, 2010, 12:01:18 PM
to jt...@googlegroups.com
Thoughts:

1. Why even use a Number, why not a String? If the ids now encode extra information, sorting by ID isn't useful, since we don't know what ordering that is; sorting by timestamp will always be better. The data is going to be converted to/from a String anyway during parsing and serialization, so this will be faster, and it will correctly express the opaque nature of the ID. It's future-proof (Twitter can use whatever ID scheme they want with as many bits as they want) and doesn't give people the impression that they could or should perform arithmetic or sorting based on the ID.
2. If you think the API could change later so that user IDs or whatever could be a BigInteger instead of Long, then you should probably not return Long since that'll break the API later if you change it.  However, Long can represent a lot of users, so I'm not sure this will ever happen.  I suggest using String in this case, too, since (again) IDs should be considered opaque identifiers anyway.
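Here is a sketch of the opaque-identifier approach suggested above, written as a hypothetical OpaqueId class (not part of JTwitter). It stores the exact string from the API and supports equality and hashing so it can serve as a map key. It deliberately offers no arithmetic or ordering.

```java
import java.math.BigInteger;
import java.util.Objects;

/**
 * Hypothetical opaque id type: the id is kept as the exact string from the API,
 * so there is no size limit and no arithmetic.
 */
final class OpaqueId {
    private final String value;

    OpaqueId(String value) {
        this.value = Objects.requireNonNull(value, "id");
    }

    /** Escape hatch for callers that really do need a number. */
    BigInteger toBigInteger() {
        return new BigInteger(value);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof OpaqueId && ((OpaqueId) o).value.equals(value);
    }

    @Override
    public int hashCode() {
        return value.hashCode();
    }

    @Override
    public String toString() {
        return value;
    }
}
```

Leaving out Comparable is a deliberate design choice, in line with the suggestion to sort by timestamp rather than by id.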

Anton Zeef

Nov 24, 2010, 12:41:29 PM
to JTwitter
Thanks for the quick and sound reply.
I second the change in return values for getId() methods to either
BigInteger or Long. As you're an implementer of the Twitter API, I'd
love to see the Twitter API specifics documented more in the
code, but I know we're all short on time ;)

Anyway, the reasons for switching to BigInteger are more than clear
now, so I'll be looking forward to your next release.

regards,
Anton

Daniel Winterstein

Nov 24, 2010, 2:48:28 PM
to jt...@googlegroups.com
Hi Dobes,

Good points.

BTW, you can sort by id: they're in the right order to within a fraction
of a second. But sorting by time works just as well, if not better
(the Twitter servers do exhibit a bit of clock drift, so the accuracy is
more or less the same).
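Decoding an id shows why ids sort almost, but not exactly, by time. This sketch assumes the Snowflake layout, where the timestamp sits above the low 22 bits and those bits hold a 10-bit worker id and a 12-bit sequence. It also assumes the epoch constant 1288834974657 ms used in Twitter's open-source Snowflake code. Neither detail comes from this thread, and the decoding only applies to Snowflake-generated ids.

```java
import java.time.Instant;

/**
 * ASSUMPTIONS: Snowflake layout (timestamp above the low 22 bits) and the epoch
 * constant 1288834974657 ms from Twitter's open-source Snowflake code.
 * Only meaningful for Snowflake-generated ids, not older sequential ones.
 */
final class SnowflakeTime {
    static final long TWEPOCH_MS = 1288834974657L;
    static final int NON_TIMESTAMP_BITS = 10 + 12; // worker id + sequence number

    /** Milliseconds since the Unix epoch at which the id was generated. */
    static long creationMillis(long id) {
        return (id >>> NON_TIMESTAMP_BITS) + TWEPOCH_MS;
    }

    /** The same value as an Instant, convenient for sorting or display. */
    static Instant creationInstant(long id) {
        return Instant.ofEpochMilli(creationMillis(id));
    }
}
```

Each server stamps ids with its own clock, so any drift between servers appears directly in the decoded times. That matches the fraction-of-a-second ordering described above.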

I chose Number because it's less of a departure from long. A fair
amount of old code will just work without any edits. String is
arguably a better choice for the reasons you give, but I'm loath to
change the API again and force people to recode.

> 2. If you think the API could change later so that user IDs or whatever
> could be a BigInteger instead of Long, then you should probably not return
> Long since that'll break the API later if you change it.

True. But it's possibly better to break the API in a way that causes
compile-time errors, than to have people downcasting Numbers into
Longs - which is currently safe, but could cause runtime errors
(including subtle behavioural ones) if there was a change.
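The runtime risk mentioned here fits in a few lines. CastRiskDemo is a hypothetical name. The cast compiles today and only fails once a BigInteger actually arrives.

```java
import java.math.BigInteger;

/** The runtime failure a downcast hides (illustrative, not JTwitter code). */
final class CastRiskDemo {

    /** Compiles today; fails at run time if the id ever arrives as a BigInteger. */
    static long castDown(Number id) {
        return (Long) id; // ClassCastException for anything that is not a Long
    }

    public static void main(String[] args) {
        System.out.println(castDown(Long.valueOf(5))); // fine while ids are Longs
        System.out.println(castDown(BigInteger.valueOf(5))); // throws ClassCastException
    }
}
```

By contrast, suppose getId() were declared to return Long and later changed to BigInteger. A line such as `long id = user.getId();` would then stop compiling, which is the compile-time failure preferred above. Calling longValue() instead of casting avoids the exception, but for values above Long.MAX_VALUE it silently returns the wrong number, which is the subtler failure.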

> However, Long can
> represent a lot of users, so I'm not sure this will ever happen.

You should be right. I think it could happen for Messages, but is very
unlikely for Users.

Best regards,
- Daniel
