Note: Only status ids are set to over-run for now, but message and
user ids may be affected next year.
To handle this, I've made the following changes in the latest version
of JTwitter. These may cause some code to break.
1. Ids are now represented by a Number. This will use BigInteger for
Status.id, and (for now) Long for User.id and Message.id
2. Methods which used to take long parameters now take a Number.
- If you're passing in a long, this will still work fine, since it
is autoboxed to a Long, and Long is a subclass of Number.
- 0 and -1 can no longer be used to indicate unset for these
parameters. You should use null for an unset parameter.
In your own code, I advise using either Number or BigInteger instead
of Long for ids, as this will make your code future-proof.
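As a rough sketch of what this looks like in calling code: the class and
the sinceIdParam method below are made up for illustration and are not part
of JTwitter. They only show an id held as a Number, with null standing for
"unset".

```java
import java.math.BigInteger;

// Hypothetical helper, NOT part of JTwitter: shows ids held as Number,
// with null (rather than 0 or -1) meaning "unset".
class IdParamSketch {

    // Builds a query-string fragment for an optional id parameter.
    static String sinceIdParam(Number sinceId) {
        if (sinceId == null) {
            return ""; // unset: send nothing
        }
        // toString() gives the plain decimal form for both Long and BigInteger
        return "since_id=" + sinceId.toString();
    }

    public static void main(String[] args) {
        long oldStyleId = 12345L; // a primitive long is autoboxed to Long
        BigInteger bigId = new BigInteger("9223372036854775808"); // 2^63, too big for a long

        System.out.println(sinceIdParam(oldStyleId));
        System.out.println(sinceIdParam(bigId));
        System.out.println(sinceIdParam(null).isEmpty());
    }
}
```

Old code that passes a long keeps compiling, and the same method accepts a
BigInteger once an id outgrows 64 bits.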
You can download the latest JTwitter here:
http://www.winterwell.com/software/jtwitter/jtwitter.jar
Sorry for any inconvenience these changes cause.
Best regards,
- Daniel
On 24 November 2010 15:43, Anton Zeef <anton...@gmail.com> wrote:
> If I read the info from twitter correctly, twitter won't be using the
> full 64 bits for another 69 years, or so.
> In the meantime using java's 64-bit signed longs will be sufficient
> since they are equivalent to unsigned 63-bit numbers and the sign bit
> won't be used.
You're quite right. But there are good reasons to represent Twitter's
signed 64 bit ids as they are.
Comments below...
>
> I read the twitter-development-talk about it here:
> http://groups.google.com/group/twitter-development-talk/browse_thread/thread/6a16efa375532182,
> the post is by Matthew Harris who later on in that thread clarifies:
>> 11) Did you really mean ‘unsigned’ 64bit Integer?
>> Strictly speaking the Snowflake is a signed 64bit long under the hood. That
>> being said, we will never use the negative bit and won’t require the full
>> 64bits for positive numbers for about 69 years
>
> Questions/remarks:
> 1) why then convert now to BigIntegers for status-ids? it sure is the
> theoretically correct way to do,
> but even twitter encourages java-api
> clients to keep using longs (see the above thread)...
The thread says over-flowed Longs will continue to work. That's not
exactly a recommendation to use them. Or perhaps I missed a comment.
A few reasons for switching to BigInteger:
1) Representing a signed 64 bit number as an unsigned 63 bit number
could easily lead to errors. E.g. where someone stores the Long, then
tries to use it elsewhere without correcting for the format.
Formatting issues are a pain. BigInteger avoids them.
2) Anyone sorting tweets by id will hit problems with the overflow.
BigInteger fixes that.
3) 6 months ago, Twitter didn't anticipate moving to signed 64 bit
ids. They did so in order to encode some extra information
(separating the servers). Who's to say that future growth won't make
them go beyond that? Adding one more bit of info into the id would
definitely break with Long. Whereas BigInteger is future-proof.
4) You can always downcast into a Long if you want to.
Against that, the reasons for sticking with Long are (I think)
uncompelling. Long would be faster and give more pleasant code if we
were doing arithmetic with these objects, but that's irrelevant here.
The extra memory of a BigInteger is minimal.
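To illustrate point 2, here is a small sketch using a hypothetical id that
needs the 64th bit. Twitter says it won't use that bit for decades, so the
value 2^63 is an assumption made purely for the example, and the class name
is made up.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustrative only: compares sorting ids as BigInteger vs. as long.
class IdSortSketch {

    // Sorts a copy of the ids by their true numeric value.
    static List<BigInteger> sortedAsBigInteger(List<BigInteger> ids) {
        List<BigInteger> copy = new ArrayList<>(ids);
        Collections.sort(copy);
        return copy;
    }

    // Squeezes each id into a long first, then sorts.
    static List<Long> sortedAsLong(List<BigInteger> ids) {
        List<Long> longs = new ArrayList<>();
        for (BigInteger id : ids) {
            longs.add(id.longValue()); // keeps only the low 64 bits
        }
        Collections.sort(longs);
        return longs;
    }

    public static void main(String[] args) {
        BigInteger small = BigInteger.valueOf(100);
        BigInteger huge = BigInteger.ONE.shiftLeft(63); // 2^63, hypothetical id

        List<BigInteger> ids = Arrays.asList(huge, small);
        System.out.println(sortedAsBigInteger(ids)); // small id first
        System.out.println(sortedAsLong(ids));       // huge id wraps negative and sorts first
    }
}
```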
> and it's kind of
> inconsistent to use Number for ids of both Messages and Users, but
> BigInteger for Status' ids
The Status ids are definitely BigIntegers, whereas the Message and
User ids are currently Longs, but might become BigIntegers in the
future - hence the use of the common super-class Number. However...
> 2) so I'd like to see more remarks for the 'getId()' functions in
> Message, User and Status stating more clearly the precision of the
> number, i.e. to which type it can be safely downcasted (or converted
> with e.g. functions like Number.longValue() etc)
You raise a good point about clarity.
I will change the getId() methods to return either BigInteger or Long,
making it clear what you're getting.
> 3) on the other hand, I do encourage the use of Number since, as you
> say, it's more futureproof
>
> All in all your api is quite handy and very well to use, so even with
> bigintegers we'll still be able to enjoy the benefits,
Thank you.
> but some more
> clarification is requested.
>
> One last thing: I see you using the new "id_str" json-field for
> getting the id of the status, is that to prevent the json parser used
> by jtwitter to bail out on the "id" json-field?
Exactly. The JSON parser throws an exception if you try to get the id
field and the value won't fit cleanly in a Long.
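A minimal sketch of the idea, not the actual JTwitter code: reading the id
from its string form avoids any numeric range limit. Long.parseLong stands
in here for a parser that insists on a long, and the class and method names
are invented for the example.

```java
import java.math.BigInteger;

// Illustrative only: parsing an id from its decimal string form.
class IdStrSketch {

    // BigInteger accepts a decimal string of any length.
    static BigInteger parseId(String idStr) {
        return new BigInteger(idStr);
    }

    // True if the decimal string is within the range of a signed 64-bit long.
    static boolean fitsInLong(String idStr) {
        try {
            Long.parseLong(idStr);
            return true;
        } catch (NumberFormatException e) {
            return false; // out of range for a long
        }
    }

    public static void main(String[] args) {
        String maxLong = "9223372036854775807"; // Long.MAX_VALUE
        String tooBig = "9223372036854775808";  // Long.MAX_VALUE + 1

        System.out.println(fitsInLong(maxLong));
        System.out.println(fitsInLong(tooBig));
        System.out.println(parseId(tooBig));
    }
}
```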
I hope that answers your questions. Let me know what you think.
Best regards,
- Daniel
--
--------------------------------------------------
Daniel Winterstein
Edinburgh
http://winterwell.com http://soda.sh
Good points.
BTW, you can sort by id - they're in the right order within a fraction
of a second. But sorting by time works just as well, if not better
(the Twitter servers do exhibit a bit of timedrift, so the accuracy is
more or less the same).
I chose Number because it's less of a departure from long. A fair
amount of old code will just work without any edits. String is
arguably a better choice for the reasons you give. But I'm loath to
change the API again, and force people to recode.
> 2. If you think the API could change later so that user IDs or whatever
> could be a BigInteger instead of Long, then you should probably not return
> Long since that'll break the API later if you change it.
True. But it's possibly better to break the API in a way that causes
compile-time errors, than to have people downcasting Numbers into
Longs - which is currently safe, but could cause runtime errors
(including subtle behavioural ones) if there was a change.
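Here is a sketch of the two runtime failure modes I mean, assuming an id
that is typed as Number but has become a BigInteger. The class and method
names are invented for illustration.

```java
import java.math.BigInteger;

// Illustrative only: what can go wrong when a Number id is forced into a long.
class DowncastSketch {

    // The cast compiles for any Number, but fails at runtime
    // if the object is not actually a Long.
    static boolean castToLongFails(Number id) {
        try {
            Long asLong = (Long) id;
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    // longValue() never throws; it silently keeps only the low 64 bits.
    static long truncated(Number id) {
        return id.longValue();
    }

    public static void main(String[] args) {
        Number today = Long.valueOf(42L);
        // Hypothetical future id: 2^64 + 5, which does not fit in a long.
        Number future = BigInteger.ONE.shiftLeft(64).add(BigInteger.valueOf(5));

        System.out.println(castToLongFails(today));  // the cast is safe for a Long
        System.out.println(castToLongFails(future)); // ClassCastException for a BigInteger
        System.out.println(truncated(future));       // a wrong but plausible-looking id
    }
}
```

The truncation case is the subtle one: nothing throws, and the code carries
on with the wrong id.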
> However, Long can
> represent a lot of users, so I'm not sure this will ever happen.
You should be right. I think it could happen for Messages, but is very
unlikely for Users.
Best regards,
- Daniel