I'm using Twitter4J's Streaming API support and noticed something
strange. According to the Streaming API documentation, "some
UTF-8 keywords will not match correctly- this is a known temporary
defect". In my tests, however, the Streaming API doesn't seem to
match UTF-8-encoded keywords at all.
For example, tracking the word "não" ("not" in Portuguese):
1) curl -d track=nao 'http://stream.twitter.com/1/statuses/filter.json' (no accent)
Doesn't return statuses with the word "não".
2) curl -d track=não 'http://stream.twitter.com/1/statuses/filter.json' (no encoding)
Doesn't return anything.
3) curl -d track=n%C3%A3o 'http://stream.twitter.com/1/statuses/filter.json' (UTF-8 encoding)
Doesn't return anything.
4) curl -d track=n%E3o 'http://stream.twitter.com/1/statuses/filter.json' (ISO-8859-1 encoding)
Works fine!
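
To make the difference between cases 3 and 4 concrete, here is a minimal Java sketch that produces both encoded forms of the keyword with the standard java.net.URLEncoder. The class and method names (TrackEncoding, encodeTrack) are made up for illustration and are not part of Twitter4J.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Illustrative helper only; the names here are assumptions, not Twitter4J code.
public class TrackEncoding {

    // Build a "track=<keyword>" pair, percent-encoding the keyword
    // with the given charset name.
    static String encodeTrack(String keyword, String charset) {
        try {
            return "track=" + URLEncoder.encode(keyword, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        // "n\u00e3o" is "não"; the escape avoids source-file encoding issues.
        String keyword = "n\u00e3o";
        System.out.println(encodeTrack(keyword, "UTF-8"));      // track=n%C3%A3o
        System.out.println(encodeTrack(keyword, "ISO-8859-1")); // track=n%E3o
    }
}
```

The first line of output corresponds to case 3 (no matches in my test) and the second to case 4 (works).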
However, the encoding in Twitter4J's HttpParameter class is
hard-coded to UTF-8. I understand that's probably the right thing to
do, and that this is a problem in the Twitter API itself, but the
only way I found to make T4J work is to modify the class so that it
encodes the parameters using ISO-8859-1.
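
As a rough sketch of the kind of change I mean, the class below takes the charset as an argument instead of hard-coding it. It is a hypothetical stand-in (EncodableParameter and toFormBody are names I made up), not the actual HttpParameter source, so treat it as an illustration of the idea rather than a patch.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical stand-in for a request parameter; NOT Twitter4J's HttpParameter.
public class EncodableParameter {
    private final String name;
    private final String value;

    public EncodableParameter(String name, String value) {
        this.name = name;
        this.value = value;
    }

    // Percent-encode a single string using the given charset name.
    static String encode(String s, String charset) {
        try {
            return URLEncoder.encode(s, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported charset: " + charset, e);
        }
    }

    // Build a form-encoded body ("a=1&b=2") with the charset chosen by the caller.
    public static String toFormBody(EncodableParameter[] params, String charset) {
        StringBuilder body = new StringBuilder();
        for (int i = 0; i < params.length; i++) {
            if (i > 0) {
                body.append('&');
            }
            body.append(encode(params[i].name, charset))
                .append('=')
                .append(encode(params[i].value, charset));
        }
        return body.toString();
    }

    public static void main(String[] args) {
        EncodableParameter[] params = {
            new EncodableParameter("track", "n\u00e3o") // "não"
        };
        // ISO-8859-1 is the encoding that matched in my curl test (case 4).
        System.out.println(toFormBody(params, "ISO-8859-1"));
        System.out.println(toFormBody(params, "UTF-8"));
    }
}
```

With the charset passed in, the caller could use ISO-8859-1 for the Streaming API while other calls keep UTF-8.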
Does anybody have an opinion about this?
Best,
muriloq