I'm using the primary tags and hashtag regex from Kate. I have a
simple cron job running every minute that searches for any of the main
tags, and then I filter on the regex to decide whether each tweet
should be stored.
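A minimal sketch of that two-stage filter, for anyone who wants to try something similar. The tag set and the regex below are placeholders I made up for illustration (the real ones come from Kate), and `should_store` and `filter_tweets` are hypothetical names:

```python
import re

# Assumption: illustrative primary tags only; the real list comes from Kate.
PRIMARY_TAGS = {"#chile", "#fuerzachile", "#terremotochile"}

# Assumption: a stand-in for the hashtag regex; the real pattern is not shown here.
SECONDARY_RE = re.compile(r"#(sebusca|necesito|need)\b", re.IGNORECASE)

# Matches any hashtag so the primary check compares whole tags, not substrings.
HASHTAG_RE = re.compile(r"#\w+")


def should_store(text):
    """Return True if the tweet has a primary tag and also matches the regex."""
    tags = {tag.lower() for tag in HASHTAG_RE.findall(text)}
    has_primary = bool(tags & PRIMARY_TAGS)
    return has_primary and SECONDARY_RE.search(text) is not None


def filter_tweets(tweets):
    """Keep only the tweets (dicts with a 'text' key) that should be stored."""
    return [tweet for tweet in tweets if should_store(tweet["text"])]
```

The cron job would call something like `filter_tweets` on each minute's search results and store whatever comes back.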
There are lots of #sebusca tweets. I'd like to try to modify the code
to pull those sections out, perhaps identify names, etc., and aggregate
them somewhere on the site. The top needs, etc. are not correct yet, as
I haven't had a chance to add the various Spanish tags to the
ontology. I will look at the Google Docs page for that tonight and try
to clean that up too.
Any comments or suggestions are welcome. If there are specific feeds or
other ways to slice and dice the data that would be useful to anyone,
please let me know and I'll see what I can do.
Simon.
Right now it captures two logs. One is everything tagged with #chile,
#fuerzachile or #terremotochile. The second is everything that has one
of the above hashtags *and* one of the tweaks defined in the Google
Doc. Both are in JSON. The first one gets compressed with bzip2
every hour, but the second one stays uncompressed.
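In case it helps anyone set up something similar, here is a rough sketch of that two-log arrangement. It is not the actual capture script: the file names, the one-JSON-object-per-line layout, and the helper names (`log_tweet`, `rotate_and_compress`) are all assumptions made for illustration.

```python
import bz2
import json
import os
import shutil


def append_json_line(path, record):
    """Append one record to a log file as a single line of JSON."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")


def log_tweet(record, matches_tweak, all_log="all.json", tweak_log="tweaks.json"):
    """Write every tweet to the first log; only tweak matches go to the second."""
    append_json_line(all_log, record)
    if matches_tweak:
        append_json_line(tweak_log, record)


def rotate_and_compress(path, suffix):
    """Move the log aside, bzip2 it, and return the compressed file's path.

    Meant to be called once an hour on the first log only (assumption:
    the caller supplies a suffix such as a timestamp). Returns None if
    there is nothing to compress.
    """
    if not os.path.exists(path):
        return None
    rotated = f"{path}.{suffix}"
    os.rename(path, rotated)
    compressed = rotated + ".bz2"
    with open(rotated, "rb") as src, bz2.open(compressed, "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.remove(rotated)
    return compressed
```

Renaming before compressing means new tweets start a fresh log while the previous hour is being compressed; the second log is simply never passed to `rotate_and_compress`.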
--
M. Edward (Ed) Borasky
borasky-research.net/m-edward-ed-borasky/
"A mathematician is a device for turning coffee into theorems." ~ Paul Erdős
If you want to post things there, use this link:
http://www.dzone.com/links/add.html
You'll need to have an account, and I think "new" accounts get
"moderated", for some definition of the two words in quotes. But
they've been taking everything I post, so at worst, you can send me
links and I'll post them.
--
M. Edward (Ed) Borasky
borasky-research.net/m-edward-ed-borasky/
"A mathematician is a device for turning coffee into theorems." ~ Paul Erdős