Version 1.7


Mena Badieh Habib Morgan

Feb 21, 2014, 4:09:25 AM
to micropo...@googlegroups.com
Hello,
Version 1.7 of the training set has some malformed tweets.
For example, 91640162563526656 has spaces instead of a tab between the tweet text and the first mention, '8'.
The same problem occurs in:
91936480318078976
91935825989873664
91966280847990785
91967052599934977
92121125789769728
and many more.
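Lines like these can be found by checking the tab structure. This is a minimal sketch, assuming each line of the training file is tab-separated as tweet ID, tweet text, then alternating mention/DBpedia-URI pairs; that layout is inferred from the examples in this thread, and the function names are hypothetical.

```python
# ASSUMPTION: each line is laid out as
#   tweet_id \t tweet_text \t mention_1 \t uri_1 \t mention_2 \t uri_2 ...
DBPEDIA_PREFIX = "http://dbpedia.org/resource/"


def is_malformed(line: str) -> bool:
    """Return True if the line does not match the assumed tab layout."""
    fields = line.rstrip("\n").split("\t")
    # Expect an ID and a text, followed by complete (mention, URI) pairs.
    if len(fields) < 2 or len(fields) % 2 != 0:
        return True
    if not fields[0].isdigit():
        return True
    pairs = fields[2:]
    mentions, uris = pairs[0::2], pairs[1::2]
    # A mention slot holding a URI, or a URI slot that is not one, means a
    # separator was lost (e.g. spaces instead of a tab before a mention).
    if any(m.startswith(DBPEDIA_PREFIX) for m in mentions):
        return True
    return not all(u.startswith(DBPEDIA_PREFIX) for u in uris)


def find_malformed(path: str) -> list[str]:
    """Return the leading token (the tweet ID) of every malformed line."""
    bad = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip() and is_malformed(line):
                bad.append(line.split(None, 1)[0])
    return bad
```

For example, find_malformed("training_v1.7.tsv"), with a hypothetical file name, would list the IDs of lines where a tab was replaced by spaces.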

Furthermore, I see that you decided to annotate numbers in this version. The collection now contains many mentions that represent numbers, which can easily be detected with a regular expression, so I see no real added value.
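The point that numbers can be found with a regular expression can be illustrated with a minimal sketch. It assumes number mentions link to URIs of the form http://dbpedia.org/resource/N_(number), as in the examples quoted later in this thread (e.g. 10_(number)); number_mentions is a hypothetical name.

```python
import re

# ASSUMPTION: numbers are linked to URIs like
# http://dbpedia.org/resource/<N>_(number), as in this thread's examples.
NUMBER_RE = re.compile(r"\b\d+\b")


def number_mentions(text: str) -> list[tuple[str, str]]:
    """Return (spot, URI) pairs for every standalone run of digits in text."""
    return [
        (m.group(), f"http://dbpedia.org/resource/{m.group()}_(number)")
        for m in NUMBER_RE.finditer(text)
    ]
```

A pattern this simple also tags the '2' in "u help 2 be thankful", splits a date range like "Sept. 7-8" into 7 and 8, and misses spelled-out numbers such as "one", so it covers plain digits rather than every case.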

Thanks for your efforts.

Mena

Fréderic Godin

Feb 21, 2014, 4:12:39 AM
to micropo...@googlegroups.com
In this case, I think completeness is the added value, given that the taxonomy includes the type 'Amount'.
However, I think these changes give us more work to cover...

Best,

Frederic




--
You received this message because you are subscribed to the Google Groups "microposts2014" group.
To unsubscribe from this group and stop receiving emails from it, send an email to microposts201...@googlegroups.com.
Visit this group at http://groups.google.com/group/microposts2014.
For more options, visit https://groups.google.com/groups/opt_out.

Mena Badieh Habib Morgan

Feb 21, 2014, 7:12:05 AM
to micropo...@googlegroups.com
I agree with you, but if you count the numbers in the training set you will find around 376 of them, which is about 10% of the mentions. That is a large share, I think.
Anyway, it is just a note. I don't mind having 300+ correct annotations for free :)
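A count like this could be reproduced roughly as follows. For scale, 376 mentions at about 10% implies on the order of 3,760 mentions in total (376 / 0.10). The sketch assumes well-formed, tab-separated lines (tweet ID, text, then mention/URI pairs) and treats any URI ending in "_(number)" as a number annotation; both are assumptions, and number_share is a hypothetical name.

```python
# ASSUMPTION: well-formed lines of the form
#   tweet_id \t text \t mention_1 \t uri_1 \t mention_2 \t uri_2 ...
# and number annotations are URIs ending in "_(number)".
def number_share(lines) -> tuple[int, int, float]:
    """Return (number mentions, all mentions, share of numbers)."""
    numbers = total = 0
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        uris = fields[3::2]  # URIs sit at indices 3, 5, 7, ...
        total += len(uris)
        numbers += sum(u.endswith("_(number)") for u in uris)
    share = numbers / total if total else 0.0
    return numbers, total, share
```

Malformed lines would shift the fields and distort the count, so they should be fixed or skipped first.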

Mena

Fréderic Godin

Feb 21, 2014, 7:42:25 AM
to micropo...@googlegroups.com
Indeed, I have to agree with you too.
I have not implemented this yet, so perhaps far better solutions will get lower F1 scores just because they do not annotate the numbers...
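The concern about F1 can be made concrete with a worked example. The counts below are entirely hypothetical, and the sketch assumes scoring is a micro-averaged F1 over mention/URI pairs, which may not match the challenge's official scorer.

```python
def f1(tp: int, fp: int, fn: int) -> float:
    """Micro F1 from true positives, false positives and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical gold standard: 1000 mentions, 100 of them numbers (10%).
# System A links all 900 non-number mentions correctly but skips numbers.
a = f1(tp=900, fp=0, fn=100)
# System B finds all 100 numbers but only 850 of the 900 other mentions;
# its 50 wrong links count as both false positives and false negatives.
b = f1(tp=950, fp=50, fn=50)
print(f"A: {a:.3f}  B: {b:.3f}")  # prints: A: 0.947  B: 0.950
```

Under these assumptions, system B scores slightly higher even though it links 50 fewer real entities correctly, purely because it annotates the numbers.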



Mena Badieh Habib Morgan

Feb 21, 2014, 7:49:22 AM
to micropo...@googlegroups.com
This is exactly my point of view.

Mena

On Friday, 21 February 2014 10:09:25 UTC+1, Mena Badieh Habib Morgan wrote:

Ugo Scaiella

Feb 21, 2014, 10:54:03 AM
to micropo...@googlegroups.com
I think the major issue is with the spot 'one': should it be annotated or not?

91973714199052288 "Don't be in such a rush to get to the top.. Enjoy your journey.. Take it slow. One step at a time. Appreciate the scenery on the way."
92765895998451712 "one of the best feeling in the world is getting over ur ex...rt if u agree"
91960773986889728 "RT @ChasingJason: One of the worst things to do is make a woman cry, especially your mother." One http://dbpedia.org/resource/1_(number)
92792365793808385 "This guy is such a d-bag, one of the easiest people to hate in the whole world" one http://dbpedia.org/resource/1_(number)
91850107737227265 "Jesus healed 10 lepers, only one returned to say thank you and give God glory. Only one. So only expect 10% of the ppl u help 2 be thankful" Jesus http://dbpedia.org/resource/Jesus 10 http://dbpedia.org/resource/10_(number) God http://dbpedia.org/resource/God
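One way to check how consistently a spot such as 'one' is annotated is to split the tweets containing it into annotated and unannotated groups. This is a minimal sketch, assuming tab-separated lines (tweet ID, text, then mention/URI pairs) and case-insensitive whole-word matching; annotation_consistency is a hypothetical name.

```python
import re
from collections import defaultdict

# ASSUMPTION: lines are laid out as
#   tweet_id \t text \t mention_1 \t uri_1 \t mention_2 \t uri_2 ...
def annotation_consistency(lines, spot: str) -> dict[str, list[str]]:
    """Group IDs of tweets containing `spot` by whether it was annotated."""
    pattern = re.compile(rf"\b{re.escape(spot)}\b", re.IGNORECASE)
    result = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2 or not pattern.search(fields[1]):
            continue
        mentions = {m.lower() for m in fields[2::2]}  # indices 2, 4, 6, ...
        key = "annotated" if spot.lower() in mentions else "unannotated"
        result[key].append(fields[0])
    return dict(result)
```

A large "unannotated" group next to a non-empty "annotated" one would confirm the inconsistency in the examples above.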

And this, I think, is an error:
92365972840775680 "Pedigree and genetics conference scheduled for Sept. 7-8 - Paulick Report: Pedigree and genetics conference sche... http://bit.ly/oOBsVN" Pedigree http://dbpedia.org/resource/Pedigree_chart genetics http://dbpedia.org/resource/Genetics Sept. http://dbpedia.org/resource/September 7 http://dbpedia.org/resource/7_(number) 8 http://dbpedia.org/resource/8_(number) Pedigree http://dbpedia.org/resource/Pedigree_chart


Apart from that, I have noticed that several lines in v1.7 are malformed:

91640162563526656
91966280847990785
92121125789769728
91967052599934977
92357943168741376
93150911693717505
93178533135925249

-- Ugo