CLAVIN doesn't resolve certain location names


Makiko Shukunobe

Mar 23, 2016, 4:59:59 PM
to clavin-users
Dear Clavin-users,

Recently I have been experiencing a problem where some location names (e.g., Al Zabadani in Syria) are not resolved by CLAVIN. I examined GeoNames' SY.txt file for a problem in the data, but I didn't find anything that could cause this. CLAVIN did resolve the location name "Zabadani" when I added it to the alternate names, but interestingly it doesn't work for "Al-Zabadani" or "Al Zabadani". Many place names in Syria begin with "Al", and it seems all of them have the same problem. Is there a way to work around this? I would very much appreciate your help!

I tested the following data:
1. Original GeoNames record for Al-Zabadani:
172060 Al-Zabadani District Al-Zabadani District Al-Zabadani District,Mantyk Zabdani,Mintaqat az Zabadan,Mintaqat az Zabadani,Minţaqat az Zabadān,Minţaqat az Zabadānī,Qada' az Zabadani,Qaḑā’ az Zabadani,Zebdani,Zebdāni,Zebedani,Zébédâni,mntqt alzbdany,Мантык Забдани,منطقة الزبداني 33.70819 36.11198 A ADM2 SY 08 172060 0 1268 Asia/Damascus 2014-02-15   

2. I added "Zabadani", "Al-Zabadani" and "Al Zabadani" to the alternate names, but only "Zabadani" worked:
172060 Al-Zabadani District Al-Zabadani District Al-Zabadani District,Mantyk Zabdani,Mintaqat az Zabadan,Mintaqat az Zabadani,Minţaqat az Zabadān,Minţaqat az Zabadānī,Zabadani,Al-Zabadani,Zabadani,Qada' az Zabadani,Qaḑā’ az Zabadani,Zebdani,Zebdāni,Zebedani,Zébédâni,mntqt alzbdany,Мантык Забдани,منطقة الزبداني 33.70819 36.11198 A ADM2 SY 08 172060 0 1268 Asia/Damascus 2014-02-15  

3. I also tried the simplified record below, but again only "Zabadani" worked, not "Al-Zabadani" or "Al Zabadani":
172060 Al-Zabadani Al-Zabadani "Zabadani, Al Zabadani" 33.70819 36.11198 A ADM2 SY 8 172060 0 1268 Asia/Damascus 2/15/2014
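For reference, the edit in tests 2 and 3 amounts to appending names to the alternate-names field of a GeoNames row. Below is a minimal Java sketch of that step, assuming the standard tab-separated GeoNames "geoname" layout (19 columns, with comma-separated alternate names in the fourth column). The class and method names are hypothetical, and the row in `main` is only an illustrative row modeled on the record above, with tabs restored and most alternate names trimmed.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/**
 * Sketch: append extra alternate names to one row of a GeoNames country
 * dump such as SY.txt. Assumes the standard tab-separated "geoname" layout,
 * where column index 3 holds comma-separated alternate names.
 * GeoNamesAltNames and addAlternateNames are hypothetical names.
 */
public class GeoNamesAltNames {

    private static final int ALTERNATE_NAMES_COLUMN = 3;

    public static String addAlternateNames(String row, List<String> extraNames) {
        // Limit -1 keeps trailing empty columns, so the column count is preserved.
        String[] columns = row.split("\t", -1);
        if (columns.length <= ALTERNATE_NAMES_COLUMN) {
            throw new IllegalArgumentException("Not a GeoNames geoname row: " + row);
        }
        // LinkedHashSet keeps the existing order and silently drops duplicates.
        Set<String> names = new LinkedHashSet<>();
        String existing = columns[ALTERNATE_NAMES_COLUMN];
        if (!existing.isEmpty()) {
            for (String name : existing.split(",")) {
                names.add(name);
            }
        }
        names.addAll(extraNames);
        columns[ALTERNATE_NAMES_COLUMN] = String.join(",", names);
        return String.join("\t", columns);
    }

    public static void main(String[] args) {
        // Illustrative 19-column row (alternate names trimmed for brevity).
        String row = String.join("\t",
            "172060", "Al-Zabadani District", "Al-Zabadani District",
            "Zebdani,Zebedani", "33.70819", "36.11198", "A", "ADM2", "SY", "",
            "08", "172060", "", "", "0", "", "1268", "Asia/Damascus", "2014-02-15");
        String updated = addAlternateNames(row,
            List.of("Zabadani", "Al-Zabadani", "Al Zabadani"));
        // Prints: Zebdani,Zebedani,Zabadani,Al-Zabadani,Al Zabadani
        System.out.println(updated.split("\t", -1)[3]);
    }
}
```

The edited file presumably has to be re-indexed before CLAVIN sees the new names, as was evidently done for the "Zabadani" test. As the reply below explains, though, a richer gazetteer cannot help if the extractor never tags the span as a location.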

Thank you,
Makiko
 

Charlie Greenbacker

Mar 27, 2016, 8:51:48 AM
to Makiko Shukunobe, clavin-users
Hi Makiko,

The issue is that the underlying entity extraction tool used by CLAVIN mischaracterizes names like that. For example, taking the following sample input text (adapted from Wikipedia):

Al-Zabadani is a city and popular hill station in southwestern Syria in the Rif Dimashq Governorate, close to the border with Lebanon. It is located in the center of a green valley surrounded by high mountains at an elevation of around 1,100 m. According to the Syria Central Bureau of Statistics, Al-Zabadani had a population of 26,285 in the 2004 census.

and running it through the online demo for Stanford NER (which is used by CLAVIN-NERD), both mentions of "Al-Zabadani" are incorrectly labeled as a PERSON rather than a LOCATION. The same thing happens if you replace "Al-Zabadani" with similarly named places like Al-Qutayfah or Al-Tall. If Stanford NER (or any other entity extractor you use) gets this wrong, there's no way for CLAVIN to get it right.
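The failure mode described above can be illustrated with a toy extract-then-resolve pipeline. This is a hypothetical sketch, not CLAVIN's actual API: only mentions that the extractor labels LOCATION are looked up in the gazetteer, so a PERSON-tagged "Al-Zabadani" is dropped even though the gazetteer contains it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Toy model of an extract-then-resolve pipeline. All names here are
 * hypothetical and only illustrate the ordering of the two stages.
 */
public class ExtractThenResolve {

    /** A span of text plus the label an entity extractor assigned to it. */
    static final class Mention {
        final String text;
        final String label; // e.g. "PERSON" or "LOCATION"

        Mention(String text, String label) {
            this.text = text;
            this.label = label;
        }
    }

    /** Looks up only LOCATION mentions in a gazetteer (name -> geonameid). */
    static List<Long> resolve(List<Mention> mentions, Map<String, Long> gazetteer) {
        List<Long> resolved = new ArrayList<>();
        for (Mention m : mentions) {
            if (!m.label.equals("LOCATION")) {
                continue; // never looked up, whatever the gazetteer contains
            }
            Long id = gazetteer.get(m.text);
            if (id != null) {
                resolved.add(id);
            }
        }
        return resolved;
    }

    public static void main(String[] args) {
        Map<String, Long> gazetteer = Map.of("Al-Zabadani", 172060L);
        // Mislabeled by the extractor: nothing is resolved. Prints []
        System.out.println(resolve(List.of(new Mention("Al-Zabadani", "PERSON")), gazetteer));
        // Correctly labeled: the gazetteer entry is found. Prints [172060]
        System.out.println(resolve(List.of(new Mention("Al-Zabadani", "LOCATION")), gazetteer));
    }
}
```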

You may want to consider posting a question to the Stanford JavaNLP tools users list asking why the Stanford NER tool is making this error.

Thanks,
Charlie


Makiko Shukunobe

Mar 28, 2016, 9:52:31 AM
to Charlie Greenbacker, clavin-users
Hi Charlie,

Thank you so much for your response!! After posting the problem, I suspected Stanford NER and tested it with the online demo (http://nlp.stanford.edu:8080/ner/). You are absolutely right: Stanford NER identifies "Al Zabadani" as a PERSON. I will post this question to the Stanford JavaNLP tools users list as you suggested.

Again, thank you so much!!  

Best, 
Makiko
--
Research Associate
NC State University,
Center for Geospatial Analytics
Box 7106, Jordan Hall 5117
Raleigh, NC 27695