Annotating Text in DiscourseDB: RDF vs UIMA SQL

Carolyn Rose

Apr 29, 2015, 4:25:20 PM

to dance...@googlegroups.com

We received some feedback on the original DiscourseDB design that specifically questions the use of UIMA SQL for annotating text. For your convenience, here are pointers to the original documents:

DiscourseDB overview document

https://docs.google.com/document/d/1OoT3lA-JiL5NKtZ_iuXxR7SD6MjQGX2rgzxTXQKDVlU

DiscourseDB ER diagram draft v0.1

https://drive.google.com/file/d/0BwCBBnex0H4AWWZCRlBBODEteWM

Here are the specific concerns that were raised:

1. There was also some concern about how we were weaving in the UIMA work to allow for unstructured data in the MySQL database. One person mentioned that this is an inherently hard task, and someone else worried that query writing could be painful.

2. Someone specifically made this point: "Their underlying look at the data seems closer to a graph data store like RDF (Resource Description Framework), in that it consists of multiple and potentially-arbitrary relationships between entities. To me, this might indicate that an RDF style DB and queries based on SPARQL might be more efficient in the long term (http://www.cambridgesemantics.com/semantic-university/sparql-vs-sql-intro)."
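To make the contrast in the second concern more concrete, here is a minimal sketch of the same annotation facts stored once as rows in a fixed relational table and once as subject-predicate-object triples. Everything in it is an assumption for illustration: the table, the column names, the predicates, and the `match` helper are hypothetical and are not taken from DiscourseDB, UIMA SQL, or any RDF library. The triple store is a plain Python list rather than a real RDF database, and character offsets are left out of the triples for brevity.

```python
import sqlite3

# Relational form: one row per annotation in a fixed-schema table.
# Table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE annotation (id INTEGER PRIMARY KEY, doc_id TEXT, "
    "begin_offset INTEGER, end_offset INTEGER, type TEXT)"
)
conn.executemany(
    "INSERT INTO annotation (doc_id, begin_offset, end_offset, type) "
    "VALUES (?, ?, ?, ?)",
    [
        ("post1", 0, 5, "Greeting"),
        ("post1", 6, 20, "Question"),
        ("post2", 0, 12, "Question"),
    ],
)
sql_hits = [
    row[0]
    for row in conn.execute(
        "SELECT doc_id FROM annotation WHERE type = ? ORDER BY doc_id",
        ("Question",),
    )
]

# Triple form: the same type and document facts as
# (subject, predicate, object) statements. Predicates are hypothetical.
triples = [
    ("ann1", "annotates", "post1"), ("ann1", "hasType", "Greeting"),
    ("ann2", "annotates", "post1"), ("ann2", "hasType", "Question"),
    ("ann3", "annotates", "post2"), ("ann3", "hasType", "Question"),
    # A new kind of relationship is just another triple, no schema change.
    ("ann3", "repliesTo", "ann2"),
]


def match(pattern, store):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        t for t in store
        if all(p is None or p == v for p, v in zip(pattern, t))
    ]


# Two pattern lookups replace the single SQL WHERE clause.
question_anns = {s for s, _, _ in match((None, "hasType", "Question"), triples)}
triple_hits = sorted(
    o for s, _, o in match((None, "annotates", None), triples)
    if s in question_anns
)
```

Both lookups find the same documents. The difference the commenter points at shows up in the `repliesTo` statement: in the triple form it is one more entry, while the relational form would need a new column or table before it could be stored.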

We're considering the trade-offs and would welcome insights and opinions from members of the community. We're looking at the following resources to familiarize ourselves with RDF:

http://www.w3.org/RDF/

http://homepages.inf.ed.ac.uk/kbyrne3/docs/thesisfinal.pdf

We will continue to collect resources, read more about this issue, and think about the trade-offs. Thanks in advance for your input.

Carolyn

imr...@gmail.com

Apr 30, 2015, 9:39:39 AM

to dance...@googlegroups.com

I was wondering about another detail. In the document you say: "In DiscourseDB, a UIMA Collection is represented by a Discourse Entity in the external discourse structure, while a CASDocument is represented by a Revision entity. The tables CASDocument and Revision as well as Collection and Discourse could be merged, since they represent similar concepts."

What is stopping you from making these mergers? It seems more elegant to go ahead with them. If the tables are truly redundant, having fewer tables will make queries more powerful and easier to write.
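As a rough illustration of the point about queries, the sketch below compares a lookup across two tables linked one-to-one with the same lookup against a single merged table. The table names, columns, and sample row are made up for the example and are not the actual DiscourseDB or UIMA-SQL schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Separate tables (hypothetical columns), linked one-to-one.
    CREATE TABLE revision (id INTEGER PRIMARY KEY, author TEXT);
    CREATE TABLE cas_document (
        id INTEGER PRIMARY KEY,
        revision_id INTEGER REFERENCES revision(id),
        text TEXT
    );
    INSERT INTO revision VALUES (1, 'alice');
    INSERT INTO cas_document VALUES (10, 1, 'Hello world');

    -- Merged alternative: one row carries both roles.
    CREATE TABLE revision_merged (
        id INTEGER PRIMARY KEY, author TEXT, text TEXT
    );
    INSERT INTO revision_merged VALUES (1, 'alice', 'Hello world');
    """
)

# With separate tables, reading author and text together needs a join.
joined = conn.execute(
    "SELECT r.author, c.text FROM revision r "
    "JOIN cas_document c ON c.revision_id = r.id"
).fetchall()

# With the merged table, the same question is a single-table select.
merged = conn.execute("SELECT author, text FROM revision_merged").fetchall()
```

Both queries return the same row here. The saving is one join per query, which only holds if the two tables really do describe the same thing in every case.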

Phil

oliver....@gmail.com

May 5, 2015, 6:37:09 PM

to dance...@googlegroups.com

Hi Phil,

I fully agree that redundancy should be minimized. In the initial stage of development, I wanted to incorporate the existing UIMA-SQL scheme (as one possible representation for the micro structure) without any changes to the original model. We are still discussing whether that is even the best representation and what the trade-offs are. Once that is decided, the interfaces can be consolidated so that redundancy is minimized.

Oliver
