Annotating Text in DiscourseDB: RDF vs UIMA SQL


Carolyn Rose

Apr 29, 2015, 4:25:20 PM
to dance...@googlegroups.com
We received some feedback on the original DiscourseDB design, specifically questioning the use of UIMA SQL for annotating text. I am including a pointer to the original document here for your convenience:


Here are the specific concerns that were raised:

1. There was also some concern about how we were weaving together the UIMA work to allow for unstructured data in
the MySQL database. One person mentioned that this is an inherently hard task, and someone else worried that query
writing could be painful.
2. Someone specifically raised this point: "Their underlying look at the data seems closer to a graph data store
like RDF (Resource Description Framework), in that it consists of multiple and potentially-arbitrary relationships
between entities.  To me, this might indicate that an RDF style DB and queries based on SPARQL might be more

We're considering the trade-offs and would welcome insights and opinions from members of the community. We're looking at the following resources to familiarize ourselves with RDF:


We will continue to collect resources, read more about this issue, and think about the trade-offs. Thanks in advance for your input.

Carolyn

imr...@gmail.com

Apr 30, 2015, 9:39:39 AM
to dance...@googlegroups.com
I was wondering about another detail. In the document you say: "In DiscourseDB, a UIMA Collection is represented by a Discourse Entity in the external discourse structure, while a CASDocument is represented by a Revision entity. The tables CASDocument and Revision as well as Collection and Discourse could be merged, since they represent similar concepts."

What is stopping you from making these mergers? It seems more elegant to go ahead with them. If the tables are truly redundant, fewer tables should make queries simpler to write, since there are fewer joins.
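
As a rough illustration of what I mean, here is a minimal sketch comparing a lookup across separate Revision and CASDocument tables with the same lookup against a merged table. The table names Revision and CASDocument come from the document; the columns, the MergedRevision table, the one-to-one link, and the sample row are all hypothetical and only serve to show the extra join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Separate tables (hypothetical columns): one CASDocument per Revision.
CREATE TABLE Revision (id INTEGER PRIMARY KEY, created TEXT);
CREATE TABLE CASDocument (
    id INTEGER PRIMARY KEY,
    revision_id INTEGER UNIQUE REFERENCES Revision(id),
    document_text TEXT
);
INSERT INTO Revision VALUES (1, '2015-04-29');
INSERT INTO CASDocument VALUES (10, 1, 'We received some feedback');

-- Merged alternative (hypothetical): the text lives on the revision row.
CREATE TABLE MergedRevision (
    id INTEGER PRIMARY KEY,
    created TEXT,
    document_text TEXT
);
INSERT INTO MergedRevision
    SELECT r.id, r.created, d.document_text
    FROM Revision r
    JOIN CASDocument d ON d.revision_id = r.id;
""")


def text_separate(revision_id):
    """Fetch the document text with the two-table layout (needs a join)."""
    row = conn.execute(
        """
        SELECT d.document_text
        FROM Revision r
        JOIN CASDocument d ON d.revision_id = r.id
        WHERE r.id = ?
        """,
        (revision_id,),
    ).fetchone()
    return row[0] if row else None


def text_merged(revision_id):
    """Fetch the document text with the merged layout (single table)."""
    row = conn.execute(
        "SELECT document_text FROM MergedRevision WHERE id = ?",
        (revision_id,),
    ).fetchone()
    return row[0] if row else None


print(text_separate(1))
print(text_merged(1))
```

This only holds if the relationship really is one-to-one. If a Revision could have several CAS documents, or none, the separate tables would be carrying real information and the merge would lose it.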

Phil

oliver....@gmail.com

May 5, 2015, 6:37:09 PM
to dance...@googlegroups.com
Hi Phil,
I fully agree that redundancy should be minimized. In the initial stage of development, I wanted to incorporate the existing UIMA-SQL scheme (as one possible representation for the micro structure) without any changes to the original model. We are still discussing whether that is even the best representation and what the trade-offs are. Once that is decided, the interfaces can be consolidated so that redundancy is minimized.

Oliver