Tools to convert large text files to OWL?


Michael DeBellis

Aug 5, 2017, 12:33:01 PM
to ontolog-forum
I'm going to be getting a very large text file that a project currently uses as an "ontology" to organize a bunch of unstructured data: documents, messages, etc. For confidentiality reasons I can't go into the domain. They want to start using a real ontology in OWL. My first thought was to create an ontology for them manually; it's a domain I've worked in before and I thought I could do it pretty quickly. But the number of database schemas is huge, in the tens of thousands, so I don't think that's going to be practical. My next thought is to use Cellfie. But I was wondering if anyone has other suggestions for tools to convert structured text (I'm assuming it's some XML format but I'm not sure) into an ontology. My target is OWL 2 and Protege. Also, they currently use Python for their development, and I've been learning Python, which seems like it could be a good tool for this kind of transformation, so any Python tools would be of particular interest. Or, of course, Java. Thanks in advance.

Michael

David Eddy

Aug 5, 2017, 1:15:17 PM
to ontolo...@googlegroups.com
Michael -

On Aug 05, 2017, at 12:33 PM, Michael DeBellis <mdebe...@gmail.com> wrote:

I'm going to be getting a very large text file

Befitting the ontology topic, you might want to offer some sort of “meaning” to the totally ambiguous, meaningless & context-free term LARGE.


I’m not even remotely an ontologist, but if I were working on a body of text, LARGE would most likely not make it to the controlled vocabulary stage.



Context… I have a quite small customer—10,000 employees—with 50,000 programs in part of their software collection.  They experience that as large.  It’s not.

Obviously there are multiple understandings for what a “program” is.


On this desktop computer, the last time I looked there were more than 3,000,000 files—mostly text I assume, but a JPEG is ultimately only text.

____________________________
David Eddy
Babson Park, MA

W: 781-455-0949


Michael DeBellis

Aug 7, 2017, 10:12:44 AM
to ontolog-forum
David, I had similar thoughts, that for a file that big, most of it will be Abox rather than Tbox. I think most of the data are going to be instances or "flat" data that would be stored in data properties in OWL. 

Michael
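
To make the Abox/Tbox point above concrete, here is a minimal Python sketch (not from the thread) that splits a handful of JSON records into schema-level terms and instance-level assertions. The sample records and the `id` and `type` keys are assumptions made purely for illustration; the real file's structure isn't described in this thread.

```python
import json

# Hypothetical sample data; the actual file's layout is unknown.
SAMPLE = """
[
  {"id": "doc1", "type": "Document", "title": "Q1 report", "pages": 12},
  {"id": "doc2", "type": "Document", "title": "Q2 report", "pages": 15},
  {"id": "msg1", "type": "Message", "title": "Re: Q1 report"}
]
"""

def split_tbox_abox(records, id_key="id", type_key="type"):
    """Return (tbox, abox): schema-level terms vs. instance-level assertions."""
    classes, properties, abox = set(), set(), []
    for rec in records:
        subject = rec[id_key]
        # The record's type becomes a class (Tbox) and a type assertion (Abox).
        classes.add(rec[type_key])
        abox.append((subject, "rdf:type", rec[type_key]))
        for key, value in rec.items():
            if key in (id_key, type_key):
                continue
            # Each distinct key is one data property (Tbox);
            # each key/value pair is one assertion (Abox).
            properties.add(key)
            abox.append((subject, key, value))
    tbox = {"classes": sorted(classes), "data_properties": sorted(properties)}
    return tbox, abox

tbox, abox = split_tbox_abox(json.loads(SAMPLE))
print(tbox)
print(len(abox), "Abox assertions")
```

Even with three records, the four schema terms are already outnumbered by eight assertions, and the Tbox stops growing once every key and type has been seen while the Abox keeps growing with every record.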

Michael DeBellis

Aug 8, 2017, 11:48:37 AM
to ontolog-forum
So I've done the Cellfie tutorial. Very cool stuff. But I don't think it will help me in this case. The file I have is JSON (JavaScript Object Notation). It's a common data interchange format, which is how it's used right now by the application where they want to start using an ontology. If anyone knows of any tools or papers on converting JSON to OWL, please let me know. I'm guessing not, but it doesn't hurt to ask.

Michael DeBellis

Aug 8, 2017, 2:56:31 PM
to ontolog-forum
In case others are interested, someone pointed me to this article, which is great at setting out the issues of integrating JSON and RDF: http://milicicvuk.com/blog/2014/08/26/can-json-and-rdf-be-friends/

David Whitten

Aug 8, 2017, 4:45:35 PM
to ontolog-forum
What would be a good URL to read that Cellfie tutorial?

On Tue, Aug 8, 2017 at 2:56 PM, Michael DeBellis <mdebe...@gmail.com> wrote:
In case others are interested, someone pointed me to this article, which is great at setting out the issues of integrating JSON and RDF: http://milicicvuk.com/blog/2014/08/26/can-json-and-rdf-be-friends/

--
All contributions to this forum by its members are made under an open content license, open publication license, open source or free software license. Unless otherwise specified, all Ontolog Forum content shall be subject to the Creative Commons CC-BY-SA 4.0 License or its successors.

Michael DeBellis

Aug 8, 2017, 4:53:34 PM
to ontolog-forum, whi...@netcom.com

Michael DeBellis

Aug 18, 2017, 11:56:42 AM
to ontolog-forum
Just wanted to close this out by saying it turned out to be fairly straightforward to do this. The text file is in JSON format, which can easily be parsed to create Java objects, and then those Java objects can be read and transformed into OWL axioms using Jena. I haven't done that yet, though. It's turning out that the hard part isn't the technical part of going from JSON to OWL but (it shouldn't have been a surprise in hindsight) the fact that the current model for the JSON objects is really different from OWL, and figuring out how to map it into the appropriate OWL objects, given all the constraints of existing software, is not at all easy. It's been a fun experience so far. The real world is so different from the kind of academic questions we tend to discuss here. Not that there's anything wrong with the academic stuff at all; I love it, and ultimately it provides the foundation for doing the tech stuff the right way. But I've been away from real-world problems for a while, and it's kind of eye-opening how far they are from the elegant models we create with tools like OWL and Protege when we can start from scratch. The real world is messy!
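
For readers who want to see the shape of the parse-then-transform step described above, here is a rough Python sketch. It is not the Java/Jena approach mentioned in the post: it uses only the Python standard library and writes Turtle text by hand. The namespace `http://example.org/onto#`, the `id` and `type` keys, and the sample record are all assumptions made for illustration.

```python
import json
from urllib.parse import quote

BASE = "http://example.org/onto#"  # hypothetical namespace

def iri(name):
    """Build a full IRI in angle brackets, percent-encoding unsafe characters."""
    return "<" + BASE + quote(str(name), safe="") + ">"

def turtle_literal(value):
    """Render a JSON scalar as a Turtle literal."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    # The escapes json.dumps emits (\", \\, \n, \uXXXX, ...) are also valid in Turtle.
    return json.dumps(str(value))

def records_to_turtle(records, id_key="id", type_key="type"):
    """Map flat JSON records to OWL classes, datatype properties and individuals."""
    classes, props, individuals = set(), set(), []
    for rec in records:
        subject = iri(rec[id_key])
        classes.add(rec[type_key])
        individuals.append(f"{subject} a owl:NamedIndividual , {iri(rec[type_key])} .")
        for key, value in rec.items():
            if key in (id_key, type_key):
                continue
            if value is None or isinstance(value, (dict, list)):
                # Nested structures need a real modelling decision; skipped here.
                continue
            props.add(key)
            individuals.append(f"{subject} {iri(key)} {turtle_literal(value)} .")
    lines = ["@prefix owl: <http://www.w3.org/2002/07/owl#> .", ""]
    lines += [f"{iri(c)} a owl:Class ." for c in sorted(classes)]
    lines += [f"{iri(p)} a owl:DatatypeProperty ." for p in sorted(props)]
    lines += individuals
    return "\n".join(lines) + "\n"

records = json.loads(
    '[{"id": "doc1", "type": "Document", "title": "Q1 report",'
    ' "pages": 12, "tags": ["finance"]}]'
)
ttl = records_to_turtle(records)
print(ttl)
```

The sketch deliberately skips nested objects and arrays (the `tags` list is dropped). That is exactly where the mapping problem described above lives: deciding whether a nested value becomes an individual, an object property, or something else depends on the target ontology and the existing software, not on the parsing.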