FAQ XML, semantic web and proteus

Davide Del Vento

Apr 4, 2011, 3:34:46 PM

to slipstream-pro...@googlegroups.com

This has been covered (at least partially) on the website, but I think
it's appropriate to repeat it here.
What is the relationship (if any) between proteus and XML, the
semantic web and projects like freebase, DBpedia and metaweb?

And what about the work described in Chapters 8, 11, 12, and 13 of
this book: http://www.elsevierdirect.com/companion.jsp?ISBN=9780123815415 ?

Bruce Long

Apr 5, 2011, 2:03:29 PM

to slipstream-pro...@googlegroups.com, Davide Del Vento

XML is a modeling language in the sense that it can represent data and schema in an object-oriented way. But XML has several problems that seem to me to be fatal and that have kept it from becoming as important as it could be. Certainly they make it unsuitable for the Slipstream. The smaller problems are that it is awkward with binary data; it is not, by some standards, human-readable; and by the time you include all the parsers, schema languages, and ad-hoc techniques needed to use it, the language is so large that it isn't worth the effort for the average person to learn.

A larger problem with XML is that it cannot represent objects and systems of objects with enough resolution that they can be implemented or that their causal structure can be reasoned about. C++ can represent objects and classes with enough resolution that they can be implemented. Haskell represents systems in a way that they can be reasoned about. For the Slipstream we need both of these features.

Another concept for improving the Web was the "Semantic Web," which was to use XML and OWL to specify the semantics of all the data on the web. I sincerely respect the folks working on the Semantic Web and I wish them success. But for my purposes the languages won't work. And I believe that the problems with those languages are why the Semantic Web hasn't gone viral and become the standard way things are done.

One of the problems is that, like XML, OWL is big, complex, and hard to learn. HTML is easy, so it could catch on. But there are two bigger problems, both sociological. First, the amount of work it takes to represent your information in OWL is an order of magnitude greater than the relatively small benefit gained from its use. Second, the vision of the Semantic Web requires that everyone buy in to it. That's too much to ask. There are a lot of different types of people in the world, and getting everyone to agree to use this complex language just isn't going to happen. With Proteus, a model can be wrapped around a database, a proprietary file format, or an API. For example, a Proteus model of OWL and XML could be made; then folks who prefer those languages can use them without harassment. The other way round will not work because the semantics of Proteus are not expressible in OWL or XML.
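As a rough illustration of the wrapping idea above, here is a minimal sketch in Python (not Proteus, whose syntax is not shown in this thread). The class names, the `items` method, and the recipe data are all hypothetical; the CSV text stands in for a proprietary file format and the list of tuples stands in for a database table. The point is only that two differently stored data sets can be put behind one common model and queried together.

```python
import csv
import io


class CsvRecipes:
    """Hypothetical wrapper around recipe data stored as CSV text."""

    def __init__(self, text):
        self._rows = list(csv.DictReader(io.StringIO(text)))

    def items(self):
        # Expose each CSV row in the shared shape: name plus minutes.
        for row in self._rows:
            yield {"name": row["name"], "minutes": int(row["minutes"])}


class TableRecipes:
    """Hypothetical wrapper around rows from a database-like table."""

    def __init__(self, rows):
        self._rows = rows

    def items(self):
        # Expose each (name, minutes) tuple in the same shared shape.
        for name, minutes in self._rows:
            yield {"name": name, "minutes": minutes}


def quick_recipes(sources, limit):
    """Names of recipes from any wrapped source that take at most `limit` minutes."""
    return sorted(
        item["name"]
        for source in sources
        for item in source.items()
        if item["minutes"] <= limit
    )


csv_source = CsvRecipes("name,minutes\nomelette,10\nstew,120\n")
table_source = TableRecipes([("salad", 5), ("roast", 90)])
print(quick_recipes([csv_source, table_source], 15))
```

Neither data set had to be converted to a common storage format; only a thin wrapper was written for each.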

That brings up the larger reason for not using OWL. OWL models represent the world much as the philosopher Aristotle would have: "All DOGS are MAMMALS; Fido is a DOG; therefore Fido is a MAMMAL." Aristotle is my favorite philosopher, so I'm not dissing the guy. But Galileo, and then Newton and most of science, came up with a better way: numbers. Proteus uses the concept of "pieces of information," or "infons," to represent systems. Infons are like a digital representation of a state-system, and they have so much in common with numbers that it's sometimes tempting to just use the term "number" instead of "infon."
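For readers unfamiliar with the class-based style of inference quoted above, it can be sketched in a few lines of Python. This is neither OWL nor Proteus syntax; the dictionaries and function names are hypothetical and exist only to show the "subclass plus membership" pattern of the Fido example.

```python
# Hypothetical toy knowledge base: one subclass link and one membership fact.
SUBCLASS_OF = {"DOG": "MAMMAL"}
INSTANCE_OF = {"Fido": "DOG"}


def classes_of(individual):
    """Every class the individual belongs to, following subclass links upward."""
    found = []
    current = INSTANCE_OF.get(individual)
    while current is not None:
        found.append(current)
        current = SUBCLASS_OF.get(current)
    return found


def is_a(individual, cls):
    """True if the individual is a member of cls, directly or by inheritance."""
    return cls in classes_of(individual)


print(is_a("Fido", "MAMMAL"))
```

Everything this kind of model can conclude is a statement about class membership; it says nothing quantitative about the thing being modeled.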

DBpedia and Freebase are great projects that I applaud. DBpedia uses RDF, the data model that OWL is built on, so the same concerns apply. Freebase is proprietary and wouldn't want users storing things like their personal recipe lists, friend lists, or to-do lists in it. The Slipstream will be a distributed repository that never needs commercial support and will not need X millions of dollars each year to pay for server space the way Wikipedia or other central-server-based projects do.

As for the various projects in the book you mention ("No Code Required"), these examples illustrate perfectly how even single models of things can be used to increase user power. If you look at the examples in each chapter, the researcher has created a model of some services, websites, or data and has provided an interface that lets you do cool, useful things with the models. The idea for the Slipstream is that we can make all these models compatible with each other, share them, and provide whatever kind of user interface users prefer for interacting with them. For example, in one of the chapters a researcher has essentially modeled various websites and shows how a user can drag, drop, and draw in order to make a system where looking up a restaurant will automatically bring up bus schedule information for getting to that restaurant. With the Slipstream, people in each city could model their city's transportation system: roads, buses, trains, taxis, etc. Most likely, the organizations would model themselves. They could include real-time changes such as traffic jams, vehicle problems, etc. If this information is already on the web, a wrapper model would be easy to create. Users could use interfaces that they already understand to query for things like "all sushi places within 20 minutes from where Fred, Joy and I currently are".
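To make the sushi query above concrete, here is a minimal Python sketch of what such a query computes once restaurant and transportation models can be combined. All of the data, place names, and the `places_within` function are hypothetical; in the scenario described, the travel times would come from the shared city transportation models rather than a hard-coded table.

```python
# Hypothetical travel times, in minutes, from each person's current location.
TRAVEL_MINUTES = {
    "Sushi Ko": {"Fred": 12, "Joy": 18, "me": 9},
    "Maki Bar": {"Fred": 25, "Joy": 10, "me": 14},
    "Pizza Uno": {"Fred": 5, "Joy": 5, "me": 5},
}

# Hypothetical restaurant model: what each place serves.
CUISINE = {"Sushi Ko": "sushi", "Maki Bar": "sushi", "Pizza Uno": "pizza"}


def places_within(cuisine, people, limit):
    """Places serving `cuisine` that every listed person can reach within `limit` minutes."""
    return sorted(
        place
        for place, times in TRAVEL_MINUTES.items()
        if CUISINE[place] == cuisine
        and all(times[person] <= limit for person in people)
    )


print(places_within("sushi", ["Fred", "Joy", "me"], 20))
```

The query itself is simple; the hard part, and the part the Slipstream is meant to address, is having compatible models of the restaurants and the transportation system to run it against.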

Here are some principles extracted from the above:
* The system should eventually be so easy that it is chosen by default by lazy people.
* Content and models should be easy to share with trusted people.
* No central servers requiring maintenance and money for upkeep should be required.
* Models should have enough resolution that they can be implemented (as with C++ or Java but not UML or XML).
* Models should be represented in such a way that reasoning about their properties and causal structure is possible (as with Haskell or set theory).
* The system should work even if people don't use Proteus much, because Proteus can wrap the semantics of external data sets. It should adapt to users' culture, not force users to adopt a standard culture.
* New features should be added by generalizing and making the language smaller and simpler, rather than by hacking on new functionality ad hoc and making the language larger.