To Schema or not to Schema

Reuven Cohen

Nov 26, 2008, 10:02:41 AM

to cloud...@googlegroups.com

My post the other day about creating an XMPP-based unified cloud interface has generated a lot of interest (thank you, Dave @ Cnet). One point that several people have raised concerns the proposed use of an XML schema and whether a predefined model makes any sense. A few of you also said to look at a more "RESTful" architecture, which in my opinion is not mutually exclusive with an XML schema. Several have pointed me to the SNMP protocol and its object model as a good example. SNMP pairs a strict verb discipline with the protocol's small operator set, and the 'resources' are addressed with a uniform global scheme of object identifiers.
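
A rough Python sketch of the SNMP-style idea: a small, fixed set of verbs (get, set, walk) applied to resources that are addressed by dotted identifiers. It uses no SNMP library; the `ResourceTree` class and the identifiers below are invented purely for illustration.

```python
from typing import Dict, Iterator, Tuple


class ResourceTree:
    """Toy in-memory store: a few fixed verbs over dotted identifiers.

    Hypothetical illustration only; this is not an SNMP implementation.
    """

    def __init__(self) -> None:
        self._values: Dict[Tuple[int, ...], str] = {}

    @staticmethod
    def _parse(oid: str) -> Tuple[int, ...]:
        # "1.2.10" -> (1, 2, 10), so identifiers sort numerically, not as text.
        return tuple(int(part) for part in oid.split("."))

    def set(self, oid: str, value: str) -> None:
        self._values[self._parse(oid)] = value

    def get(self, oid: str) -> str:
        return self._values[self._parse(oid)]

    def walk(self, prefix: str) -> Iterator[Tuple[str, str]]:
        """Yield every (identifier, value) under a prefix, in numeric order."""
        root = self._parse(prefix)
        for key in sorted(self._values):
            if key[: len(root)] == root:
                yield ".".join(map(str, key)), self._values[key]


tree = ResourceTree()
tree.set("1.2.1", "vm-a")      # made-up identifiers and values
tree.set("1.2.10", "vm-c")
tree.set("1.2.2", "vm-b")
tree.set("1.3.1", "volume-a")
vms = list(tree.walk("1.2"))   # everything under the "1.2" subtree
```

The point of the sketch is that the verb set never grows; new kinds of resources only add new identifiers.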

Yet another suggestion was to look at the Resource Description Framework (RDF) as the basis for UCI. What I find interesting about the RDF data model is that it is based upon the idea of making statements about Web resources in the form of subject-predicate-object expressions. I also found its use of statement reification and context possibly very useful, although RDF brings us back to the usage of schemas.
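
A minimal Python sketch of subject-predicate-object statements and of reification, using plain tuples rather than an RDF library. The `rdf:` terms are the standard reification vocabulary; every `ex:` name is an assumption made up for the example.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def reify(statement_id: str, t: Triple) -> List[Triple]:
    """Describe a triple as a resource so other statements can refer to it."""
    return [
        Triple(statement_id, "rdf:type", "rdf:Statement"),
        Triple(statement_id, "rdf:subject", t.subject),
        Triple(statement_id, "rdf:predicate", t.predicate),
        Triple(statement_id, "rdf:object", t.obj),
    ]


def objects(graph: List[Triple], subject: str, predicate: str) -> List[str]:
    """Return every object matching a given subject and predicate."""
    return [t.obj for t in graph if t.subject == subject and t.predicate == predicate]


# The "ex:" names are invented for illustration.
claim = Triple("ex:vm42", "ex:runsOn", "ex:cloudA")
graph = [claim]
graph += reify("ex:stmt1", claim)
# A statement about the statement: who asserted it.
graph.append(Triple("ex:stmt1", "ex:assertedBy", "ex:providerA"))
```

Reification is what lets one party record context (source, time, trust) about a claim made by another party without changing the claim itself.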

So I'd like to pose a question to the group: if not a traditional XML Schema, what other approaches may give us equal or greater flexibility?

--

Reuven Cohen
Founder & Chief Technologist, Enomaly Inc.
blog > www.elasticvapor.com
-
Open Source Cloud Computing > www.enomaly.com

Krishna Sankar (ksankar)

Nov 26, 2008, 10:51:16 AM

to cloud...@googlegroups.com

A lightweight RDF and OWL should work here. There are at least three vectors – the representation, the exchange semantics and a processing model. If we keep them lightweight, OWL-Lite should work; I do not think we need the decidability and completeness. Moreover, we are aiming at capturing the relationships between VMs, attributes and related artifacts. In the worst case, we can always use microformats (which work well with XMPP) rather than a mungo XML Schema.

Just my $0.02

Cheers

<k/>

Matthew Zito

Nov 26, 2008, 11:55:52 AM

to cloud...@googlegroups.com

I don’t see anything fundamentally wrong with the idea of schemas, though with such a broad focus for this group – covering lots of very different types of cloud infrastructure – keeping schemas from becoming unwieldy will be a challenge.

I think there’s three main considerations:

- Readability/specificity – making sure that whatever is written down is easily understandable, given the wildly disparate models we’re looking at. There’s a lot of open standards that fell apart in related areas (DCML, etc.) due to the weight of the

- Simplicity of technology stack – whatever protocols are used should be easily implemented. Something XMPP- or REST-driven has lots of libraries that abstract some of the complexity at the protocol level; I know a lot less about dealing with RDF (I realize you could do RDF-over-just-about-anything, I’m just less familiar with it). We should keep that in mind to make solutions as easy to implement as possible.

- Minimizing required implementation – Along similar lines to the previous two, if there’s one knob that you want to twiddle across disparate environments, you should ideally have to only implement that one knob with a minimum of scaffolding

Based on those considerations, the last one is very well suited to SNMP/object model type activities, but very strict object representations have been shown in SNMP to be very ugly, and they also do not cleanly handle bi-directional communications, state models, etc. RDF looks elegant, but I would have to know more about the types of activities we’re trying to integrate.

Perhaps a set of simple schemas for different classes of activities might fit in the middle. I think it’ll be important at some point soon to define some user stories & scenarios and try to see which models we might fit within.

Thanks,

Matt

Chris Marino

Nov 26, 2008, 12:12:48 PM

to cloud...@googlegroups.com

Reuven, I think I need a little more context before I have any real opinion.

What kind of items are going to be exchanged across clouds? Cloud configuration metadata? Application data? Transactional? Application Metadata?

Maybe it would be useful to set out the objectives of the interface before any technologies, techniques, schemas were proposed.

At SnapLogic we had to wrestle with *all* of these issues, and we were focused on only a subset of cloud interoperability (application data). Everything is RESTful, we used RDF for the metadata (before we backed that out due to query/access performance), started with simple tabular data representations, but quickly had to add hierarchies/XML. Handling large data sets is problematic as well, especially for XMPP, so that's a whole other hairball.

There are tons of data center standards that are relevant here as well. CMDB is just one example (http://en.wikipedia.org/wiki/CMDB).

I'm pretty new to this group (and didn't attend the SF event back in Sept), so this may have already been discussed. Sorry in advance if I'm missing something.

CM

Paul Strong

Nov 26, 2008, 1:00:44 PM

to cloud...@googlegroups.com

Reuven,

As per our group discussion at the meeting hosted by France Telecom,
and echoing a previous post, I believe it is imperative that we
clearly define the scope and make it as tight as possible. There has
been a lot of existing work that is potentially relevant, ranging from
CIM to DCML. There are some newer things around service models,
including the work on SML (Service Modeling Language) and some very
high level work at the OGF in the Reference Model working group (which
I co-chair). Indeed we at eBay are using an ontology (RDF/OWL,
RDF/XML) with a vocabulary based on the OGF Reference Model in one of
our management services. In our case we have kept it very small and
light, and flexible... But I digress.

In order to be successful I believe we must avoid boiling the ocean,
we should leverage what has been done before where appropriate, learn
from near misses and only generate new content where it makes a real
difference to what we are doing and where we are the natural home for
that effort. Making this real will not only involve hard work, but
may require no small amount of collaboration with other groups -
depending on the problems we want to solve.

So, this is all good stuff, but I think clear and well constrained
requirements are a prerequisite to success :o) What critical, cloud
specific interoperability problems do we want to solve, for which
there is/are no other natural home(s)?

Cheers
Paul

--
Paul Strong
email - Paul....@gmail.com
AIM - FractalPaul
Skype - FractalPaul

Tross

Nov 26, 2008, 1:06:11 PM

to Cloud Computing Interoperability Forum

"There are tons of data center standards..."

Classic! The best thing about standards is there're so many there's
bound to be one for everyone :-).

Perhaps I'm a bit jaded, but I've watched too many standards efforts
in this space that didn't really make it. The first is definitely
from the DMTF, CIM and its related standards WBEM, etc. IMHO, when
two products can "say" they're compliant with a standard yet cannot
interoperate in any way shape or form, then the standard just isn't
providing value. Don't forget the globus folks with their ogsa work.
Then all the WS_* work! WSRF, cute, but if a standard falls in the
forest, does anyone care? Then there's MSFT's attempt at playing nice
with WSRF - they just did their own WS_Man and got others to pick it
up since they _own_ Windows. They also have SML and CML. Personally,
I could care less about SML as it's almost like another XSD - but CML,
on the other hand could be cool, if it catches on of course. I
believe there's some crossover work with the CMDBf and CML working
groups - again nice and stuff, but....

So, like many of the others, I think we need to start with clear
objectives, goals, dreams etc. (aka requirements) before we start
throwing standards around. I suspect Ruv has something in his mind on
how clouds might interop for particular purposes etc. and that led
him to xmpp - but I have no idea what he's thinking. So how bout it
Ruv? Can you articulate your dream of how all this could fit
together? I'm not even sure which problem you're hoping to solve -
that would be a great starting point for dummies like me ;-)

Tross

On Nov 26, 12:12 pm, "Chris Marino" <christopher.c.mar...@gmail.com>
wrote:

Tross

Nov 26, 2008, 1:10:32 PM

to Cloud Computing Interoperability Forum

Hey Paul,
I just saw your post after I hit send to mine. It's been a while. I
guess I'm not surprised to see some interest in SML/CML - are you
involved in either of those efforts? Well it seems we both think
alike on this topic ;-).

Andrew Trossman (IBM Tivoli - we met when you first joined eBay)

> email - Paul.Str...@gmail.com

Ross Andrus

Nov 26, 2008, 1:15:00 PM

to cloud...@googlegroups.com

I agree with Chris on this - more context is needed before we can really know whether XMPP is sufficient or whether we'll ultimately need something beefier, transactional, more secure, etc. etc.

However, conveying that context is hard without detailing what the objects and semantics will be... the communication of which will require that some representation be agreed. Chicken? Egg?

To avoid endless recursion, I suggest we start with something extremely simple - a pidgin of xml schema would be fine - and use that to flesh out the proposal. And make "choosing the right representation" an explicit design choice driven, like other choices, by our collective goals around adoption, performance, security, extensibility, etc.

I personally think leveraging existing work will be an important goal - I'm at the moment looking at a bunch of XSDs from the DMTF - which makes that a contender, in my mind at least.

Finally, apologies for points made moot by the last kaffeklatsch, which I unfortunately missed.

My $0.02

ross

Matthew Zito

Nov 26, 2008, 1:51:55 PM

to cloud...@googlegroups.com

So, it seems like there’s some general consensus that it makes sense to KISS and spec before we go nuts deciding protocol specifics. Where do we go from here? What do people see as the general areas for investigating first?

Thanks,

Matt

Reuven Cohen

Nov 26, 2008, 2:25:24 PM

to cloud...@googlegroups.com

Thanks for all the great insights. I also agree that the last thing we need are standards. If we do this right the standards will organically emerge over time. My goals for a Unified Cloud Interface (UCI) are fairly simple, although my ambitions are much larger.

The mission is this: Cloud interoperability for the purposes of reducing cross cloud complexity.

I completely agree with Paul and others, let's not re-invent the wheel, boil the ocean, (insert your own metaphor). Whether it's OWL, RDF, SNMP or whatever, we have a significant amount of material to use as the basis for what we're trying to accomplish.

We must focus on the core aspects of simplicity, extensibility and scalability / decentralization in looking at this opportunity.

Whether or not XMPP is powerful enough seems somewhat secondary at this point. I'd use TCP as an analogy for our dilemma. TCP is arguably not the most scalable, secure or efficient protocol, but its simplicity was its ultimate advantage. The Internet works because parts of it can fail dramatically without affecting the Internet at large; this is because of a decentralized, fault-tolerant architecture, one that assumes failure. There are numerous messaging platforms and protocols to choose from, but none of them seems to address decentralization and extensibility to the extent that XMPP does. Is XMPP perfect? Probably not, but for our purposes it's more than adequate.

I envision a communication protocol that takes into consideration a future that may be vastly different than today's Internet landscape. In some ways my ambition for UCI is to enable a global computing environment that was never previously possible: a technology landscape where everything and anything is web enabled.

Yes, I have big ambitions, it is not often we find ourselves in the midst of a true paradigm shift. This is our opportunity to lose.

reuven

Paul Strong

Nov 26, 2008, 2:25:55 PM

to cloud...@googlegroups.com

IMHO I would step back from the schema discussion. "These are not the
schemas that you are looking for" (waves hand) as Obi-Wan would say.
In my experience the temptation to go straight to code/xml invariably
results in skimping on requirements, resulting in either something
that does way too much, but none of it adequately to solve the real
problem, or solving the wrong problem. We are all super busy, and it
would be sad to see a lot of effort by so many smart people being
either wasted or ineffective.

Reuven, others, what do you see as the interoperability problems that
are unique to clouds? Are these user or provider problems? Let's bat
this around a bit before embarking on a vocabulary/schema and protocol
discussion.

Paul

--
Paul Strong
email - Paul....@gmail.com

Jason N. Meiers

Nov 26, 2008, 2:24:48 PM

to cloud...@googlegroups.com

I have been working with SEMP (Simple Event Management Protocol); here are some protocol specifics. This has helped tremendously with event management and autonomics across multiple data centers/clouds. Are there any items missing that you can think of? Or could we even drop a few items to make it simpler?

    "description", => error codes
    "group", => network, application, systems
    "host", => ip/hostname
    "location", => cloud/subnet/datacenter
    "severity", => 1,2,3,4,5
    "source", => esm system, application, ...
    "subsource", => transactions
    "time", => unixtimestamp
    "status", => open/closed/archived/pending

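The field list above could be modelled as a small validated record. This is only a sketch: the thread does not give SEMP's wire format, so the `Event` class, the JSON encoding and the sample values are assumptions; only the field names, the 1-5 severity range and the four status values come from the list.

```python
import json
from dataclasses import asdict, dataclass

VALID_STATUSES = {"open", "closed", "archived", "pending"}


@dataclass(frozen=True)
class Event:
    """Hypothetical record mirroring the nine fields listed above."""
    description: str   # error code / text
    group: str         # network, application, systems
    host: str          # IP address or hostname
    location: str      # cloud / subnet / datacenter
    severity: int      # 1 through 5
    source: str        # originating system or application
    subsource: str     # e.g. transactions
    time: int          # Unix timestamp
    status: str        # open / closed / archived / pending

    def __post_init__(self) -> None:
        if self.severity not in range(1, 6):
            raise ValueError(f"severity must be 1-5, got {self.severity}")
        if self.status not in VALID_STATUSES:
            raise ValueError(f"unknown status: {self.status!r}")


# Sample values are invented for illustration.
event = Event(
    description="E1001 disk latency high",
    group="systems",
    host="10.0.0.12",
    location="cloud-a/subnet-3",
    severity=2,
    source="esm",
    subsource="transactions",
    time=1227727488,
    status="open",
)
payload = json.dumps(asdict(event), sort_keys=True)  # one possible encoding
```

Validating at construction time keeps malformed events from ever reaching the transport, whichever transport ends up being chosen.
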
Paul Strong

Nov 26, 2008, 2:34:19 PM

to cloud...@googlegroups.com

Vision and goals are good, they provide direction and hopefully they energize.

I think that with the community you moderate you/we have a great
opportunity to ask the questions and get detail on what people care
about now. Timing is everything and we need to know what people think
their current pain points are and what their next ones will be. There
are numerous Cloud events taking place and I'm sure that between us we
can cover most (I am at the IGT Cloud Summit next week and at the OGF
Europe Cloudscape event in January) and perhaps ask the right
questions.

Paul

Jason N. Meiers

Nov 26, 2008, 2:46:15 PM

to cloud...@googlegroups.com

Paul, I think your points about gathering requirements are valid, although from my experience with IT systems management across multiple data centers or clouds, I think kicking around a few ideas for actual implementation would help us get a sense of what level we are at and what the next steps are. Nothing is set in stone; it's just creating opportunities for next steps. Identifying known issues and solving those larger problems early on may help move this forward.

Just my 2 cents.

Paul Strong

Nov 26, 2008, 2:50:41 PM

to cloud...@googlegroups.com

It's all good, as long as we don't leap headlong into spec generation
or implementation :o) I've just seen it happen too often elsewhere.

Paul

Tross

Nov 26, 2008, 2:53:57 PM

to Cloud Computing Interoperability Forum

Ruv,
I thought you were going to stop using the word "paradigm" ;-)

Maybe I took an extra stupid pill this morning, but I'm still far from
clear on what you've got in your mind. I can tell you have a certain
clarity in your idea - I'm just not getting it.

I read the goal of cloud interoperability and a global computing
environment. I dig this, but frankly it's still too vague. For
example, if we start really small and simple with an inter-operable
cloud storage model a la S3. We've all been hearing the challenges of
data being locked into a vendor so how do we break that? What if we
could easily replicate all storage accesses across multiple vendors?
That could be pretty cool, valuable and challenging.
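
A rough sketch of that replication idea in Python, assuming each vendor can be wrapped behind the same two-method interface. `BlobStore`, `InMemoryStore` and `ReplicatedStore` are hypothetical names, and the in-memory class merely stands in for a real storage service.

```python
from typing import Dict, List, Protocol


class BlobStore(Protocol):
    """Assumed minimal interface every vendor wrapper would expose."""
    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...


class InMemoryStore:
    """Stand-in for one vendor's storage service."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]  # raises KeyError when the key is absent


class ReplicatedStore:
    """Write to every backend; read from the first one that has the key."""

    def __init__(self, backends: List[BlobStore]) -> None:
        if not backends:
            raise ValueError("at least one backend is required")
        self._backends = backends

    def put(self, key: str, data: bytes) -> None:
        for backend in self._backends:
            backend.put(key, data)

    def get(self, key: str) -> bytes:
        for backend in self._backends:
            try:
                return backend.get(key)
            except KeyError:
                continue  # fall through to the next vendor
        raise KeyError(key)


vendor_a, vendor_b = InMemoryStore(), InMemoryStore()
store = ReplicatedStore([vendor_a, vendor_b])
store.put("report.csv", b"a,b\n1,2\n")
```

The sketch ignores the hard parts (partial write failures, consistency, cost), which is where most of the challenge would lie.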

Perhaps I'm just in a bottoms-up mood today, but I need to start
somewhere. I can dream about ubiquitous computing across devices
ranging from datacenter servers, to home PCs, to mobile devices and
even smart dust too, but we need to start with something tangible.
Smart dust is wicked, but in this economy, who will take the risk?

Bert Armijo

Nov 26, 2008, 3:10:37 PM

to cloud...@googlegroups.com

Couldn't agree more!

Bert Armijo

Nov 26, 2008, 3:10:36 PM

to cloud...@googlegroups.com

Users at Cloudcamps have pretty clearly outlined a great deal of what they want to see standardized in the cloud. A common control interface. Code portability. Data portability. Common security. Inter-cloud communications.

If I read your objective correctly, you're focused solely on the first of these - the control interface. However, I don't think it's possible to create a really well thought out extensible control interface that'll stick without first identifying what it is we're controlling. Are you sticking to just a virtual machine and leaving security, networking and storage to code within the VM? If so, then you'll get portability of the mgmt interface, but not of the actual applications which is what customers actually care about.

IMHO lasting standards usually start with a model, in our case for an application running in the cloud. Internally we use what I call the Facebook test, i.e., could I run Facebook on this? A proper model will allow us to identify the boundaries that need standardization and the requirements at each of those boundaries. As others have noted, many attempts have been made at this, so there may already be work that can be leveraged.

Then again, I'm just a marketing dweeb . . .

Matthew Zito

Nov 26, 2008, 3:31:30 PM

to cloud...@googlegroups.com

This is very good, though – there’s a lot of analogies here to other attempts to create open standards for vendor-specific platforms:

- resource abstraction – “I should be able to refer to an image as an image, regardless if it’s an AMI, or a slicehost VM, etc”

- reporting/information abstraction – “Regardless of the cloud provider, I should be able to report on my utilization as a common language, to the extent of apples-to-apples comparisons”

- Control abstraction – “I should be able to have a common set of verbs programmatically to interact with like-for-like systems (i.e. gogrid and aws)”
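
The resource and control abstractions in this list might look roughly like the following Python sketch: one provider-neutral `Image` type and one set of verbs that every provider driver implements. All names here are hypothetical, and `FakeDriver` is an in-memory stand-in rather than a wrapper around any real provider API.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from itertools import count
from typing import Dict, List


@dataclass(frozen=True)
class Image:
    """Provider-neutral handle for a machine image."""
    provider: str
    native_id: str  # whatever identifier the provider itself uses


class CloudDriver(ABC):
    """Hypothetical common verb set; not any real provider's API."""

    @abstractmethod
    def list_images(self) -> List[Image]: ...

    @abstractmethod
    def start(self, image: Image) -> str: ...

    @abstractmethod
    def stop(self, instance_id: str) -> None: ...


class FakeDriver(CloudDriver):
    """In-memory stand-in so the sketch runs without any cloud account."""

    def __init__(self, provider: str, image_ids: List[str]) -> None:
        self.provider = provider
        self._images = [Image(provider, i) for i in image_ids]
        self._ids = count(1)
        self.running: Dict[str, Image] = {}

    def list_images(self) -> List[Image]:
        return list(self._images)

    def start(self, image: Image) -> str:
        if image.provider != self.provider:
            raise ValueError("image belongs to a different provider")
        instance_id = f"{self.provider}-{next(self._ids)}"
        self.running[instance_id] = image
        return instance_id

    def stop(self, instance_id: str) -> None:
        del self.running[instance_id]


def start_first_image(drivers: List[CloudDriver]) -> List[str]:
    """Same verbs against every provider, no provider-specific branches."""
    return [d.start(d.list_images()[0]) for d in drivers]


drivers = [FakeDriver("provider-a", ["img-1"]), FakeDriver("provider-b", ["img-9"])]
started = start_first_image(drivers)
```

Calling code such as `start_first_image` never needs to know which provider it is talking to, which is the like-for-like property the control abstraction asks for.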

The next level above that I see as driving inter-cloud events without an intermediary, or more advanced functionality within a single cloud – this is trickier both from a technical perspective and from a policy perspective (is company A that interested in making it easy to migrate data from them to company B because company B is 15% cheaper?). At that level, you have:

- AAA abstraction – Authorization, Authentication, Accounting – pushing security policies between two different cloud providers

- Data movement – Migration/cloning/duplication/distribution of information across providers

- Policy-based administration – “Based on performance criteria X, perform activity Y” – this is likely to be very very difficult, as these mechanisms tend to be very platform-specific, so I lumped it in the more advanced section
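
As a sketch of the "based on performance criteria X, perform activity Y" item, a policy can be reduced to a condition over a metrics snapshot plus a named action. The `Policy` class, the metric name `cpu_utilization`, the thresholds and the action names are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Metrics = Dict[str, float]


@dataclass(frozen=True)
class Policy:
    name: str
    condition: Callable[[Metrics], bool]  # "performance criteria X"
    action: str                           # "activity Y", named but not executed here


def due_actions(policies: List[Policy], metrics: Metrics) -> List[str]:
    """Return the actions whose conditions hold for this metrics snapshot."""
    return [p.action for p in policies if p.condition(metrics)]


# Thresholds and names below are made up; defaults keep a missing metric from firing.
policies = [
    Policy("scale-out", lambda m: m.get("cpu_utilization", 0.0) > 0.80, "start_instance"),
    Policy("scale-in", lambda m: m.get("cpu_utilization", 1.0) < 0.20, "stop_instance"),
]
busy = due_actions(policies, {"cpu_utilization": 0.93})
idle = due_actions(policies, {"cpu_utilization": 0.05})
```

Keeping the action as a name rather than code is one way to leave the platform-specific part to each provider while the policy itself stays portable.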

There’s another level above that which is really pie-in-the-sky type things, complete abstraction, dynamic reallocation based on costing, integration to external systems, etc. etc.

But the first set I listed above has been tackled by many people, with varying degrees of success. What other high-level classes of functionality are missing from this?

Thanks,

Matt

Reuven Cohen

Nov 27, 2008, 12:39:31 PM

to cloud...@googlegroups.com

@Paul
I agree, let's focus on what we'd like to accomplish before we get into the specifics of technology or its specification. We need to define our objectives, purpose and requirements.

@Tross
What are your thoughts on defining our mission? And yes, I will try to stop using the word "paradigm"; I get carried away with my vision stuff sometimes.

@Jason
Can you give a little more detail on SEMP (Simple Event Management Protocol): who is the ideal user, what problem does it solve, and how would it be implemented?

@Bert
A few people have mentioned data portability. Do you think that data portability might be a separate challenge?

@Matthew
Matt your list is a great starting point. I'll create a group page to help fully flesh out the various requirements. I've been focusing on the aspects of control abstraction, but it's obvious there may be a much larger challenge.
