Where Content Strategy meets Linked Data


Andy Mitchell

Mar 17, 2014, 5:47:39 PM
to content...@googlegroups.com
Hi All,

I've been a silent observer of the group for a few months now and have found the comments and discussions very useful, so first my thanks to the group for that!

I suspect there are a lot of folk here who share an interest in content strategy and in the development of the semantic web over the years, but I'm struggling to find much that really brings the two worlds together... am I looking in the right places?

In particular, I'd be really interested to hear people's thoughts and experiences on the use of standard schemas (e.g. http://schema.org/) when moving to a structured content approach. Whilst it's clear that XML is a dominant force in the world of content strategy, what are people's experiences with other modelling paradigms for mapping and defining structure in their content, in particular RDF? And on a final related note, what experience do people have with these less well-known semantic CMSs (http://www.webnodes.com/, http://www.ximdex.com/)? Are they still 'just' a web CMS, or has anyone had experience of using them as more of an enterprise content system?

Great to hear any thoughts and opinions... and would be happy to expand on these points if I've been too vague!!

Thanks,
Andy.

Rahel Anne Bailie

Mar 18, 2014, 1:39:04 AM
to content...@googlegroups.com
Funny you should choose that title for your post, as my theme for 2014 is "Where Content Meets Data" - http://www.congility.com/session/where-content-meets-data-navigating-brackish-waters/

I have lots of thoughts on this but have to leave my response for another day as I'm in back-to-back meetings today. But I'll be very curious to see where this discussion goes!


---

Rahel Anne Bailie, Content Strategy / Content Management / Content Design
Intentional Design Inc. - Content strategies for business impact 
Co-producer: Content Strategy Workshops
Co-editor: The Language of Content Strategy - in stores now
Co-author: Content Strategy: Connecting the dots between business, brand, and benefits



--
You received this message because you are subscribed to the Google Groups "Content Strategy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to contentstrate...@googlegroups.com.
To post to this group, send email to content...@googlegroups.com.
Visit this group at http://groups.google.com/group/contentstrategy.
For more options, visit https://groups.google.com/d/optout.

Zahoor Hussain

Mar 18, 2014, 3:13:13 PM
to content...@googlegroups.com
Hi Andy

Thanks for posting a question which made me smile. :)

For those new to the subject, Schema.org is an exercise in modelling the Universe we live in. It extends the work of microdata and provides a vocabulary for markup to be consumed by machines (not by us humans).

My last couple of content models have been based heavily on schema.org. That is to say that schema.org is used all the way through the content lifecycle and not just for publishing content for the web. Schema.org has had hundreds of new entities added to it recently and now numbers hundreds of content types and is gaining traction in the industries that I work in.

Schema also has credibility, imho, being backed by the big guns, and is wiping the floor with other content vocabularies.

Semantic content models help disambiguate the entities and properties of the model; we can all agree on the definition of what we mean by http://schema.org/Person
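To make the disambiguation point concrete, here is a minimal sketch (not from the thread) of a content record typed as http://schema.org/Person and serialised as JSON-LD with Python's standard library. The person, the example.org URL and the helper `person_to_jsonld` are hypothetical; only `Person`, `name`, `jobTitle` and `sameAs` are schema.org terms.

```python
import json

def person_to_jsonld(name: str, job_title: str, same_as: list) -> str:
    """Serialise a (hypothetical) person record using the schema.org vocabulary."""
    record = {
        "@context": "http://schema.org",
        "@type": "Person",      # expands to http://schema.org/Person
        "name": name,
        "jobTitle": job_title,
        "sameAs": same_as,      # other URLs that identify the same person
    }
    return json.dumps(record, indent=2)

print(person_to_jsonld("Jane Example", "Content Strategist",
                       ["https://example.org/people/jane-example"]))
```

Any consumer that knows schema.org reads `"@type": "Person"` the same way, which is the shared definition the post is pointing at.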

I have not used the CMS systems you have listed. In my experience the RDF metadata tends to sit outside the CMS, so this would be Drupal, Alfresco, etc. with a triple store (or even a graph store) to manage the metadata. This slide on Twitter gets some of the idea across: https://twitter.com/izahoor/status/401636673546485760/photo/1

In my experience, the solution architectures require a mixture of applications to solve the problem (use the best tool for the job). So this usually includes a CMS, a DAM and a triple store for RDF metadata. The linked data you refer to is usually generated by a content API, which is also used to build the website.
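A minimal sketch of that split, assuming the CMS holds the editorial content, a separate store holds RDF-style metadata, and a content API merges the two into linked data. A Python dict stands in for the CMS and a set of tuples stands in for the triple store; every identifier and URL here is hypothetical.

```python
import json

# Hypothetical CMS: editorial content keyed by content ID.
CMS = {
    "article-42": {"headline": "Where content meets data",
                   "body": "<p>Example body text.</p>"},
}

# Hypothetical triple store: metadata held outside the CMS as (subject, predicate, object).
TRIPLES = {
    ("article-42", "http://schema.org/about", "http://example.org/topics/linked-data"),
    ("article-42", "http://schema.org/author", "http://example.org/people/jane"),
}

def objects(subject: str, predicate: str) -> list:
    """Answer a (subject, predicate, ?) pattern against the triple store."""
    return sorted(o for s, p, o in TRIPLES if s == subject and p == predicate)

def content_api(content_id: str) -> dict:
    """Merge CMS content with triple-store metadata into one JSON-LD response."""
    item = CMS[content_id]
    return {
        "@context": "http://schema.org",
        "@type": "Article",
        "@id": f"http://example.org/content/{content_id}",
        "headline": item["headline"],
        "articleBody": item["body"],
        "about": objects(content_id, "http://schema.org/about"),
        "author": objects(content_id, "http://schema.org/author"),
    }

print(json.dumps(content_api("article-42"), indent=2))
```

The same API response could feed both the website templates and external linked-data consumers, which is the dual role described in the post.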

Happy to explore this area further or answer any questions that you have about what I've tried to articulate. 

Many Thanks

Z

Marcia Riefer Johnston

Mar 18, 2014, 5:20:53 PM
to content...@googlegroups.com
Rahel, I like your theme for 2014, "Where Content Meets Data." Nice.

Andy Mitchell

Mar 20, 2014, 8:33:18 AM
to content...@googlegroups.com, zah...@annotation.co.uk, zahoor....@annotation.co.uk
Hi Zahoor,

Thanks for the detailed response...  and I'm pleased to have prompted a smile too :)

I'd love to explore this further with someone who has hands-on experience in this space... I'm coming at this from a background of database schema design, web IA and business analysis, so I'm still learning RDF :)

I like the slide you posted, and I've picked up very similar architectural ideas from talking to others and looking at EUCLID http://www.euclid-project.eu/modules/course5

I think I have a grasp of the split that both you and others describe, with the content store separate from the triple store. But purely from a modelling perspective, why would you use XML to define the structure within the content store, since schema.org models (and bespoke models) could be represented equally well, if not more flexibly, in RDF?
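One way to see what is at stake in this question is to flatten a small XML module into triples. This sketch is purely illustrative: the module, its element names and the `xml_to_triples` helper are hypothetical, and prefixed names such as `schema:step` are written as plain strings rather than full IRIs.

```python
import xml.etree.ElementTree as ET

# A hypothetical content module as it might sit in an XML content store.
XML_MODULE = """<howto id="howto-1">
  <title>Making pancakes</title>
  <steps>
    <step>Mix the batter.</step>
    <step>Heat the pan.</step>
    <step>Cook until golden.</step>
  </steps>
</howto>"""

def xml_to_triples(xml_text: str) -> list:
    """Flatten the module into (subject, predicate, object) triples.

    In XML, nesting and sibling order carry structure implicitly; once the
    same content is expressed as triples, both must be stated as data.
    """
    root = ET.fromstring(xml_text)
    subject = root.get("id")
    triples = [(subject, "rdf:type", "schema:HowTo"),
               (subject, "schema:name", root.findtext("title"))]
    for position, step in enumerate(root.iter("step"), start=1):
        step_id = f"{subject}-step{position}"
        triples += [(subject, "schema:step", step_id),
                    (step_id, "schema:position", str(position)),
                    (step_id, "schema:text", step.text)]
    return triples

for triple in xml_to_triples(XML_MODULE):
    print(triple)
```

The step order, which the XML carries for free through sibling order, has to be recorded explicitly (here via `schema:position`) once the content lives as triples. That is one of the trade-offs behind keeping structure in XML and relationships in RDF.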

As I said it would be great to talk this through as I feel I'm missing something fundamental here!  Perhaps we wait and see what others have to say but if no-one else joins in we could have a call sometime?

Cheers,
Andy.

Andy Mitchell

Mar 20, 2014, 8:53:27 AM
to content...@googlegroups.com
Hi Rahel, 

That is spooky! I guess it's gone in subliminally somehow from my reading around the web!

I'm going to see if I can come along to the conference. It would be great to hear what you have to say on this, so hopefully we'll get a chance to talk it through there!

Equally, it would be great to open up this conversation more here online, as I'm sure there must be others grappling with this at the moment. Interestingly, I attended a linked data meetup here in the UK recently and they made the point that 'linked data' had possibly been misnamed and should really have been called 'linked things'... using the word data can often make people think of quantitative, numeric and highly structured information, where actually it is obviously about much more than that.

Whilst I am keen that we benefit from a move to structured or semi-structured content, I am also interested in avoiding some of the pitfalls that out-of-the-box CMSs, along with XML, can lead to, which are expressed quite well in this post:



Could RDF extend into this space and offer more of the required flexibility whilst still providing semi-structured content, as well as 'simply' being used to manage metadata / linking concepts to allow better sharing and a more joined-up approach to information and content management?

Anyway -  it would be really great to hear your thoughts!

Thanks,
Andy.

Rahel Anne Bailie

Mar 20, 2014, 9:55:38 AM
to content...@googlegroups.com, zah...@annotation.co.uk, Zahoor Hussain
One of the things that Scott Abel and I were bemoaning was a dearth of conferences and other learning opportunities in this space. I don't know if it's that we're in our vertical spaces (financial people going to financial conferences, ecommerce people going to ecommerce conferences, etc) or that this is an up-and-coming area.

I know that Scott's conference this fall plans to address it a bit (Information Development World) and I imagine that enterprise content management conferences do, a bit. I'm hoping there will be something on this topic at Gilbane this year.

Rahel




Rahel Anne Bailie

Mar 20, 2014, 10:01:43 AM
to content...@googlegroups.com
I wrote a three-part blog post on my site that discussed some of the issues of trying to manage content as data and data as content. I see it as the next turf war, in that data management people want to own it all, and think that managing it should be pretty straightforward. And content people see it through a very different lens, which isn't appreciated by practitioners who haven't edited content in all of its nuanced glory.




Noz Urbina

Mar 20, 2014, 11:40:59 AM
to content...@googlegroups.com

Let me +1 the joy of this thread.

Also we'll definitely be trying to break the ice on this topic at Congility 2014 and I will be discussing it in my opening address.

I was among the peer reviewers on the Battle article and I'm glad that it resonated. It's a great piece of work and part of what made me make sure to get Jeff over to this year's event.

This is definitely the way of the future, and the CMS market is *definitely* behind the curve of where modern content creation needs to be. Google and the device explosion have drawn a line in time and told the world that structured and semantically rich content is what search engines will favour and what will be needed to manage multichannel UX (and effective content marketing). But the paradigm shift is something that the mass market has been actively avoiding until the biggest players made clear the advantages.

That is all great, and it is what the structured / semantic content heads have been saying would be needed to push the boat out. But now we have a situation where a market which was becoming very accustomed to highly functional, relatively commoditized (nearly 2,000 web CMSs in the world), low-cost tools has been thrown back onto the bleeding edge.

Now a large number of options and real solution architecture are major concerns for teams that three years ago could have just "set up a website" in one of the usual-suspect CMSs. There are no longer any easily chosen paths. While the platforms catch up, we need to think creatively about the most practical strategy and approach for the medium term. I am confident that the tool world is going to stampede to address these issues, but it will be a while before there is really good support out there.

At present, I would say a separate triple store is a good idea. For modelling, I find that schema.org is useful to have in mind (as the delivery layer is always an important part of the modelling requirements), but I favour modelling from what the business is trying to do first and then cherry-picking from standards bodies what is best to bring into your content life cycle.
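A minimal sketch of the "model the business first, then cherry-pick from standards" approach: an internal content type keeps its own field names, and only the fields that map cleanly onto schema.org properties are published under them. The content type, field names and mapping are hypothetical; `Product`, `name`, `description` and `sku` are schema.org terms.

```python
# Hypothetical internal content type, modelled from business needs first.
PRODUCT_NOTE = {
    "product_name": "Widget Pro",
    "summary": "A widget for professionals.",
    "sku": "WP-100",
    "support_tier": "gold",   # business-specific; deliberately not mapped
}

# Cherry-picked mapping from internal field names to schema.org Product properties.
SCHEMA_ORG_MAP = {"product_name": "name", "summary": "description", "sku": "sku"}

def to_schema_org(record: dict) -> tuple:
    """Split a record into schema.org output and fields that stay internal."""
    published = {"@context": "http://schema.org", "@type": "Product"}
    internal = {}
    for field, value in record.items():
        if field in SCHEMA_ORG_MAP:
            published[SCHEMA_ORG_MAP[field]] = value
        else:
            internal[field] = value
    return published, internal

published, internal = to_schema_org(PRODUCT_NOTE)
print(published)
print(internal)  # {'support_tier': 'gold'}
```

Keeping the mapping as separate data means the internal model can evolve without being dictated by the delivery vocabulary.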
I'd be interested in participating in a discussion as well; at this stage of the market, the more sharing we do, the better for us all.

Noz

Andy Mitchell

Mar 20, 2014, 6:33:38 PM
to content...@googlegroups.com
It feels very much like I need to be at Congility 2014 :)

All of the comments posted so far are incredibly useful insights, thanks. I think it's clear from what you and others are saying that if we really want to take the 'right' approach to content/information management and to make the best of the less-than-mature technologies in this space, we need to be prepared to roll up our sleeves and invest in a little more R&D than we might be used to (since unfortunately there isn't an off-the-shelf or even a one-size-fits-all approach to this... yet... if ever)... do I have the right end of the stick there?

To throw a little more fuel onto the conversational fire...

This relatively old paper makes the case that conceptually XML and ontology are different in that the former provides 'document structure' whereas the latter is a 'domain model'  ... but that when considering information exchange the gap between the two decreases...


And the following is a practical example where they ask the question "what is a document... if not a collection of information?" To me, in a world where "every page is page 1", the old-fashioned document structure surely becomes less relevant; what is more relevant are the relationships between more granular components of content... which are perhaps better described ontologically.



If there is anything more recent that you could point me to (or anything else you can share of your own thoughts that wouldn't be a spoiler for the conference!) that examines ontologies for describing information products and components, for example, I'd love to hear more!!

Andy.

Noz Urbina

Mar 21, 2014, 3:57:47 AM
to content...@googlegroups.com
>It feels very much like I need to be at Congility 2014 :)

You'd be welcome! Although I can't promise the whole conference will be a geek fest quite like this thread has been, it is definitely addressing a lot of semantic and future-proofing issues.

>we need to be prepared to roll up our sleeves and invest in a little more R&D than we might be used to (since unfortunately there isn't an off-the-shelf or even a one-size-fits-all approach to this... yet... if ever)... do I have the right end of the stick there?

I'd say that is a pretty good summary. : ) Words like "Ontology" and "RDF triple" are, I believe, still firmly in the set of "Words that Send People Running" (WSPR or “whispers” – I just made that up…). Semantics and structure were on the whisper list just 24 months ago, and this forum was discussing whether in two years we'd be including them in more and more conversations. You can see how that went. So I think it's safe to say that in another 24-36 months we'll have gotten over the terminology and conceptual shock to the system. In fact, I think there will be off-the-shelf solutions, but we could be talking a decade or more before there is the widespread demand required to give you a WordPress for Ontology (or a go-to plugin...).


>This relatively old paper makes the case that conceptually XML and ontology are different in that the former provides 'document structure' whereas the latter is a 'domain model'  ... but that when considering information exchange the gap between the two decreases...

Agreed. In fact, the distinction is largely artificial. XML is a language for describing structures and relationships, often in documents, but not necessarily. There are few things you *couldn't* describe with it, in the same sense that you would be hard-pressed to name something you couldn't describe using words, or a document that could *not* be represented in HTML. The question is whether the technology you are actually talking about benefits from being XML-based or not. For example, you could have ontology functions that are XML-based or ones that aren’t. (Wikipedia seems to say that you can’t do graphs in XML, but I don’t yet see why; there seem to be XML formats specifically intended for graphs. That isn’t a pro-XML statement; I am just not clear what the concern is.)
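The point that XML can describe graphs is easy to check. This minimal sketch loosely follows the element names of GraphML (an existing XML format for graphs) but omits its namespace and attributes; the graph itself is hypothetical. Edges refer to node IDs, so the tree shape of the XML does not restrict the shape of the graph, cycles included.

```python
import xml.etree.ElementTree as ET

# A small directed graph with a cycle (a -> b -> c -> a), expressed in plain XML.
GRAPH_XML = """<graph>
  <node id="a"/><node id="b"/><node id="c"/>
  <edge source="a" target="b"/>
  <edge source="b" target="c"/>
  <edge source="c" target="a"/>
</graph>"""

def load_graph(xml_text: str) -> dict:
    """Build an adjacency list from <node> and <edge> elements."""
    root = ET.fromstring(xml_text)
    graph = {node.get("id"): [] for node in root.iter("node")}
    for edge in root.iter("edge"):
        graph[edge.get("source")].append(edge.get("target"))
    return graph

print(load_graph(GRAPH_XML))  # {'a': ['b'], 'b': ['c'], 'c': ['a']}
```

What XML does not supply is a shared meaning for `source` and `target`; that comes from the format's documentation, whereas RDF builds the subject-predicate-object reading into the model itself.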

 
> relationships between more granular components of content... which are perhaps better described ontologically....

In England they say “Horses for courses”. Yes, clearly the traditional document model is becoming less central, but ontologies aren’t everything to content structures either. My experience is that every great technology suffers from a sort of be-all-and-end-all syndrome. On this page:

http://www.rdfabout.com/intro/#RDF/XML

There are some paragraphs like this:

“For comparison, XML itself is not very much concerned with meaning. XML nodes don't need to be associated with particular concepts, and the XML standard doesn't indicate how to derive a fact from a document. For instance, if you were presented with a few XML documents whose root nodes were in a foreign language you don't understand, you couldn't do anything useful with the documents but display them. RDF documents with nodes you can't understand could still actually be usefully processed because RDF specifies some basic level of meaning. Now, this isn't to say that you couldn't develop your own standard on top of XML that says how to derive the set of facts in an XML document, but you'll find you've probably just reinvented something like RDF.”

If you aren’t familiar with both technologies, this reads as if it implies (maybe it is implying) that these two technologies are somehow alternatives to each other when describing meaning in content. There are some bits earlier which seem to imply that because XML is hierarchical, it can’t really do tables, which is vastly inaccurate. I think the phrase “XML itself is not very much concerned with meaning” would trigger a belly laugh or a “pfft!” from most people who actually use XML in serious projects.
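To make the quoted claim concrete without taking sides: given triples in a vocabulary the consumer does not understand, generic code can still group statements by subject and follow links, because the subject-predicate-object shape is fixed by RDF rather than by the vocabulary. This is a minimal sketch; the IRIs are hypothetical, and the parser handles only a simplified subset of N-Triples (no escapes, language tags or datatypes). An XML format with equally agreed conventions could support the same kind of processing.

```python
import re

# Triples in a vocabulary the consumer has never seen (hypothetical IRIs).
NTRIPLES = """
<http://example.org/x/item1> <http://example.org/v/p1> <http://example.org/x/item2> .
<http://example.org/x/item1> <http://example.org/v/p2> "First page" .
<http://example.org/x/item2> <http://example.org/v/p2> "Second page" .
"""

# Deliberately simplified N-Triples line: IRI subject and predicate,
# object either an IRI or a plain literal without escapes.
LINE = re.compile(r'^<([^>]*)>\s+<([^>]*)>\s+(<[^>]*>|"[^"]*")\s*\.$')

def parse(text: str) -> list:
    triples = []
    for line in text.strip().splitlines():
        match = LINE.match(line.strip())
        if match is None:
            raise ValueError(f"unsupported line: {line!r}")
        triples.append(match.groups())
    return triples

def describe(triples: list, subject: str) -> dict:
    """Group every statement about a subject by predicate, whatever the vocabulary."""
    facts = {}
    for s, p, o in triples:
        if s == subject:
            facts.setdefault(p, []).append(o)
    return facts

def linked(triples: list, subject: str) -> list:
    """Follow outgoing links: objects written as IRIs rather than literals."""
    return [o[1:-1] for s, _, o in triples if s == subject and o.startswith("<")]

triples = parse(NTRIPLES)
print(describe(triples, "http://example.org/x/item1"))
print(linked(triples, "http://example.org/x/item1"))
```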

Again, just my feeling here, but I think we have some bad habits when it comes to talking about new technologies that are more cultural legacy than best practice. A) When introducing Tech A to a new audience, we make unintended implications about Tech B, which is also new to the audience, by making insider comments or personal gripes which are easily misunderstood. B) We quite intentionally try to salt the earth and make sure that our new tech of choice comes out on top, to the detriment of others, even when this is not the best outcome for the newbie.

Anyway – long story short, your broad generalisation that XML is generally focused on module structures and RDF is about relating modules is about right, but there is overlap, and both are vitally important. Be careful not to substitute one aging concept (documents) for another aging concept (pages). Modules are not pages, and modern content modelling can’t afford to think in pages.

Rick Yagodich

Mar 21, 2014, 6:34:39 AM
to content...@googlegroups.com
Damn… What Noz said. (I might not go so far in agreeing on the size of the gap between XML and RDF, but the general approach of inflammatory rubbishing of the other tech to make one's own win is clearly demonstrated.)

But I love "whispers"… It needs to become a common term.

Michael Atherton

Mar 22, 2014, 6:08:53 PM
to content...@googlegroups.com
Loving this thread! Especially as I'm busy at work preparing a gentle intro to Linked Open Data, RDF, and SPARQL for next week's IA Summit conference.

http://2014.iasummit.org/web-scale-ia-using-linked-open-data/

I've lobbied for a while for a content modelled world (see http://www.slideshare.net/reduxd/beyond-the-polar-bear - already 3 years old!) and have been encouraged to see the CS community embrace the abstraction of subject domains.

However, I still see a lot of content modelling rhetoric focused on document-centric semantic markup: identifying the containers, such as article, gallery, blog and the like. While this has value, it's far from the whole story. For me it's all about 'things, not strings'. I'm much more excited by the TimBL 'web of data' ideal: publishing out machine-readable linked data and treating documents as mere containers for the real-world entities (people, places, concepts) they contain.
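A minimal sketch of 'things, not strings': once mentions in documents are tagged with entity identifiers, the entities rather than the documents become what you query. The documents, entity IRIs and the `build_entity_index` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical documents whose mentions have been tagged with entity IRIs.
DOCUMENTS = {
    "doc-1": {"title": "Polar bears of Svalbard",
              "mentions": {"http://example.org/entity/PolarBear",
                           "http://example.org/entity/Svalbard"}},
    "doc-2": {"title": "A week in Longyearbyen",
              "mentions": {"http://example.org/entity/Svalbard"}},
}

def build_entity_index(documents: dict) -> dict:
    """Invert documents -> entities into entities -> documents."""
    index = defaultdict(set)
    for doc_id, doc in documents.items():
        for entity in doc["mentions"]:
            index[entity].add(doc_id)
    return index

index = build_entity_index(DOCUMENTS)
print(sorted(index["http://example.org/entity/Svalbard"]))  # ['doc-1', 'doc-2']
```

"Svalbard" is matched by identifier, not by string, so a document that never spells the word out but is tagged with the entity is still found.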

Noz is right (for the moment): ontologies and triples are enough to send a collective shiver down the spine of the content strategy community (and indeed, the arguably more mature IA community). But it can't be allowed to continue, and CS has to get comfortable with the vocab and grammar of the web itself. I admit, I'm a zealot, but in our cross-channel, device-agnostic, budget-squeezed world, we need to separate content from presentation, things from documents, and think at web scale. What value are we publishing? Not just to our immediate customers, but to the web as a whole? How do we stitch our offering into the fabric of the web itself? How do we fill in the gaps identified in our content model by leveraging the third-party data we can't afford to create for ourselves? For me, these questions shape the future of our practice and force us to think beyond the walls of each ivory tower.

Noz Urbina

Mar 24, 2014, 12:48:37 PM
to content...@googlegroups.com

Definitely a great thread! Warning - big long post below.

Mike, I agree that it's mostly doc modelling out there, but I think that is simply because it's an older, broader (for now) field that is still vividly relevant today. That alone gives it the leg up in terms of critical mass of materials and mind share, and it's just a matter of time before the balance shifts.

I want to make clear I support a continued, but non-exclusive focus on modelling the modules as well as the semantics systems around them. My concern is that I see an unfortunate potential path of least resistance.

Consider how easy it would be to say: "Today we have blobby content and we let an IA specialist arrange it in a nav structure / taxonomy. Tomorrow we will do the *same thing* but the IA will be doing their bit using this stuff called RDF, and we've been told our templates have some new mark-up under the button that says 'insert product'."

That's no good.

I am trying to use module-level semantics as a bridge to help more people get actively engaged and wrap their heads around the issues and concepts involved. Semantics and linked data don't have to be whispers... ; ) Without an active, vibrant bridge between the organisation and interconnection of modules and the on-screen experience for both users and authors, it is a step removed from the content specialist's job (unless they are themselves the IA). I think it is very easy to treat anything new as this or that specialist's problem and not your own, if people don't make it relevant for you and connect it to your world.

So I agree that the representation of content to the end user (aka docs or pages) is no longer the only concern. We now have this other really important thing called RDF to think about *too*.

"I'm much more excited by the TimBL 'web of data ideal; publishing out machine-readable linked data and treating documents as mere containers for the real-world entities (people, places, concepts) they contain."

But documents are not only containers for real things. Very often they add value on top of things, interconnecting things and ideas in ways that have no physical manifestation. Also, we're seeing so-called "long form" content return in significance, which means there is a demand for non-data-centric content. That doesn't mean I'm not down with the internet of things and pushing out machine-readable data. I have been at the face of the wall trying to bash down the separation between the physical and digital worlds since I saw my first augmented reality demo.

That is without a doubt where all this is heading, and Tim BL has just put a very big and famous signature at the bottom of a thought many have been kicking around for ages: the internet of the future will have no walls and no boundaries. We will not "search", we will just look, and knowledge will be presented to us. The separation between "doing" and "learning about doing" will fade and fade. Machines will not need to wait for us to formulate a question; they will anticipate. Restaurants will be offered up because your wearable network thinks you'll probably be hungry and reminds you it's been a while since you ate Thai food, and there's a place where happy hour is going on right now... Your dishwasher will ping you on Facebook when it needs more anti-spot liquid in its tray... Walking into a museum and looking at a painting will provide you information about the work, the building it's in, the exhibition it's part of, the movement in history...

Linked Data will enable a world where physical acts will replace the queries - and many other such Morpheus / Yoda type sentiments. I'm with ya, Dude. I'm totally there. But all that doesn't mean documents are "mere" anything. Documents are very useful things that add real value to the digital/physical reality ecosystem. Documents contain the world of ideas, thoughts and opinions. These may create physical manifestations but they themselves must be born and live in a non-physical space.

No, I'm not stoned.

PS - I must say I find it awesome that now that you've chimed in, Mike, more than half of the participants in this discussion were Congility speakers last year, are this year, or both. It looks like a conspiracy... : )

Noz Urbina

Mar 24, 2014, 1:42:05 PM
to content...@googlegroups.com
Just finished "Beyond the Polar Bear", Mike. http://www.slideshare.net/reduxd/beyond-the-polar-bear

I really enjoyed it!
--
Noz

Content Strategist, urbinaconsulting.com
Join me at Congility 2014, Jun 18-20: congility.com/2014
Co-Author of "Content Strategy: Connecting the dots between business, brand and benefits". In stores now.

Andy Mitchell

Mar 24, 2014, 7:54:09 PM
to content...@googlegroups.com
This is all providing a great deal of food for thought (and sorry for geeking out in your forum, but I hope it's as helpful for others as it is for me :) )

From my perspective (which is perhaps not as rounded or broad in the CS or linked data worlds) I'm inclined to agree that a 'document' is in itself an important construct as it provides a way of making sense of all of this related information for both the author and the consumer (in a sort of skeuomorphic and therefore familiar and comforting way) .

If the author feels that they need to direct the consumer because the granular content only makes sense if consumed together as a package / in a particular order, then the document perhaps serves as a way to provide a 'guided tour' (should the consumer want their hand held,  or for legal reasons the author needs to validate/disclaim across a broad set of related content).  But the consumer will ultimately decide what they consume of that set of content and the order in which they consume it, much like the downloaded track vs the concept album in the music world.

Anyway, so trying to summarise things so far ...

- Although an 'ageing concept', documents continue to be valuable, but the information/concepts they contain need to be treated/managed separately in line with the web of data ideals
- XML and RDF both have their place. Some people believe that XML can model graphs, and some believe RDF could model documents, but both are capable of handling meaning as well as structure.
- Linked data technologies / content management tools are not yet fully 'mature' - we are currently more in a 'build' rather than 'buy' period with the tech
- Not all 'content' is 'data' - the same approaches that apply to data management don't all translate to content management, i.e. you can't simply chunk up and lump all of your content into a store and then query it in different ways to get different answers... it needs more complex logic/curation, currently only manageable by a human


Or to put it another way... because of the maturity/adoption of XML, if I want to implement a tried and tested system relatively 'easily' for managing content then XML and documents is the way to go and I can markup containers with linked data concepts when the time is right, but if I want to fully embrace the new ideals and take a few more risks by moving away from tried and tested approaches I could probably achieve the same thing that XML based approaches have achieved but by modelling my content and classification entirely using RDF right? 
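Part of the answer can be sketched: a document can be modelled entirely as triples, but because a set of triples has no inherent order, the sequence of modules that XML carries implicitly has to be encoded explicitly, here as a standard RDF collection (`rdf:first` / `rdf:rest` / `rdf:nil`). The `ex:` predicate and class, and the document and module names, are hypothetical.

```python
# Prefixed names (rdf:, ex:) are plain strings here; ex:moduleOrder and
# ex:Document are hypothetical, not part of any published vocabulary.
def make_document(doc_id: str, modules: list) -> set:
    """Model a document purely as triples, with module order as an RDF collection."""
    triples = {(doc_id, "rdf:type", "ex:Document")}
    head = "rdf:nil"
    # Build the list back to front so each node points at the rest of the list.
    for i in range(len(modules) - 1, -1, -1):
        node = f"_:{doc_id}-item{i}"
        triples.add((node, "rdf:first", modules[i]))
        triples.add((node, "rdf:rest", head))
        head = node
    triples.add((doc_id, "ex:moduleOrder", head))
    return triples

def read_order(triples: set, doc_id: str) -> list:
    """Walk rdf:first / rdf:rest from the document's list head to rdf:nil."""
    lookup = {(s, p): o for s, p, o in triples}
    node = lookup[(doc_id, "ex:moduleOrder")]
    order = []
    while node != "rdf:nil":
        order.append(lookup[(node, "rdf:first")])
        node = lookup[(node, "rdf:rest")]
    return order

doc = make_document("guide-1", ["intro", "install", "troubleshooting"])
print(read_order(doc, "guide-1"))  # ['intro', 'install', 'troubleshooting']
```

It works, but every structural convenience XML gives for free (order, nesting, containment) becomes something the model has to spell out and the tooling has to walk.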

Don R Day

Mar 24, 2014, 10:30:58 PM
to content...@googlegroups.com
On 3/24/2014 6:54 PM, Andy Mitchell wrote:
but if I want to fully embrace the new ideals and take a few more risks by moving away from tried and tested approaches I could probably achieve the same thing that XML based approaches have achieved but by modelling my content and classification entirely using RDF right? 
It depends on what you and your customers need of the content. Improved classification should help with discovery regardless of which format your content is in.  Both XML and "tried and tested approaches" can benefit from semantic and taxonomic tagging, and either path may start with enumerated data sets or folksonomies and progress through corporate schemas or RDF-based classification, or some combination of any of these. Having a loose coupling between structure and classification is a best practice in most cases, allowing you to modify terms or query strategies as products adapt to the marketplace.
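A minimal sketch of that loose coupling, assuming classification is held in a separate mapping keyed by module ID and that terms have stable IDs distinct from their labels. All module IDs, term IDs and labels are hypothetical.

```python
# Hypothetical content modules: structure only, no classification inside.
MODULES = {
    "m1": "<topic><title>Resetting your password</title></topic>",
    "m2": "<topic><title>Setting up two-factor login</title></topic>",
}

# Classification kept outside the content, keyed by module ID.
CLASSIFICATION = {"m1": {"t-security", "t-accounts"}, "m2": {"t-security"}}

# Controlled vocabulary: stable term IDs mapped to their current labels.
VOCABULARY = {"t-security": "Security", "t-accounts": "Accounts"}

def find(label: str) -> list:
    """Return IDs of modules classified under any term with this label."""
    term_ids = {t for t, current in VOCABULARY.items() if current == label}
    return sorted(m for m, terms in CLASSIFICATION.items() if terms & term_ids)

print(find("Security"))          # ['m1', 'm2']
VOCABULARY["t-security"] = "Account security"   # relabel the term...
print(find("Account security"))  # ['m1', 'm2'] ...without editing any module
```

Renaming a term, or changing which modules a term applies to, touches only the classification data; the `MODULES` content is never rewritten.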

But business requirements for content tend to change over time, so the end game is to enable your content to transition to those emerging needs. Blobs and even fielded data (chunks) start showing brittleness at this point.

As you mention, some renditions of documents are aging quite fast (WinHelp, anyone?), but content itself can be adaptable for new uses well into the future if it is sourced in well-designed document structures that can be transformed or queried into new uses and formats. Document structures provide content relationships and scope that can respond adaptively to nuanced queries. By applying decoupled classification schemes, you can reuse that content in new contexts without revising the content itself for every new use case. With true internal structure (rather than faux markup like shorttags), your content will have hooks that processing tools can use for generating display-dependent markup (i.e., navigation or UI behavior) that normally complicates porting content to some new theme or system.
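A minimal sketch of the point that well-structured source can be transformed into new outputs without editing the content: one hypothetical XML topic rendered both as HTML and as plain text. The element names and both render functions are illustrative assumptions, not a particular standard such as DITA.

```python
import xml.etree.ElementTree as ET
from html import escape

# A hypothetical topic with real internal structure and no presentation markup.
TOPIC = """<topic id="reset-password">
  <title>Resetting your password</title>
  <steps>
    <step>Open Settings.</step>
    <step>Choose Reset password.</step>
  </steps>
</topic>"""

def to_html(xml_text: str) -> str:
    """Render the topic as an HTML section with an ordered list of steps."""
    root = ET.fromstring(xml_text)
    items = "".join(f"<li>{escape(s.text)}</li>" for s in root.iter("step"))
    return (f'<section id="{escape(root.get("id"))}">'
            f"<h2>{escape(root.findtext('title'))}</h2><ol>{items}</ol></section>")

def to_plain_text(xml_text: str) -> str:
    """Render the same topic as numbered plain text, e.g. for a voice or SMS channel."""
    root = ET.fromstring(xml_text)
    lines = [root.findtext("title")]
    lines += [f"{n}. {s.text}" for n, s in enumerate(root.iter("step"), start=1)]
    return "\n".join(lines)

print(to_html(TOPIC))
print(to_plain_text(TOPIC))
```

The numbering and list markup are generated by each renderer from the `<step>` structure, so a new output format means a new renderer, not a rewrite of the topic.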

In this viewpoint, then, blobs are still more useful with add-on classification than without, but the journey toward Content Nirvana should include getting to where that content can serve new and more challenging publishing roles, and XML may be a way to get to that future platform. Along the way, RDF and other classification systems are handling the user's understanding of what you have to offer, hopefully making the journey better for all.

--
  • "Where is the wisdom we have lost in knowledge?
  • Where is the knowledge we have lost in information?"
  • --T.S. Eliot

Joe Pairman

Mar 26, 2014, 6:49:28 PM
to content...@googlegroups.com
Great conversation! Noz, thanks for the "Beyond the Polar Bear" link. Very good indeed. 

On the topic of Schema.org, it may be worth mentioning (as not everyone realizes) that it works perfectly well with RDFa Lite syntax, and that is probably the more future-proof way to go, all things considered. Good arguments about that from Manu Sporny here:
And here's some info on the Schema.org site itself (more up-to-date than some other pages on the site which don't reflect the fact that RDFa now works fine with it):

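For readers who haven't seen RDFa Lite, this is roughly what schema.org markup looks like with it: `vocab`, `typeof` and `property` are RDFa Lite attributes, and `Person`, `name` and `jobTitle` come from schema.org. The person is hypothetical, and the Python class is a toy extractor written for this example that handles only one typed item with text-valued properties; it is not a conforming RDFa processor.

```python
from html.parser import HTMLParser

# A schema.org Person marked up with RDFa Lite attributes.
SNIPPET = """<div vocab="http://schema.org/" typeof="Person">
  <span property="name">Jane Example</span>,
  <span property="jobTitle">Content Strategist</span>
</div>"""

class RdfaLiteSketch(HTMLParser):
    """Toy extractor: one typed item, text-valued properties, no nesting."""

    def __init__(self):
        super().__init__()
        self.vocab = ""
        self.item_type = None
        self.props = {}
        self._current = None  # full property IRI whose text is being collected

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if "vocab" in a:
            self.vocab = a["vocab"]
        if "typeof" in a:
            self.item_type = self.vocab + a["typeof"]
        if "property" in a:
            self._current = self.vocab + a["property"]

    def handle_data(self, data):
        if self._current:
            self.props[self._current] = self.props.get(self._current, "") + data

    def handle_endtag(self, tag):
        self._current = None

parser = RdfaLiteSketch()
parser.feed(SNIPPET)
print(parser.item_type)  # http://schema.org/Person
print(parser.props)
```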
Joe


Noz Urbina

Mar 27, 2014, 2:48:08 AM
to content...@googlegroups.com

Thanks Joe! 

Has anyone got a good article or presentation that enumerates arguments for separating the triple store from the content?

Noz
