[Feature] Content Versioning

Dagan Henderson

Nov 1, 2011, 7:38:18 AM
to Absolute vs relative
It's early here (4:15 a.m.), so if my thoughts seem odd or have
obvious problems (or easier solutions), sorry.

I was pondering the GUID issue for root-relative URLs and transferring
data between different content databases when I realized the GUID
would actually be the perfect foreign key (duh). Here are my thoughts:

1. We develop a secure content API (JSON-based?) that uses the GUID,
sans domain, to (globally) uniquely identify content (a strange
concept, I know :-P)
2. Post metadata can be used to "commit" content once it is approved
through business/organizational workflows
3. A new tool can be used to push committed content to a remote
install
4. Installs receiving "new" content that is older than their own version
can simply save the update as a draft and alert an editor or whatever
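
To make step 4 concrete, here is a minimal Python sketch of what a receiving install might do with pushed content. Everything in it is an assumption for illustration: the `Content` fields, the `receive` function, and the two dicts standing in for the database are hypothetical, and `key` is taken to be the GUID with the domain stripped.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Content:
    key: str            # hypothetical: the GUID with the domain stripped
    body: str
    modified: datetime


def receive(store: dict, drafts: dict, incoming: Content) -> str:
    """Apply one pushed item and report what happened to it."""
    current = store.get(incoming.key)
    if current is None:
        store[incoming.key] = incoming
        return "created"
    if incoming.modified < current.modified:
        # The pushed copy is older than ours: keep ours live and park the
        # incoming copy as a draft so an editor can be alerted.
        drafts[incoming.key] = incoming
        return "saved-as-draft"
    store[incoming.key] = incoming
    return "updated"
```

The return value is where an "alert an editor" step could hook in; how that alert is delivered is left open here.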

The API could be used to define access-control privileges. We can do
polling, reverse population (getting a copy of the production DB for
testing), and push published content to a backup install for failover
on high-bandwidth sites.

Probably because it's what I work with, I'm personally envisioning a
very Git-like workflow.

Thoughts?

Mike Schinkel

Nov 1, 2011, 11:26:46 AM
to absolute-v...@googlegroups.com

Can you describe the specific use-case in more detail? I'm not sure exactly what problem you are trying to solve.

-Mike
P.S. As a side note, one of my plans has been to leverage YAML to handle all forms of content serialization, including content import/export, to persist everything from the database to disk, and to enable the developer to actually code configurations in YAML. One of the many benefits would be to enable version control of literally everything, and to enable updates of production databases from a delta of the dev/test/staging database, as appropriate. My goal is to work with a team to build such an infrastructure that could become a de facto standard vs. just building for myself, which really doesn't interest me.

Dagan Henderson

Nov 1, 2011, 11:39:58 AM
to Absolute vs relative
Consider an enterprise environment with 3 internal servers (dev,
staging and production) and one cloud server (backup):

While working in dev, you could regularly pull content from staging
and/or production to ensure your code works with the existing content
as expected. Permissions, however, would prevent you from pushing
content from dev to either staging or production.

Content produced in staging could be viewed and approved internally,
and then pushed to the production site.

Production content can be automatically pushed to backup for failover
redundancy.

Anytime you need updated content for a dev environment, it's just an
API call away.


Isn't this similar to Marcus' use case for root-relative URLs? And if
we're talking about solving part of the problem, shouldn't we at least
consider addressing the rest of it in a single solution? Just my two
cents.

Mike Schinkel

Nov 1, 2011, 12:15:17 PM
to absolute-v...@googlegroups.com
On Nov 1, 2011, at 11:39 AM, Dagan Henderson wrote:
> Consider an enterprise environment with 3 internal servers ( dev,
> staging and production) and one cloud server (backup):
>
> While working in dev, you could regularly pull content from staging
> and/or production to ensure your code works with the existing content
> as expected. Permissions, however, would prevent you from pushing
> content from dev to either staging or production.
>
> Content produced in staging could be viewed and approved internally,
> and then pushed to the production site.
>
> Production content can be automatically pushed to backup for failover
> redundancy.
>
> Anytime you need updated content for a dev environment, it's just an
> API call away.

Okay, that makes perfect sense.

I'm a little (potentially) allergic to systems that require GUIDs to work. In a former life I thought GUIDs were the panacea but someone on the team argued strongly against them. I got my way but then ended up really regretting it. Since humans can't recognize a GUID, it makes "eyeballing" interchange files for validity impossible. So I would caution against building a solution that uses GUIDs except in very limited cases, and only in those cases where it is machine talking to machine and there is never a need for humans to talk to the machine (which I'm not sure exists.)

I've been thinking about solving this (type of) problem for over a decade, and I was stumped. The problem was that I assumed a "boil-the-ocean" approach. But in the past 2 years I realized that the problem could be solved by "divide-and-conquer"; i.e. we write specific code where we know the patterns (posts, postmeta, taxonomy, etc.) and then allow hooks to handle anything we don't understand.

> if we're talking about solving part of the problem, shouldn't we at
> least consider addressing the rest of it in a single solution? Just my
> two cents.

I agree 100%. And full disclosure, which I shared offline with Marcus a few days ago: my company NewClarity (currently a team of 6 people) has been working to address the needs of the professional site builder for just over a year. We plan to release a layer of code called Sunrise, which we'll GPL license and make available for free download with a goal of creating a de facto standard. And we plan to cultivate a development community around Sunrise just like a development community has been built around WordPress; however, our developers are likely to all be professionals working on business solutions and not lots of college kids that just like blogging.

Sunrise will NOT be a fork of WordPress (except maybe the WP Total Cache type of fork, but only if required) and our goals will be to address the enterprise/professional site-builder use-cases. Our business model will be consulting and training at first, eventually moving to SaaS and maybe some other revenue opportunities, but the code will always be made freely available because we want it to become a de facto standard.

And our culture will be one that attempts to address a user's use-case needs instead of telling them they are stupid for wanting to do it that way. We will be very focused on business needs, both to meet the needs of businesses using WordPress and to empower businesses to generate revenue from supporting Sunrise+WordPress. Yes, people are making lots of money with WordPress, but as far as the WordPress Foundation/Automattic is concerned, you are on your own to make money; they really have no partner marketing programs like Acquia has for Drupal, or even what Red Hat has.

So my goal and expected focus over the next 3-5 years is to work with people to solve these problems so we can create a de facto standard toolset for enterprise/professional site-builder use of WordPress named Sunrise, and to create a company that supports it all so we can hire more people to help us achieve their realities and/or empower them via partner marketing programs. And we expect a closed beta to be available before the end of the year for the tip of the iceberg for Sunrise.

Anyway, hope that is not inconsistent with anything you are thinking or planning. Actually I hope you like the vision and want to work on it together.

-Mike
P.S. I'm 48 and have had a few successful businesses in the past; one was ranked #123 on the Inc 500 in 1999; we sold software to Visual Basic developers via a printed mail order catalog. So this concept pulls from all my former experience and leverages what I really enjoy doing: a bit of coding, a bit of evangelizing, a bit of deal making, and so on. :-)


Dagan Henderson

Nov 1, 2011, 12:21:15 PM
to Absolute vs relative
> Since humans can't recognize a GUID it makes "eyeballing" interchange files for validity impossible.

Sorry for the lack of clarification, here. I was referring to the
$post->guid property, which is generally something like http://www.example.com/?p=45.
The '?p=45' bit could be used to identify the same content in
different databases.
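
As a rough illustration of dropping the domain from a `$post->guid` value, here is a small Python sketch. The function name `content_key` is hypothetical, and it assumes the GUID is a well-formed URL like the example above; it says nothing about whether the resulting key is actually unique across installs.

```python
from urllib.parse import urlsplit


def content_key(guid: str) -> str:
    """Return the root-relative part of a GUID URL (path plus query)."""
    parts = urlsplit(guid)
    key = parts.path or "/"
    if parts.query:
        key += "?" + parts.query
    return key
```

With this, `http://www.example.com/?p=45` and a copy of the same post on a differently named dev host would both map to `/?p=45`.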

Mike Schinkel

Nov 1, 2011, 12:31:29 PM
to absolute-v...@googlegroups.com

Ah. That is also a potential issue, because they are not always 100% unique, and when moving around from dev to production you can get URLs in the RSS feed that point nowhere.

Anyway, I'm bringing up problems and not solutions. The real solutions will come from someone writing code to see what works and what does not.

BTW, have you looked at RAMP[1]? It's a commercial product but it does content staging and I think it uses those "guids." My team plans to use it for our current project, but longer term I think that type of functionality needs to be freely available so a de facto standard can be created.

-Mike
[1] http://crowdfavorite.com/wordpress/ramp/


Marcus Pope

Nov 2, 2011, 1:35:59 AM
to enterprise-wp
Just to piggyback on the thread...

I like the idea, Dagan, especially being a git user myself. We could
also take the Mercurial approach to real GUIDs, which is to offer both
a GUID form and a numerical index for easy human reading. The
numerical index is only specific to the local database, but it helps
when trying to review a collection of commits.
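
A minimal Python sketch of that dual-identifier idea, with a global ID for exchange and a small local number for humans. The `LocalIndex` class and its method names are made up for illustration; this is not how Mercurial itself stores revisions.

```python
import uuid


def new_guid() -> str:
    """Generate a globally unique ID to travel with the content."""
    return str(uuid.uuid4())


class LocalIndex:
    """Maps global IDs to small numbers that only mean something locally."""

    def __init__(self):
        self._number_by_guid = {}
        self._guid_by_number = []

    def register(self, guid: str) -> int:
        # Hand out the next local number the first time a GUID is seen.
        if guid not in self._number_by_guid:
            self._number_by_guid[guid] = len(self._guid_by_number)
            self._guid_by_number.append(guid)
        return self._number_by_guid[guid]

    def guid_for(self, number: int) -> str:
        return self._guid_by_number[number]
```

Two installs would generally assign different local numbers to the same GUID, so only the GUID should ever cross the wire.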

I'm curious as to what else needs to be in the push/pull API: posts,
metadata, users? Not sure how far it should go. And what do you do
about schema differences between systems? Enforce version comparisons
before accepting a push or pull?

Long ago I wrote a "diff" system for doing a schema / content diff
between two databases. You could diff mysqldbstage and mysqldbprod,
and it would first output any alter, create and drop commands to make
the schema match. Then it would output insert, update and delete
commands to make the data match. It was a nice system that inherently
allowed for backing up a database delta to a single file. And a
simple mysql command in your build script would update your local
schema to the correct version and dataset when you updated to a
specific revision. It was actually hooked into svn so every commit
generated a backup and a delta file if it wasn't just empty. That way
any checkout of any version could autogenerate the right database.
And when we pushed to production we just "applied" the sql patches in
sequence for every numerical commit index between the last push to
production and this push. There were a few issues to contend with but
nothing that couldn't be fixed programmatically.
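
The "apply the patches in sequence" step could be sketched in Python roughly as follows. The function name, the dict of revision numbers to patch files, and the file names are all hypothetical; the original script is not available, so this only illustrates the selection logic.

```python
def patches_to_apply(available, last_pushed, target):
    """Return patch files for revisions after last_pushed up to target, in order.

    `available` maps a numerical commit index to its SQL delta file; commits
    that produced an empty delta simply have no entry.
    """
    revisions = sorted(r for r in available if last_pushed < r <= target)
    return [available[r] for r in revisions]
```

For example, with deltas recorded for revisions 3, 5, 8 and 9, a production database last pushed at revision 3 and now moving to revision 8 would get the files for 5 and 8, in that order.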

I stopped using it when I started working for companies with an
in-house DBA who refused to let me use it :D But it was a
viable concept, and could certainly be applied here. I don't think I
have that exact script anymore but it's something I could probably
knock out if the idea was worth it.

But I do think the issue of pushing content from staging to
production, when production has continual content modifications, could
be solved with this git-like approach. In the exact same way git
solves it.

Mike, I haven't seen RAMP yet, but I starred your post on wp-hackers
so I'll check it out when I get some downtime.

-Marcus


Mike Schinkel

Nov 2, 2011, 11:27:50 AM
to absolute-v...@googlegroups.com
Hi Marcus (and Dagan),

Like you, Marcus, I'm consumed by a project right now, and it is at its 11th hour, so I can't devote enough time to explain my thoughts in full detail. However, I've been planning to build a system that leverages YAML for a wide range of uses, from configuration options, to code versioning, to initial data loading, to content staging and to deployment of changes in the database to a production site.

I've come to believe that we need a "divide and conquer" approach for the deployment of changes to production. It seems to me we can implement code that can move data from staging to production, even including new tables by recognizing that we understand the exact structure of the WordPress database. Using this knowledge of WordPress's database we can write higher level code that understands the business rules of the database, and we can safely ignore everything else (more on the "else" in a moment.)

For example, we know that WordPress uses the posts table for menus, so we could write something that creates a YAML file that represents the appropriate menu structure from staging and then import that to the production server. The import could add anything new and delete everything that staging does not have. And we can do the same for most (all?) other aspects of the site's database. In addition to the logic for the standard features in WordPress, each plugin could participate in the process by using hooks; this is the "else." For example, if a plugin adds information into the menu system then it could also have a hook that will return data from the staging site to be included in the menu export YAML file, as well as a hook that would read that extra data from the YAML file and apply it to the production site.
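
To illustrate the export/import-with-hooks flow, here is a minimal Python sketch. It is only an assumption about how such a system might be shaped: `export_hooks`, `import_hooks`, `export_menu` and `import_menu` are hypothetical names, plain dicts stand in for the menu data, and the YAML serialization step is left out because Python's standard library has no YAML module.

```python
export_hooks = []   # callables: (item_id, staging_item) -> dict of extra fields
import_hooks = []   # callables: (item_id, exported_record, production) -> None


def export_menu(staging: dict) -> dict:
    """Build the exportable menu structure, letting plugins add their data."""
    exported = {}
    for item_id, item in staging.items():
        record = dict(item)
        for hook in export_hooks:
            record.update(hook(item_id, item))
        exported[item_id] = record
    return exported


def import_menu(exported: dict, production: dict) -> None:
    """Make production match the export: delete what staging lacks, add the rest."""
    for item_id in list(production):
        if item_id not in exported:
            del production[item_id]
    for item_id, record in exported.items():
        production[item_id] = dict(record)
        for hook in import_hooks:
            hook(item_id, record, production)
```

The dict returned by `export_menu` is what would be written out as the YAML file; a plugin would register one function in each hook list to carry its extra menu data across.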

I think this YAML-based approach will result in a much more manageable and robust deployment solution than trying to "boil the ocean" by inspecting the SQL dump of the staging site to see how it compares to the SQL dump of the production site and then trying to make a tool that can intelligently apply the appropriate transforms.

BTW, it's literally taken me from 1995 to 2010 to recognize this pattern as a solution for deployment; the deployment problem for websites has been haunting me since my first IIS+ASP+VBScript+SQL Server website went online and then needed modifications. I never thought it could really be solved until I had the epiphany described above: "Don't try to do it generically but instead work on each part that you know and leave the rest to hooks and plugins." I hope you like these ideas; I expect to work on them in 2012 and would love to work with a group of people on it rather than work independently.

That said, I now need to go heads-down coding for today and probably the next week...

-Mike

P.S. There is someone else that I'd like to bring into these conversations once I finish my current project (selfishly I'd like to not have him involved until I know I can devote the time to fully participate in the discussion.) He has built a YAML-based system similar to what I've been describing, although his is not yet as mature as the vision I describe above, and he really wants to get others to handle the technical side so he can work on higher-level client problems.

Dagan Henderson

Nov 2, 2011, 5:44:19 PM
to absolute-v...@googlegroups.com
Hmm, I'll have to think about that process a bit, Marcus. My head's swimming a bit right now for other reasons, and it seems less user-friendly than I had hoped. My goal would be to keep the workflow manageable from a user standpoint. Something more like this:

We use the built-in WP GUIDs from the posts table (which are actually URIs) and drop the domain name (root-relative GUIDs, :-P ?) when talking between servers. Using either XML- or JSON-based REST calls between servers, we can push and pull content back and forth. User metadata can be used to link user IDs between systems, and plugin settings can specify a catch-all user when one doesn't exist.

A custom metabox on posts can flag content as production-ready, and a tool can be used to push/pull content between servers. API keys can specify read-write permissions, and plugin settings can set an install as dev, staging, production or backup/failover. The goal would be for CMS-savvy users to be able to push (and take down) content.
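
One way to picture the combination of API-key permissions and install roles is the Python sketch below. The `ALLOWED_PUSHES` table and both function names are assumptions; the allowed directions simply mirror the dev/staging/production/backup flow described earlier in the thread, where dev may pull but never push.

```python
# Hypothetical push directions: staging feeds production, production feeds backup.
ALLOWED_PUSHES = {
    ("staging", "production"),
    ("production", "backup"),
}


def can_push(source_role: str, target_role: str, key_permissions: set) -> bool:
    """A push needs a write-capable key and an allowed direction."""
    return "write" in key_permissions and (source_role, target_role) in ALLOWED_PUSHES


def can_pull(key_permissions: set) -> bool:
    """Any install may pull as long as its key can read."""
    return "read" in key_permissions
```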

Here are some suggested API methods:

Publish/Push:
--push content
--update content
--change author
--revert revision

Read/Pull:
--get all content
--get content since
--get content of type
--get content of user

Admin:
--install plugin/theme
--activate plugin/theme
--update plugin/theme (either via WP repository or Git/SVN)
--read setting
--update setting
--read version
--update version
--add/disable user
--reset user password
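
As a sketch of how the Read/Pull methods might share one implementation, here is a small Python filter. The `get_content` signature and the field names (`modified`, `type`, `author`) are hypothetical and chosen only to line up with the method names in the list.

```python
from datetime import datetime

# Hypothetical sample records standing in for rows from the posts table.
POSTS = [
    {"id": 1, "type": "post", "author": "dagan", "modified": datetime(2011, 10, 20)},
    {"id": 2, "type": "page", "author": "mike", "modified": datetime(2011, 11, 1)},
    {"id": 3, "type": "post", "author": "mike", "modified": datetime(2011, 11, 2)},
]


def get_content(items, since=None, post_type=None, author=None):
    """Return the items that match every filter that was supplied."""
    matches = []
    for item in items:
        if since is not None and item["modified"] <= since:
            continue
        if post_type is not None and item["type"] != post_type:
            continue
        if author is not None and item["author"] != author:
            continue
        matches.append(item)
    return matches
```

Calling it with no filters corresponds to "get all content"; passing `since`, `post_type` or `author` covers the other three.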

I think the user juggling would require some serious thought, but I don't think it's undoable. My hope would be to make it possible for Editors to control content on multiple installs from a single staging install and for Admins to maintain multiple installs from a single dev install.

Mike Schinkel

Nov 2, 2011, 6:46:55 PM
to absolute-v...@googlegroups.com
Hi Dagan,

I know you were replying to Marcus but I thought I'd interject: I think your vision could easily be a layer on top of my vision.  

It seems you are focused more on the UX, the RESTful API interaction and the relationship identifications, and I am focused more on the data and required action representation formats, the logic used to apply the transformations and how to enable non-standard transformations via hooks.  And I'm also trying to create YAML infrastructure that can be used for more than just content staging, to also support persistence of all things needed to initialize a system from scratch.  

The choice of YAML vs. JSON is to encourage less technical people to be comfortable with it although we could optionally use either.  

Can't wait until I have the bandwidth to work on this...

Dagan Henderson

Nov 2, 2011, 6:52:09 PM
to absolute-v...@googlegroups.com

Sorry, Mike. Your post got tl;dr'd for now. It *was* flagged to read later, though, which I still plan to do. We're going to have to move off of just groups to get this really thought through, though. I suggested to Marcus that we do an Assembla wiki, but there are other options, too.

Marcus Pope

Nov 10, 2011, 2:44:10 PM
to enterprise-wp
Hey guys, I too have been swamped; along with work I'm also trying to
finish a personal family website for my wife. Just to add some
thoughts on the above, I think we could bypass YAML and JSON in favor
of the XML-RPC framework that already exists in WordPress. Not that
I'm against either of those formats (I certainly prefer both over
XML), but I am *for* using tools that exist and are well maintained by
the framework. With the other options we're left to implement our own
security layer on top, and XML-RPC already does that.

I'm totally up for a different forum for these discussions because
mailing lists present a threading problem that is difficult to
overcome (especially when I lapse for a week and have to reply to
multiple topics that are intermixed between multiple posts...) But I
really hate wiki markup, just a personal grievance of mine. I
generally like markdown better, but still think it has its flaws. But
I recognize the flexibility of the wiki platform so I'll go along if
it's the only option (but don't expect any pretty looking tables from
me! :D)

I've been looking into sync options that currently exist, and
migrate-webhosts and wordpress-move seem like the closest options. But I
want to see bidirectional sync and partial content migrations :D (and
a bunch of other people want it too, based on forums I found.) This of
course includes uploads from the fs, data in the database and code
related to WP core, plugins & themes. I think the list from Dagan
above is a good start, but I'm already wondering what happens to nav
menus or custom data for other plugins?

Wordpress-Move has an interesting strategy for programmatically
exporting database content for every table in the system. I think
combined with this technique: http://joegornick.com/2009/12/30/mysql-created-modified-date-fields/
we could analyze the information on an individual row level, and push
or pull only the content that is updated after the last modified
date. It doesn't solve for delete operations, but it does solve for
plugins that don't maintain created and modified columns because this
solution operates on the database level. I'm sure there's a trigger
for deletes as well so we could even capture that information in
another table. gzipping the filesystem data would allow for us to
send it over the wire via xmlrpc with a little binary streaming magic.
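
Here is a small Python sketch of the row-level idea: pick up rows modified after the last sync, and use a delete trigger to record removals in a separate table. It uses SQLite from the standard library purely as a stand-in for MySQL, so the trigger syntax differs from what a real WordPress install would need, and the table, trigger and function names are all hypothetical. In this sketch the `modified` value is set by the caller, whereas the linked technique has the database maintain it.

```python
import sqlite3

SCHEMA = """
CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT, modified TEXT);
CREATE TABLE deleted_rows (table_name TEXT, row_id INTEGER, deleted_at TEXT);
-- Record every delete so it can be replayed on the other side.
CREATE TRIGGER posts_after_delete AFTER DELETE ON posts
BEGIN
    INSERT INTO deleted_rows VALUES ('posts', OLD.id, datetime('now'));
END;
"""


def open_db():
    """Create an in-memory database with the tracking trigger installed."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def changes_since(conn, last_sync):
    """Return (changed rows, deleted row ids) newer than last_sync.

    Timestamps are 'YYYY-MM-DD HH:MM:SS' strings, which compare correctly
    as plain text.
    """
    changed = conn.execute(
        "SELECT id, title FROM posts WHERE modified > ? ORDER BY id",
        (last_sync,),
    ).fetchall()
    deleted = conn.execute(
        "SELECT row_id FROM deleted_rows"
        " WHERE table_name = 'posts' AND deleted_at > ? ORDER BY row_id",
        (last_sync,),
    ).fetchall()
    return changed, [row[0] for row in deleted]
```

The two lists returned by `changes_since` are what a push or pull would need to send: rows to insert or update, and IDs to delete.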

But this is a big undertaking. And we'd have to really flesh it out,
make sure it works for mu sites and get a good support framework set up
before taking on the challenge. Maybe it's too big of a project, but
it sure would be nice to have.

Anyway, I'm swamped through the rest of the week, but I'll see what I
can find for free or cheap wiki/markdown options on the web.

-Marcus


Mike Schinkel

Nov 10, 2011, 5:40:40 PM
to absolute-v...@googlegroups.com
On Nov 10, 2011, at 2:44 PM, Marcus Pope wrote:
> Hey guys, I too have been swamped, along with work I'm also trying to
> finish a personal family website for my wife. Just to add some
> thoughts on the above, I think we could bypass yaml and json in lieu
> of the xmlrpc framework that already exists in wordpress. Not that
> I'm against either of those formats (I certainly prefer both over
> xml,) but I am *for* using tools that exist and are well maintained by
> the framework. With the other options we're left to implement our own
> security layer on top and xmlrpc already does that.

I'll prefix by saying I've got to get a working site done by Monday, and it's the project I've been working on for 6 months, so some of the following thoughts may not be fully coherent...

Generally I am for tools that already exist, but I'm also for introducing new approaches to handle what existing tools cannot address. For example, I tried to build something leveraging the WordPress menu system. It turned out to be incredibly buggy, and that literally was one month of my life that I want back. From what I've seen of it, I believe that attempting to bend the XML-RPC framework to meet the needs I'm envisioning would be fraught with peril.

Further, XML is (for mere mortals) a "read-only" markup language (same for JSON.) OTOH, YAML is a "read-write" language. I can envision people actually manually authoring YAML files for inclusion in a version control repository, which is one of my envisioned use-cases. I cannot envision WordPress developers writing XML files (because they already could if they were willing to, but they never do.)

If you feel strongly about this I certainly won't try to dissuade you but our 2012 efforts will definitely include working to build a YAML solution.

> I'm totally up for a different forum for these discussions because
> mailing lists present a threading problem that is difficult to
> overcome (especially when I lapse for a week and have to reply to
> multiple topics that are intermixed between multiple posts...) But I
> really hate wiki markup, just a personal grievance of mine. I
> generally like markdown better, but still think it has its flaws. But
> I recognize the flexibility of the wiki platform so I'll go along if
> it's the only option (but don't expect any pretty looking tables from
> me! :D)

I personally find wikis great for docs but completely unworkable for discussions. I don't think there yet exists a reasonable platform for this type of discussion, as we've been looking but have come up with nothing (maybe we should build one...? Nah, not my passion.) Unless we have a paid SaaS system for discussion, I don't think there exists anything better than Google Groups: email for notifications and easy discussion, the website for reviewing history.

> I've been looking into sync options that currently exist and
> migrate-webhosts and wordpress-move seems like the closest options. But I
> want to see bidirectional sync and partial content migrations :D (and
> a bunch of other people want it too based on forums I found.) This of
> course includes uploads from the fs, data in the database and code
> related to wp core, plugins & themes. I think the list from Dagan
> above is a good start, but I'm already wondering what happens to nav
> menus or custom data for other plugins?

I think all current options have a vision that is too small.  That's why I have been planning to address the problem in 100% fashion rather than partial fashion.  But it'll be a big project.

If you need something short term that's great, but I really want to put my effort towards solving the problem fully.  As an aside, I think what does not work is to work with SQL commands; it'll be too hard to get right without building a full SQL recognizer.

> Wordpress-Move has an interesting strategy for programmatically
> exporting database content for every table in the system. I think
> combined with this technique: http://joegornick.com/2009/12/30/mysql-created-modified-date-fields/

The use of triggers concerns me since that requires adding something more than database tables and indexes to WordPress. I'd rather see us do something like this in a query table.

> But this is a big undertaking. And we'd have to really flush it out,
> make sure it works for mu sites and get a good support framework setup
> before taking on the challenge. Maybe it's too big of a project, but
> it sure would be nice to have.

Funny, my approach is even bigger. :)

> Anyway, I'm swamped through the rest of the week,

DITTO. :)

Hope I can come up for air late November.

-Mike