I've been thinking a lot lately about pipelines, which I increasingly see as the key to making Xorion work.
It's important to understand that there are essentially four vectors for performing processing in this kind of architecture:
1) A RESTful services call (an operation involving one of the core four HTTP verbs GET, PUT, POST, or DELETE) is made to a Xorion "service" in order to retrieve or update collections (real or virtual) - i.e., something like
http://myserver.com/xorion/blogs/myLatestBlog.page.html GET (which I'll simplify from here on out by leaving out the
http://myserver.com part).
This in turn becomes a request sent to a controller that handles URL rewriting, usually also bundling key state information (such as query strings, the HTTP method, the user, a pointer to the uploaded body, and so forth). This first controller is VERY lightweight, and is heavily system dependent. The request is then passed to a second dispatcher (dispatch.xq), one which will usually live "in" the database rather than in the surrounding environment. The role of the second dispatcher is to do the heavy lifting for Xorion: parsing the request URL and, from that, determining the relevant mapping to a given processor. It does so based upon configuration resources (what I'm leaning towards calling service maps), defined for each supported data tree, that determine options (and caching) based upon HTTP method, collection, keys, queries, and pages.
The leaves of these trees are where things get interesting - they could be either Xquery scripts or pipelines, and I'll address this issue in much greater detail further down.
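To make the two-stage dispatch concrete, here's a minimal sketch (in Python, purely for illustration - the `/collection/key.representation` URL shape and all names here are my own assumptions, not anything Xorion defines) of the lookup the second dispatcher might perform:

```python
# A sketch of the second-stage dispatcher: given the request URL and HTTP
# method, walk a service map to find the pipeline (or XQuery script)
# responsible for producing the response. All names are illustrative.

def parse_request(path):
    """Split '/blogs/myLatestBlog.page.html' into (collection, key, representation)."""
    segments = [s for s in path.strip("/").split("/") if s]
    resource = segments[-1]                      # e.g. 'myLatestBlog.page.html'
    collection = "/".join(segments[:-1])         # e.g. 'blogs'
    key, _, representation = resource.partition(".")
    return collection, key, representation

def dispatch(service_map, method, path):
    """Resolve (method, path) to an action via the service map, or None."""
    collection, key, representation = parse_request(path)
    methods = service_map.get(collection, {})
    representations = methods.get(method, {})
    return representations.get(representation)   # e.g. 'get-page.xproc'
```

The point of the sketch is only the shape of the lookup: method, collection, and representation together select the leaf action.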
2) A REST XQuery or XProc call - I'm differentiating here between a REST call and a RESTful services call, and to make this distinction even clearer, I'll call the former a REST/RPC call. This is essentially invoking an XQuery or XProc script directly from a URL. It is no different from performing the same function in PHP,
ASP.NET, or Ruby, except that in this case it's XQuery that's the processing language, living within a containing servlet. While both are necessary, a REST/RPC call in this context should be seen as a privileged call in a framework environment, one that should only be invoked when there is no alternative within a resource-oriented setting (such as integrating with outside services).
Now, the question needs to be raised: why do I think that #1, which is obviously more complex, should be used in place of #2? Again, the key answer here is pipelines. Every time you add a module, that module is a resource that you're adding to either a generalized module collection or a resource module collection. The installation of the module (the registration of new resource types and namespaces, the creation of relevant folders/collections, tying these resources into the caching system, and so forth) is accomplished by posting the module to a module collection, which in turn launches a pipeline that actually performs this process. From the standpoint of the user (or superuser in this case, as module installation would of course need to be much more carefully controlled) this is a simple enough POST operation, likely of a ZIP file, the result of which would likely be a redirect to the module listing page showing the new module in place. It is THIS model that we should try scrupulously to maintain - that everything (or nearly everything) having to do with Xorion is the result of a CRUD operation. There's some added complexity for the core developers, but it buys considerably simpler interfaces for the end user (and even the third-party module developer).
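As a sketch of that "everything is CRUD" model: module installation reduces to a POST handler that unpacks the ZIP, reads a manifest, registers the module, and answers with a redirect. The manifest name (`module.xml`) and its shape are my assumptions for illustration, not anything Xorion has defined:

```python
import io
import zipfile
import xml.etree.ElementTree as ET

def install_module(zip_bytes, registry):
    """Handle a POST of a module ZIP: read its (assumed) module.xml
    manifest, register the module's name and files, and answer with a
    redirect to the module listing - keeping installation a plain CRUD
    operation from the superuser's point of view."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        manifest = ET.fromstring(zf.read("module.xml"))
        name = manifest.get("name")
        registry[name] = {"files": [i.filename for i in zf.infolist()]}
    # A real installation pipeline would also register namespaces, create
    # collections, and wire the module into the caching system here.
    return ("303 See Other", "/xorion/modules/")
```

The handler itself stays trivial; the heavy lifting lives in the pipeline it launches.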
3) Scheduled services - most XML databases are now moving to a CRON-like model for handling specific scheduled services, for everything from updating internal feeds to sending batched emails and similar messages. Xorion would of course have hooks into this model. These jobs would likely invoke services directly via REST/RPC rather than working on individual resource collections (though the mechanism for invoking both should exist), and realistically, both XQuery and XProc hooks should be invokable from this vector.
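A minimal sketch of such a CRON-like hook table (the intervals and endpoint names are invented for illustration): each job pairs an interval with the REST/RPC endpoint to invoke, and the scheduler simply asks which jobs are due.

```python
# A toy CRON-like job table for scheduled services. Each job records the
# endpoint to invoke, its interval in seconds, and its next run time.

def due_jobs(jobs, now):
    """Return the endpoints of all jobs whose next run time has arrived,
    advancing their next-run timestamps as they fire."""
    due = []
    for job in jobs:
        if now >= job["next_run"]:
            due.append(job["endpoint"])
            job["next_run"] = now + job["interval"]
    return due
```

The real scheduler lives in the database engine; Xorion would only need to register endpoints into a table like this.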
4) Processing hooks (actions?) - every time a non-idempotent operation (and perhaps every time ANY operation) is performed on a resource collection, there will be hooks in place for adding into the overall processing pipeline. (Indeed, this hook model is essential to the way that the pipeline architecture works, and hook management should be seen as a core function of the system.) These hooks thus imply that any terminal pipe in a sub-processing pipeline must expose the same envelope signature (non-terminal pipes are under no such restriction, though this also implies that a mechanism must exist for differentiating between terminal and non-terminal defined steps). Note that such hooks include the possibility of external hooks - web hooks - which may be invoked every time a given pipeline is run and which send notification messages to external URLs. Such web hooks should be explored as a critical core module as well, since one of the key advantages of a system like this is that it plays nicely with other services architectures.
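The web-hook half of that model can be sketched very simply (the payload fields and the registry shape are assumptions of mine; a real implementation would POST the payload over HTTP rather than call an injected sender):

```python
# A sketch of web-hook notification: whenever a given pipeline runs, every
# externally registered URL for that pipeline receives a message. The
# sender is injected so the sketch stays free of real network calls.

def build_notification(pipeline, resource, event):
    """Assemble the message a web hook would receive (fields assumed)."""
    return {"pipeline": pipeline, "resource": resource, "event": event}

def fire_webhooks(hooks, pipeline, resource, event, send):
    """Call send(url, payload) for every URL registered on this pipeline."""
    payload = build_notification(pipeline, resource, event)
    for url in hooks.get(pipeline, []):
        send(url, payload)
    return payload
```

Keeping the hook registry keyed by pipeline means external subscribers never need to know anything about Xorion's internals beyond the pipeline name.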
Throughout all of this, I've been making one critical assumption, one that I think needs to be examined - the use of XProc. That a pipeline language is required I don't really think is debatable - we're dealing with complex systems where you may have dozens or even hundreds of modules at work, and those operations are very much process oriented. XQuery is a good "low-level" language for actually performing specific actions, but the fact that pipelines will typically need to coordinate between multiple modules that are not all necessarily created by the same people implies that XQuery is probably not well suited to encoding such pipelines at the high level, while XML - which can of course be manipulated with XQuery - is ideal for expressing not only static but dynamic pipelines.
However, the question is whether XProc is the best pipelining language in that regard. In its favor, XProc is an emerging W3C standard, it is extensible, and it can be used for defining custom steps. On the debit side, it is not finalized (though it is getting closer), it is not yet fully implemented on all XML databases, and it is a language that few people are capable of writing out of the box. My own inclination is to use it as a foundation, but to encode as many core Xorion operations as possible as abstract steps that can then be mapped or transformed to alternative pipeline languages, such as MarkLogic's CPF -
but I'd be curious what other people think about this.
The model that I think is emerging out of all of this is a final service map that looks something like this (the following illustrates a NIEM arrest report collection):
<services-map xmlns:p="http://www.w3.org/ns/xproc" xmlns:c="http://www.w3.org/ns/xproc-step">
  <site id="niem-wa-state">
    <resource base-collection="niem-arrest-reports" path="/db/niem/lexs/arrest-reports">
      <collection>
        <method match="GET">
          <representation match="feed.xml" mime-type="text/xml">
            <action src="get-xml.xproc"/>
          </representation>
          <representation match="block.xhtml" mime-type="application/xhtml+xml">
            <action src="get-block.xproc"/>
          </representation>
          <representation match="page.xhtml" mime-type="application/xhtml+xml">
            <action src="get-page.xproc"/>
          </representation>
          <representation match="feed.atom.xml" mime-type="application/atom+xml">
            <action src="get-atom.xproc"/>
          </representation>
        </method>
      </collection>
      <collection name="niem-arrest-reports/pending">
        <method match="GET">
          <representation match="feed.xml" mime-type="text/xml">
            <action src="get-xml.xproc">
              <p:parameters>
                <c:parameter-set>
                  <c:param name="filter" value="pending"/>
                </c:parameter-set>
              </p:parameters>
            </action>
          </representation>
          <representation match="feed.atom.xml" mime-type="application/atom+xml">
            <action src="get-atom.xproc">
              <p:parameters>
                <c:parameter-set>
                  <c:param name="filter" value="pending"/>
                </c:parameter-set>
              </p:parameters>
            </action>
          </representation>
          <representation match="edit.xml" mime-type="application/xhtml+xml">
            <action src="get-edit.xproc">
              <p:parameters>
                <c:parameter-set>
                  <c:param name="filter" value="pending"/>
                </c:parameter-set>
                <c:parameter-set>
                  <c:param name="editor" value="xforms"/>
                </c:parameter-set>
              </p:parameters>
            </action>
          </representation>
          <representation match="edit.flash.xml" mime-type="application/xhtml+xml">
            <action src="get-edit.xproc">
              <p:parameters>
                <c:parameter-set>
                  <c:param name="filter" value="pending"/>
                </c:parameter-set>
                <c:parameter-set>
                  <c:param name="editor" value="flash"/>
                </c:parameter-set>
              </p:parameters>
            </action>
          </representation>
        </method>
        <method match="POST">
          <representation match="feed-post.xml" mime-type="text/xml">
            <action src="post-xml.xproc">
              <p:parameters>
                <c:parameter-set>
                  <c:param name="filter" value="pending"/>
                </c:parameter-set>
              </p:parameters>
            </action>
          </representation>
          <representation match="atom-post.xml" mime-type="text/xml">
            <action src="post-atom.xproc">
              <p:parameters>
                <c:parameter-set>
                  <c:param name="filter" value="pending"/>
                </c:parameter-set>
              </p:parameters>
            </action>
          </representation>
          <representation match="edit-post.xml" mime-type="application/xhtml+xml">
            <action src="post-xforms-edit.xproc">
              <p:parameters>
                <c:parameter-set>
                  <c:param name="filter" value="pending"/>
                </c:parameter-set>
              </p:parameters>
            </action>
          </representation>
          <representation match="edit.flash.post.xml" mime-type="application/xhtml+xml">
            <action src="post-flash-edit.xproc">
              <p:parameters>
                <c:parameter-set>
                  <c:param name="filter" value="pending"/>
                </c:parameter-set>
              </p:parameters>
            </action>
          </representation>
        </method>
        <method match="PUT">
          <representation match="feed-put.xml" mime-type="text/xml">
            <action src="put-xml.xproc">
              <p:parameters>
                <c:parameter-set>
                  <c:param name="filter" value="pending"/>
                </c:parameter-set>
              </p:parameters>
            </action>
          </representation>
          <representation match="atom-put.xml" mime-type="text/xml">
            <action src="put-atom.xproc">
              <p:parameters>
                <c:parameter-set>
                  <c:param name="filter" value="pending"/>
                </c:parameter-set>
              </p:parameters>
            </action>
          </representation>
        </method>
      </collection>
    </resource>
    <!-- more resources -->
  </site>
  <!-- more sites -->
</services-map>
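Reading such a map is straightforward. Here's a sketch in Python using the standard-library ElementTree (the abbreviated map follows the structure above; the resolver itself is my own assumption about how a dispatcher might consult it):

```python
import xml.etree.ElementTree as ET

# An abbreviated service map following the structure of the example above.
SERVICES_MAP = """\
<services-map>
  <site id="niem-wa-state">
    <resource base-collection="niem-arrest-reports" path="/db/niem/lexs/arrest-reports">
      <collection>
        <method match="GET">
          <representation match="feed.xml" mime-type="text/xml">
            <action src="get-xml.xproc"/>
          </representation>
        </method>
      </collection>
    </resource>
  </site>
</services-map>
"""

def resolve_action(map_root, site_id, method, representation):
    """Find the action @src for a given site, HTTP method, and representation."""
    site = map_root.find(f"site[@id='{site_id}']")
    if site is None:
        return None
    for rep in site.iterfind(f".//method[@match='{method}']/representation"):
        if rep.get("match") == representation:
            action = rep.find("action")
            return action.get("src") if action is not None else None
    return None
```

Because the map is plain XML, the same lookup could of course be written directly in XQuery inside the dispatcher.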
In the above document, the core resource collection can only be accessed via GET operations - it won't accept PUT or POST operations. The subcollection /niem-arrest-reports/pending, on the other hand, has pipelines for handling POSTs and PUTs on the data set, as well as an edit modality for editing the content with XForms or with a Flash editor.
Whenever any resource module is uploaded (or upgraded), the service map for that resource is read in, then combined with the service maps for all of the other resources to create a master site service map, which is in turn used by the dispatch script to determine the specific action for any URL.
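The merge itself could be as simple as copying each resource's entries under the matching <site> of a master document; a sketch (again Python/ElementTree, with the combining rule being my own guess at the semantics):

```python
import xml.etree.ElementTree as ET

def merge_service_maps(resource_maps):
    """Combine per-resource service maps into one master site service map.
    Sites with the same @id are merged and their <resource> children
    concatenated - an illustrative assumption about the merge rule."""
    master = ET.Element("services-map")
    sites = {}
    for doc in resource_maps:
        for site in ET.fromstring(doc).findall("site"):
            target = sites.get(site.get("id"))
            if target is None:
                target = ET.SubElement(master, "site", id=site.get("id"))
                sites[site.get("id")] = target
            target.extend(site.findall("resource"))
    return master
```

Running the merge at module upload time (rather than per request) keeps the dispatcher's per-request work to a single lookup against the precomputed master map.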
Note that it's also possible to create a custom services map for a site, allowing for customization at the site level for development purposes, or for adding capabilities. Thus,
<!-- local-services-map.xml -->
<services-map>
  <site id="niem-wa-state">
    <resource base-collection="niem-arrest-reports" path="/db/niem/lexs/arrest-reports">
      <collection>
        <method match="GET">
          <representation match="refresh.xq">
            <action src="xquery/refresh.xq"/>
          </representation>
        </method>
      </collection>
    </resource>
  </site>
</services-map>
All paths are relative to {@resource-base-collection}, and all pipelines are stored in {@resource-base-collection}/lib unless otherwise specified by the {@resource-code-library} attribute.
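That resolution rule can be stated in a few lines (a sketch; the treatment of absolute paths is my own assumption):

```python
def action_path(src, base_collection, code_library=None):
    """Resolve an action's @src relative to the resource's base collection.
    Pipelines live in {base}/lib by default, or in an explicitly declared
    code library; absolute paths are assumed to be left untouched."""
    if src.startswith("/"):
        return src
    library = code_library if code_library else f"{base_collection}/lib"
    return f"{library}/{src}"
```

This keeps module packages relocatable: nothing inside a service map needs to hard-code where in the database the resource was installed.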
In general, most resource modules will end up using common names, such as get-xml.xproc for retrieving the XML content of a given resource, and the assignment of representations will similarly be held in common. One possible consequence of this is that there may be a core resource collection (likely abstract) that will be called whenever a given action isn't specified for a sub-resource. Thus,
<resource base-collection="niem-arrest-reports" path="/db/niem/lexs/arrest-reports">
  <collection>
    <method match="GET">
      <representation match="feed.xml" mime-type="text/xml"/>
    </method>
  </collection>
</resource>
would indicate that the generalized abstract collection's feed.xml representation action should be used for processing the output. Similarly, a sub-collection may end up including a stubbed <representation> node (one with no attributes) as a way of indicating that the base collection's associated action should be used (and possibly so on up the tree to the abstract feed.xml action). This inheritance should significantly cut down on the amount of coding necessary. (Note that pipelines might themselves also invoke higher-level pipelines, but that's outside the scope of the services document.)
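That fallback behavior amounts to walking a chain of lookups from most to least specific; a sketch (the dict shape and the chain itself are assumptions for illustration):

```python
def resolve_with_inheritance(chain, method, representation):
    """Walk a chain of service-map levels from most to least specific
    (sub-collection, base collection, abstract core) and return the first
    action defined for this method and representation."""
    for level in chain:
        action = level.get(method, {}).get(representation)
        if action:
            return action
    return None
```

A stubbed representation simply contributes an empty entry at its level, deferring the decision upward.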
Thoughts?
Kurt Cagle
Managing Editor
http://xmlToday.org