At relatively small scales, it is possible to create hypertext where authors hand craft the links between various places, say via typical IDREF mechanisms. Given a "standard" document of 100 pages, one author can reasonably manage to create say 1000 links within that 100 pages. We do it as a matter of course within IBM. Across a library of 10 such books, we can manage crafting 100 links from each book to the other books, but just barely. The cost is the cost not of initially creating the links, but of maintaining them as the documents change during development. This can just be managed within one library where all the books are finished and delivered at about the same time.
The next order of magnitude up would be linking across libraries. This is impossible once the number of cross-library links goes above about 50 because of maintenance problems--there are simply not enough hours in the day or people on the job to create, maintain, and test these links. The next order of magnitude up (cross-enterprise linking) is so clearly impossible as to be not worth considering.
Thus my contention that explicit linking is impossible at large scales. Even if you apply tools at one order of magnitude to make the impossible possible (say a database system to track links and manage changes), the next order of magnitude increase will be impossible. With change management and tracking, the human element cannot be eliminated and you quickly reach the practical limit at which humans can manage change.
Note that within IBM, and specifically within the Networking Systems line of business, we are now at the cross-library linking stage. Using the BookManager product, we are creating, by hand, explicit hyperlinks between the NetView, VTAM, NCP, and related product libraries. We think we can manage a relatively small number of links at this level, but there's no way we could do it at any larger scale. We need to move up to the next order of magnitude (cross-enterprise linking), but we can't with the explicit-link-based tools we have today.
Note that when I say explicit or hand-crafted links, I mean only from the author's perspective. Exoterica's experience with the Cinemania project is instructive here. If I understand it correctly, they used declarative markup in the SGML source to enable creation of all links associatively, and then bound those links explicitly in the derived form that was actually delivered. From the author's viewpoint, all the links were implicit, but from the final application's viewpoint, they were all explicit.
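To make the author-side versus delivery-side distinction concrete, here is a minimal sketch of that kind of build step. It is not Exoterica's actual process, which I only know second-hand; the `<term>` and `<define>` tags, the fragment names, and the `<link>` output form are all invented for illustration. The author only marks what a phrase is, and a program binds it to an explicit target when the deliverable is built.

```python
import re

# Hypothetical source fragments: authors mark up *what* something is,
# never *where* the link should point.
SOURCE = {
    "intro": "See <term>session</term> and <term>logical unit</term>.",
    "glossary": "<define term='session'>A session is ...</define>"
                "<define term='logical unit'>A logical unit is ...</define>",
}

DEFINE = re.compile(r"<define term='([^']+)'>")
TERM = re.compile(r"<term>([^<]+)</term>")


def build_index(source):
    """Map each defined term to the fragment that defines it."""
    index = {}
    for name, text in source.items():
        for term in DEFINE.findall(text):
            index[term] = name
    return index


def bind_links(source):
    """Rewrite every <term> into an explicit link in the delivered form."""
    index = build_index(source)

    def bind(match):
        term = match.group(1)
        target = index.get(term)
        if target is None:
            return term  # no definition anywhere: deliver plain text
        return f"<link target='{target}#{term}'>{term}</link>"

    return {name: TERM.sub(bind, text) for name, text in source.items()}


delivered = bind_links(SOURCE)
print(delivered["intro"])
```

If a definition moves to another fragment, rerunning the build rebinds every reference; nobody edits a link by hand.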
The cost of hypertext is the cost of creating and maintaining the links. I don't have any numbers, but it's pretty clear from my experience with the extensive hypertext we've created in Networking Systems, that the cost of maintenance is not a linear function of the number of links, but increases dramatically (geometrically?) with the number of links, compounded by the degree to which the linked objects' schedules are not synchronized and the life span of the linked objects.
Eliot Kimber Internet: drma...@ralvm13.vnet.ibm.com Dept E14/B500 IBMMAIL: USIB2DK9@IBMMAIL Network Programs Information Development Phone: 1-919-254-5160 IBM Corporation Research Triangle Park, NC 27709
: The next order of magnitude up would be linking across libraries. : This is impossible once the number of cross-library links goes : above about 50 because of maintenance problems--there are simply : not enough hours in the day or people on the job to create, maintain, : and test these links. The next order of magnitude up (cross-enterprise : linking) is so clearly impossible as to be not worth considering.
A clear counter-example is the development of the World Wide Web, a global hypertext system based on SGML that has cross-enterprise (and cross-protocol) links built in as a matter of design.
Edward Vielmetti, vice president for research, Msen Inc. e...@Msen.com Msen Inc., 628 Brooks, Ann Arbor MI 48103 +1 313 998 4562 (fax: 998 4563)
Are the links many-to-many in WWW? Are they author-defined or automatically generated? I see that comp.infosystems.www just was approved, is there a FAQ file or are there any papers on it I could read? -ed costello MVS/JES Information Development * Myers Corners Lab, Poughkeepsie NY
One-to-one or many-to-one, although one-to-many or many-to-many is easy enough by using an intermediate document listing the links.
Are they author-defined or automatically generated?
Both -- since WWW is a distributed and open system, links are easily and often created both by authors and by programs.
I see that comp.infosystems.www just was approved, is there a FAQ file or are there any papers on it I could read?
Yup -- telnet to info.cern.ch, which drops you into the linemode browser, where you will be able to find out anything you want to know about WWW. If you have an X machine with a direct Internet connection, I suggest you pull down a copy of NCSA Mosaic for X at ftp.ncsa.uiuc.edu in /Mosaic. There are other browsers available and you can easily find out about them online.
Also see info.cern.ch in /pub/www/doc for PostScript copies of CERN papers and ftp.ncsa.uiuc.edu in /Mosaic/mosaic-papers for the Mosaic technical summary.
Cheers, Marc
-- Marc Andreessen Software Development Group National Center for Supercomputing Applications ma...@ncsa.uiuc.edu
I wonder if Hesiod is applicable to this question...
I worked at MIT's Project Athena for four years in the mid '80s. There were several systems developed to support and manage a very large scale distributed, heterogeneous network environment. The first that came out of Athena was the X Window System, followed by the Kerberos authentication system. The third major system developed was Hesiod, a generalized network name server.
The idea was to make it possible to find things on the network without knowing their exact, real name. For example, a user could send output to -Pps (a PostScript printer). The lpr command would actually get hold of "ps" and, using standard Hesiod library calls, ask the name server what "ps" really translated to in that user's context. Hesiod would go through a series of name resolution passes, eventually returning the fact that "ps" really meant "lw2" connected to machine "prill" in the "ringworld" cluster, and that's where lpr would actually send the output.
Hesiod supported names in defined domains (rather like internet addressing). Moreover, it supported arbitrary depths of indirection. This was nice because you could create any structured domain you wanted, and overlay it upon the real "flat" world namespace. Multiple domains, defined by different people for different purposes, could overlay the same real world. It was very simple, but very powerful--just give it a symbolic string and interpret, interpret, interpret, you got back the real string.
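The "interpret, interpret, interpret" idea can be sketched in a few lines. This is not the Hesiod library interface, which I am not reproducing here; the table layout, the intermediate alias "default-postscript", and the `resolve` function are assumptions made up to show context-dependent lookup with several levels of indirection over a flat set of real names.

```python
# Hypothetical name tables; the entries echo the printer example above
# but the layout is invented for illustration.
ALIASES = {
    ("ringworld", "ps"): "default-postscript",
    ("ringworld", "default-postscript"): "lw2",
}
REAL = {"lw2": "lw2@prill"}


def resolve(context, name, max_depth=16):
    """Follow aliases in the given context until a real name is reached."""
    seen = set()
    for _ in range(max_depth):
        if name in REAL:
            return REAL[name]
        if name in seen:
            raise LookupError(f"alias loop at {name!r}")
        seen.add(name)
        try:
            name = ALIASES[(context, name)]
        except KeyError:
            raise LookupError(f"{name!r} unknown in {context!r}") from None
    raise LookupError("too many levels of indirection")


print(resolve("ringworld", "ps"))
```

Because the context is part of the key, a different cluster could map "ps" to a different printer without touching the real-name table.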
I've been out of the operating system Biz for six years. I never found out what happened to Hesiod; whether, like so much of the Athena software, it made it into OSF, perhaps under a different name. It seems likely that this next generation of Unix operating systems will have some system utility very like Hesiod that occupies this same ecological niche.
In article <19930504.081117....@almaden.ibm.com>, drma...@vnet.IBM.COM wrote:
>> At relatively small scales, it is possible to create hypertext >> where authors hand craft the links between various places, say >> via typical IDREF mechanisms.
This is true only for the case of existing, crude hypertext tools where the links are statically constructed and manually maintained.
>> The next order of magnitude up (cross-enterprise >> linking) is so clearly impossible as to be not worth considering.
This is why IBM's stock dropped 50% last year. The inability to break out of existing paradigms of knowledge representation and utilization has always been one of their fortes. (no flames please, it's meant to be tongue in cheek). Seriously though, the "Unix" mindset imposes a stream-oriented text view of representing documents. The current electronic format of information is not very different to existing paper representations.
>> Thus my contention that explicit linking is impossible at large >> scales.
This is not the case at all. The solution is very simple. Stop representing text documents as electronic versions of their paper counterparts. Rather than simply represent electronic text as a stream of characters with embedded information, structure documents in a new form. If electronic documents existed as a set of related objects in an information (object) base (e.g., one thought, concept, or paragraph per object), links (relations) can be established between individual objects, classes of objects, collections of similar objects, sub-elements of objects, etc.
This is a more natural representation of knowledge. I don't know many humans who organize their thoughts as large, linear chunks of text with the occasional embedded static link to some other information. Most people remember small pieces of info that are combined together to provide "bigger pictures" and can be dynamically reorganized without losing their meaning.
As documents in this new format are reorganized, rewritten, etc., the links become self-maintaining. A link that related two concepts in separate "documents" can continue to relate them, regardless of their position within their respective documents, changes in their content, etc. In fact, deleting the source or destination of a hyperlink becomes self-maintaining as well, since you simply remove the relation at that point. Links are actually separate objects unto themselves and are not properly part of the information at either end of the link. Links between individual words or phrases become easy to automate. A program can simply search related chunks of information and infer which portions are explicitly related based on the nature of the relation between the chunks.
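A rough sketch of what I mean, with links held apart from content. The `InfoBase` class and the object IDs are hypothetical, not an existing product. Note what it does and does not show: links survive edits and never dangle after a delete, but nothing here checks whether a link still makes sense once the text at either end has changed.

```python
from dataclasses import dataclass, field


@dataclass
class InfoBase:
    """Content objects and links kept in separate tables (hypothetical)."""
    objects: dict = field(default_factory=dict)   # object id -> text
    links: set = field(default_factory=set)       # (source id, target id)

    def put(self, obj_id, text):
        # Editing or replacing content never touches the link table.
        self.objects[obj_id] = text

    def relate(self, source, target):
        if source not in self.objects or target not in self.objects:
            raise KeyError("both ends of a link must exist")
        self.links.add((source, target))

    def delete(self, obj_id):
        # Removing an object drops every relation that mentions it.
        del self.objects[obj_id]
        self.links = {l for l in self.links if obj_id not in l}

    def links_from(self, source):
        return sorted(t for s, t in self.links if s == source)


base = InfoBase()
base.put("vtam-intro", "VTAM controls ...")
base.put("ncp-intro", "NCP runs in ...")
base.put("netview-intro", "NetView monitors ...")
base.relate("vtam-intro", "ncp-intro")
base.relate("vtam-intro", "netview-intro")

base.put("ncp-intro", "Rewritten NCP overview")   # link survives the edit
base.delete("netview-intro")                      # link to it disappears
print(base.links_from("vtam-intro"))
```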
In short, the concept of a hypertext document as it exists today (e.g., HTML) is doomed to failure in the long run because of the problems of scale you mention. There needs to be a better way to represent information in a computer that is vastly different from the way we represent it on the printed page. SGML (HTML) ain't it.
My major objection to existing Hypertext systems is that the data is "dumb" and the burden of intelligence is placed on the humans and software that are part of the process. An object oriented approach (and I mean in the general sense of data objects, collections, etc., not the overworked academic interpretation of pure object orientation) allows the data to be smart. The implicit structure of the data allows relations to be easily generated and maintained. The data must be smart and self-maintaining, not the people that create it, nor the programs that operate on it.
The current problem is that data is dumb as a post and has no clue when it is inconsistent, out of date, etc. By decomposing information into atomic units that can have relations made in a general or specific fashion, you avoid all of the existing hassles of trying to keep even two "hypertext" documents in sync. In fact, the entire concept of a document as we know it breaks down. A document becomes whatever collection of information a user chooses to group together in whatever structure. The hyper links are still there between "document" chunks and can be traversed, but the context of the information at the end of the link is dynamic.
Of course, all of this probably makes no sense at all. But, my point was that explicit linking is most certainly possible, is easily maintainable, and can be implemented with a minimum of effort. You just have to be willing to represent your electronic information in some form other than a dumb ASCII text file.
----------------------------------------------------------------------- Chuck Shotton cshot...@oac.hsc.uth.tmc.edu "I am NOT here."
In <1sbf6v$...@nda.NDA.COM> Linda B. Merims writes: ...
>I wonder if Hesiod is applicable to this question... >Hesiod supported names in defined domains (rather like internet addressing). >Moreover, it supported arbitrary depths of indirection. This >was nice because you could create any structured domain you >wanted, and overlay it upon the real "flat" world namespace. Multiple >domains, defined by different people for different purposes, >could overlay the same real world. It was very simple, but >very powerful--just give it a symbolic string and interpret, interpret, >interpret, you got back the real string.
This is precisely the addressing mechanism defined by HyTime, what HyTime calls "location ladders". Location ladders can be as indirect as needed to make a given link robust. Each "rung" in the ladder can be any of the many addressing methods HyTime defines, including a query, so that, for example, the example Linda gives of querying the system to find out what real printer should be used can be modeled in HyTime, like so:
<nameloc id=ps><!-- Start of location ladder for finding real PS printer -->
<nmquery><!-- NMQuery contains a query and returns an SGML ID. The query can be in any query language. Here I've used HyQ, the query language defined in the HyTime standard. -->
Is_Closest( <!-- Returns closest object in node list -->
  Is_PS_Printer(&local-lan;)) <!-- Returns list of PS printers within the specified domain, here the address of the local LAN, however that is defined. -->
</nmquery>
</nameloc>
The link would be instantiated by having some other object refer to the NameLoc element by its ID. You could consider the markup above to be the content of a printer profile on a system.
Note the query statement. In HyQ, as a rule, functions act on and return node lists. In this example, the Is_Closest() function expects a node list where the nodes are network elements and returns the one closest to the location of the object that contains the query. It gets the node list in this case from the Is_PS_Printer() function, which takes the address of the root of the query domain (the DOMROOT, in HyQ parlance) and returns a node list containing the addresses (or names) of all PS printers in the domain.
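For readers who don't know HyQ, the same composition can be mimicked in ordinary code. This is only an analogy under invented assumptions: the `Node` type, the `hops` distance, and the sample LAN contents are made up, and the two functions merely imitate the "node list in, node list out" convention rather than implementing anything from the HyTime standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    kind: str
    hops: int  # hypothetical distance from the object holding the query


# Hypothetical query domain standing in for &local-lan;
LOCAL_LAN = [
    Node("lw1", "ps-printer", 3),
    Node("lw2", "ps-printer", 1),
    Node("lp0", "line-printer", 1),
]


def is_ps_printer(domain):
    """Node list in, node list out: keep only the PostScript printers."""
    return [n for n in domain if n.kind == "ps-printer"]


def is_closest(nodes):
    """Node list in, single-node list out: the nearest candidate."""
    return [min(nodes, key=lambda n: n.hops)] if nodes else []


result = is_closest(is_ps_printer(LOCAL_LAN))
print([n.name for n in result])
```

Since every function takes and returns a node list, queries nest freely, and an empty domain simply yields an empty result instead of an error.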
The Is_Closest() and Is_PS_Printer() functions are themselves defined using HyTime primitives elsewhere, and assuming all the queries are really against system configuration data or data returned by network servers, would be eventually addressing directly to elements and PCDATA within the configuration "documents". Note that it's not necessary that these configuration documents be SGML documents as HyTime has methods for addressing into any kind of data. For example, a query against a network server that can return lists of network objects could be modeled in HyTime as a notation location query, where in this case the notation is the API used to communicate with the network server.
Eliot Kimber Internet: drma...@ralvm13.vnet.ibm.com Dept E14/B500 IBMMAIL: USIB2DK9@IBMMAIL Network Programs Information Development Phone: 1-919-254-5160 IBM Corporation Research Triangle Park, NC 27709
In <cshotton-060593222...@oacpslip2.hsc.uth.tmc.edu> Chuck Shotton writes:
>In article <19930504.081117....@almaden.ibm.com>, drma...@vnet.IBM.COM >wrote: >>> At relatively small scales, it is possible to create hypertext >>> where authors hand craft the links between various places, say >>> via typical IDREF mechanisms.
>This is true only for the case of existing, crude hypertext tools where the >links are statically constructed and manually maintained.
That's what I mean by hand-crafted explicit links.
>>> The next order of magnitude up (cross-enterprise >>> linking) is so clearly impossible as to be not worth considering.
>This is why IBM's stock dropped 50% last year. The inability to break out >of existing paradigms of knowledge representation and utilization has >always been one of their fortes. (no flames please, it's meant to be tongue >in cheek).
I think you misunderstood my point. I wasn't complaining that what I wanted to do was impossible, I was pointing out that the current approach to linking is impossible and that we need to change it.
> Seriously though, the "Unix" mindset imposes a stream-oriented >text view of representing documents. The current electronic format of >information is not very different to existing paper representations.
You're both right and wrong. Certainly it is taking the writing and related tools communities within IBM time to move away from the traditional "book-oriented" models for creating and delivering information, and especially the hand-crafted nature of information in general, irrespective of how it is organized or delivered. That is changing, but it takes time. At this point, our tools have not kept pace with our understanding of how to do hypertext well and productively. I'm trying to change that.
Our current primary delivery mechanism for online information is the BookManager product. BookManager is a fine product and has many strengths, including some associative linking capability that is very useful. However, the developers of BookManager did not fully appreciate the power of truly generic markup and the ability to thereby create powerful hyperlinks automatically. Thus, they created a product that does not have sufficient generality in its hyperlink enablement to take advantage of the new languages we are developing that are truly generic. The BookMaster language really only provides explicit hyperlinking elements (cross references and explicit link phrases) and does not provide the specificity of markup needed to enable full automatic implication of hyperlinks. Thus, writers within IBM have been hampered by their tools and have, to a large degree, focused on how to work around or within the limitations of their tools, not how to define better systems that solve the problem entirely. However, some folks, like myself, are working on that problem within IBM and I think we've about got it solved. Stay tuned.
>>> Thus my contention that explicit linking is impossible at large >>> scales.
>This is not the case at all. The solution is very simple. Stop representing >text documents as electronic versions of their paper counterparts. Rather >than simply represent electronic text as a stream of characters with >embedded information, structure documents in a new form. If electronic >documents existed as a set of related objects in an information (object) >base (e.g., one thought, concept, or paragraph per object), links >(relations) can be established between individual objects, classes of >objects, collections of similar objects, sub-elements of objects, etc.
I couldn't agree more. This has been exactly my point from the start.
>In short, the concept of a hypertext document as it exists today (e.g., >HTML) is doomed to failure in the long run because of the problems of scale >you mention. There needs to be a better way to represent information in a >computer that is vastly different from the way we represent it on the >printed page. SGML (HTML) ain't it.
You are correct that hand-crafted explicit hyperlinking is doomed to failure, however, you are incorrect in saying that SGML is somehow implicated in that failure (I don't know what HTML is). Rather, SGML and HyTime enable exactly the solution to this problem you propose. The fact that SGML defines and provides explicit link mechanisms is irrelevant as SGML does not in any way *limit* you to explicit links.
Eliot Kimber Internet: drma...@ralvm13.vnet.ibm.com Dept E14/B500 IBMMAIL: USIB2DK9@IBMMAIL Network Programs Information Development Phone: 1-919-254-5160 IBM Corporation Research Triangle Park, NC 27709
In article <19930507.064544....@almaden.ibm.com>, drma...@vnet.IBM.COM wrote:
> >In short, the concept of a hypertext document as it exists today (e.g., > >HTML) is doomed to failure in the long run because of the problems of scale > >you mention. There needs to be a better way to represent information in a > >computer that is vastly different from the way we represent it on the > >printed page. SGML (HTML) ain't it.
> You are correct that hand-crafted explicit hyperlinking is doomed > to failure, however, you are incorrect in saying that SGML is somehow > implicated in that failure (I don't know what HTML is). Rather, > SGML and HyTime enable exactly the solution to this problem you > propose. The fact that SGML defines and provides explicit link > mechanisms is irrelevant as SGML does not in any way *limit* you > to explicit links.
I plead my ignorance of many of the internals of SGML. HTML is the "flavor" of SGML being used by projects like World Wide Web, etc. It only supports static, explicit links.
Correct me if I'm wrong, but the paradigm used by SGML is still one of a self-contained document with embedded tags which provide a mechanism for relating it to other documents. I still maintain that the implementation of relations and links should be physically disjoint from the content. The idea being that the content can be maintained independently of any of the relational information, freeing content authors from a lot of the burdens of link maintenance.
If SGML can support this paradigm, please clue me in as to the specific mechanisms. I'd love to find a ready-made answer.
--_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_- Chuck Shotton | Ass't Director, Academic Computing | "This space for rent." UT Health Science Center Houston | cshot...@oac.hsc.uth.tmc.edu | _-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-
In <cshotton-060593222...@oacpslip2.hsc.uth.tmc.edu> Chuck Shotton writes:
>This is why IBM's stock dropped 50% last year. The inability to break out >of existing paradigms of knowledge representation and utilization has >always been one of their fortes.
<flame (request for no flame notwithstanding)> What an incredibly cheap shot. Whatever our company's problems are (which I am not inclined to discuss here), Eliot is not part of the problem, he is part of the solution. If you have been watching this space, you have seen that his proposals are pushing forward the state of the art. </flame>
>This is not the case at all. The solution is very simple. ...
I'll omit the rest of your remarks. Your solution does not solve the problem that Eliot was referring to. In fact, your solution does nothing to solve the problem of manually managing these links, whether they are part of the document or not. Considering the thousands of links that are required to truly and effectively link the many documents of a large library, just creating and testing them is a herculean task. MAINTAINING them through revision cycles to ensure they are accurate and still meaningful is even harder. No organization of data will make this problem go away, and your 'solution' is no solution at all. The problem is not one of moving the information being linked in a hierarchical or flat organization; it's the changes in content that affect the validity of the linkages. I should mention that your proposed organization for data matches, in some respects, quite well with what we are doing in defining our SGML application for our information development community. OUR solution to the link creation and maintenance problem is to mark and maintain the information in our databases so that the system can reliably and consistently generate, automatically, many of the links that will make online information useful.
>In short, the concept of a hypertext document as it exists today (e.g., >HTML) is doomed to failure in the long run because of the problems of scale >you mention. There needs to be a better way to represent information in a >computer that is vastly different from the way we represent it on the >printed page. SGML (HTML) ain't it.
>My major objection to existing Hypertext systems is that the data is "dumb" >and the burden of intelligence is placed on the humans and software that is >part of the process. An obje