Sacred Cows part 2

Jake Brownson

unread,

Apr 16, 2012, 1:34:55 PM4/16/12

to pdx...@googlegroups.com

Eric shared the Sacred Cows article with us at the last meeting. Being
a huge advocate for point #1 (don't store source as ASCII) I
immediately subscribed to the RSS feed. It seems he's gotten a lot of
controversial feedback on just that point and has done a second post
expanding on it. Worth the read IMHO.

http://joshondesign.com/2012/03/14/sacred-cows-2

Corey Hunt

unread,

Apr 16, 2012, 4:14:54 PM4/16/12

to pdx...@googlegroups.com

Agreed. That was worth the read. Subscribed as well.

--
You received this message because you are subscribed to the Google Groups "pdxfunc" group.
To post to this group, send email to pdx...@googlegroups.com.
To unsubscribe from this group, send email to pdxfunc+u...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/pdxfunc?hl=en.

Bart Massey

unread,

Apr 17, 2012, 3:29:38 AM4/17/12

to pdx...@googlegroups.com

Insert famous Spinoza quote here.

OK, I can't just let it go at that. But really: How many of these ideas do you think us old folks have not tried already? And I don't mean "kind of tried" here, I mean gave the full-out effort to?

For example, storing "source code" as some kind of data structure? Been there, done that. Lots. The classic example is Smalltalk, which Josh should know from his internship at PARC. Sure, we kept the text around: it's a good idea to do so for a variety of reasons. But all the tools worked on an internal representation very much like what the article describes. Ya know what? Most of the claimed benefits actually happened to some degree. The smart tools allowed cool custom refactoring, exposed visual elements of the code structure with a minimum of effort, etc.

The problems were real too, though. For example, turns out that non-ASCII representations are fragile--go figure. Change the language representation slightly, and everything else changes a lot. There's no limping along patching things by hand until the tools catch up, either. All the tools have to change with the representation: flag day. And the tools have to accept old representations, because the representations are hard to upgrade: backward compatibility hell.

Also turns out that writing simple tools against an ASCII representation, given a reasonable text-munging language with regexp support, is much easier than writing those tools against the structured representation. There tended to be one class of tools in structured systems: epic. Anything that wasn't worth months of effort to get built out just didn't happen at all. Go look at how much of C's structure started out as source-text-munging hacks (e.g. C++ name mangling)---and how many (e.g. CPP) still are. The good hacks got epic. The bad hacks got killed. This kind of incremental design is usually just better.

Another dirty secret was that the magic internal representation didn't turn out to be necessary. Modern PL implementations can lex, parse and structure tens or even hundreds of thousands of ASCII lines per second. If you want some "special" program representation for a tool, you can just go parse everything into it, do your business, and get out. Take a look at the Inform 7 compiler: Whole-program translation of syntactically wacky ASCII text every time you want to run your game. Happens so fast you don't even notice until the games get really, really, really big. Look at Sparse, and how fast it can do whole-program analysis of the Linux kernel. Etc.

So yeah, feel free to point out ways in which the PL design community is failing. Lord knows I could list some of my own. The problem is that if you don't learn your history, I may be condemned to repeat it. I hate that.

--Bart the Cranky Old Man

markus

unread,

Apr 17, 2012, 11:30:38 AM4/17/12

to pdx...@googlegroups.com

> Insert famous Spinoza quote here.

> [snip]

> --Bart the Cranky Old Man

Thank you. That was so much more balanced and informative than what I
was planning to write, I'm glad I procrastinated*.

-- Markus

* Mostly, I was dithering between two jujitsu attacks. Either 1)
agreeing and then using his arguments to say that we should ditch html
and just port the whole web to MS Word's "doc" file format, or 2)
agreeing with his core point that storing a complex structure as a
series of characters necessarily/unavoidably lost information, and then
exploring how we could save/transmit/do anything at all with programs
without such serialization. For the second I was thinking of drifting
into wide eyed Penrose-oid "collapse the wave function" analogies.

Bart's response was much better.

Jake Brownson

unread,

Apr 17, 2012, 1:49:53 PM4/17/12

to pdx...@googlegroups.com

> OK, I can't just let it go at that. But really: How many of these ideas do
> you think us old folks have not tried already? And I don't mean "kind of
> tried" here, I mean gave the full-out effort to?

I totally respect that, and it certainly gives me pause because I
certainly don't think I'm smarter than any of those people, and I'd
always love to read another paper I haven't read yet (I'm sure there
are a ton out there I haven't), but I'm sure you understand what that
sounds like to someone who has seen two functioning structured editing
systems, has enjoyed using both, but believes he sees where they've
gone wrong (and the biggest reasons aren't interesting/hard technical
ones). I'm sure folks applied that argument to a lot of the hard
problems that have eventually been solved over the years (if I pick
one I'll sound especially pompous :p). That certainly doesn't prove
that every time someone applies that argument it's invalid of course,
but it's one that I think shouldn't stop folks from trying to find new
approaches.

> For example, storing "source code" as some kind of data structure? Been
> there, done that. Lots. The classic example is Smalltalk, which Josh should
> know from his internship at PARC. Sure, we kept the text around: it's a good
> idea to do so for a variety of reasons. But all the tools worked on an
> internal representation very much like what the article describes. Ya know
> what? Most of the claimed benefits actually happened to some degree. The
> smart tools allowed cool custom refactoring, exposed visual elements of the
> code structure with a minimum of effort, etc.

Smalltalk fell quite short of the whole structured editing experience.
It was structured editing only down to the methods which is where it
really starts to matter. It may have had the structure available to
tools making for famously great refactoring, etc, but the user's
experience of editing the code was still textual. We already have that
capability in most editors by putting each method in a separate file
if we choose (that is to say the editing experience, not the
refactoring coolness).

As I understand it, Smalltalk's non-text data store formats were
Smalltalk specific. In order to see them or edit them in a reasonable
way you need an editor that is intimately familiar with smalltalk.
Following that lead we would need different sets of editors and tools
for each language/environment which is certainly not practical.

To make this work we need a standardize-able format that stores a
graph structure with strong identities that isn't tied to one specific
domain. ASCII/Unicode aren't tied to a specific domain which has
enabled us to use them to store so much great structure, but they
don't model the structure at all. It's a model that doesn't fit the
way we're using it for programming. When it didn't fit the way it was
being used to create paper documents we wrote structured editors
called word processors for our users, but we don't follow the same
design principles for ourselves.

Please correct me if I'm wrong, because I don't know this for a fact
at all, but my impression is that Smalltalk wasn't testing the idea of
doing a general structured editing environment, but was testing making
Smalltalk structured. It seems to have been relatively successful at
doing that. I think it goes to show that we should find a way to work
in a structured environment all the time instead of just with
Smalltalk.

> The problems were real too, though. For example, turns out that non-ASCII
> representations are fragile--go figure. Change the language representation
> slightly, and everything else changes a lot.

The basic idea of the model I'd like to see is that each node in the
graph has a strong identity (guid) and a multiset of edges keyed by a
strong identity. The specification would also include a limited set of
DSLs including a schema DSL, and a basic projection DSL. The schema
DSL would allow defining an interesting structure in the graph, and
the projection DSL would describe how a specific graph should be
displayed and edited. There would be general projections that could
display any graph (not just the bubble and line ones, but textish tree
spanning ones w/ indentation/expand collapse, etc) if a domain
specific projection isn't available, but it should be relatively easy
to make projections for even obscure domains widely available with
today's Internet. It's likely that more powerful schema and projection
languages would be developed, become popular, and be standardized as
well.

The graph could store any representation, the editor could display it,
and could test it against a schema and red-underline things that don't
fit. A type checking DSL could be used to add even more power to the
schema DSL (see MPS for a working example).

> There's no limping along
> patching things by hand until the tools catch up, either. All the tools have
> to change with the representation: flag day.

Very true, but most specific tools will be easier to write since we
don't have to deal w/ all the lexing and parsing and can go directly
to the AST. Version control can be much better as mentioned in the
article. This is certainly a problem, but not a compelling reason it
can't be done.

Though not entirely true. MPS for example still uses git for version
control. It also uses all the existing Java infrastructure for
executing the stuff you edit by generating Java source (though I never
have to deal w/ it as a user). It persists its data in xml and has a
way to read the git diffs to convert it to a nice UI for doing merges.

> And the tools have to accept
> old representations, because the representations are hard to upgrade:
> backward compatibility hell.

That's only true if you want backward compat. Like you said we have to
rewrite all the tools (or at least the persistence layer of them), so
folks that are working w/ text can continue to do so, and folks that
are starting new projects can use this. Backwards compat could
certainly be done where it was necessary, but I certainly don't see
backwards compat as a compelling reason it can't be done. People are
using lots of technologies today that are relatively new and weren't
backwards compatible.

It also depends a bit on exactly what you mean by backwards
compatibility. If you created a C domain for example you could still
link against old C code. If you integrated w/ a modern C++ compiler
that exposes the AST like LLVM you could even load old C code
relatively easily. The problem of course is all of the backwards
compat work is language-specific, but it can be done as
necessary/desired and even if it isn't I don't see it as a reason not
to move forward.

> Also turns out that writing simple tools against an ASCII representation,
> given a reasonable text-munging language with regexp support, is much easier
> than writing those tools against the structured representation. There tended
> to be one class of tools in structured systems: epic. Anything that wasn't
> worth months of effort to get built out just didn't happen at all. Go look
> at how much of C's structure started out as source-text-munging hacks (e.g.
> C++ name mangling)---and how many (e.g. CPP) still are. The good hacks got
> epic. The bad hacks got killed. This kind of incremental design is usually
> just better.

I'd suggest giving MPS a go. It might be as or a little more difficult
to go from an existing text thing to a structured thing (see backwards
compat above), but extending a structured language is massively
easier. At SPLASH conference this last year there were several
projects proposing very complicated solutions to the idea of extending
text languages. You need to worry about ambiguity while retaining ease
of use for the user, and good luck trying to compose language
extensions. In MPS it's super easy as it would be in any similar
structured system. MPS itself is of course an epic solution, but so
are most individual text editors worth programming in. Once you have
MPS (or something like it) each specific solution isn't epic. Maybe
I'm not understanding the point here but I haven't found this to be
true at all.

> Another dirty secret was that the magic internal representation didn't turn
> out to be necessary. Modern PL implementations can lex, parse and structure
> tens or even hundreds of thousands of ASCII lines per second. If you want
> some "special" program representation for a tool, you can just go parse
> everything into it, do your business, and get out. Take a look at the Inform
> 7 compiler: Whole-program translation of syntactically wacky ASCII text
> every time you want to run your game. Happens so fast you don't even notice
> until the games get really, really, really big. Look at Sparse, and how fast
> it can do whole-program analysis of the Linux kernel. Etc.

This is totally true. I wouldn't hate using a system that is totally
structured but stores the data back in source format. In fact many
IDEs are slowly slowly moving in that direction (but have only
scratched the surface). Even syntax highlighting is a small example of
it. I just find it silly to go to that extra work to do so. Since I'm
never (or rather wouldn't be) looking at the "insides" of the file on
disk in a hex editor or a text editor I couldn't care less what the
1's and 0's are, but I'm not going to come up w/ an unambiguous text
representation for every language I work with, make a parser for it
and a pretty printer for it when I'll never see the results of it. I
wouldn't even strongly object to storing the graph structure I
mentioned earlier in some XML like thing, I just don't think there's a
compelling argument to do it when it would likely be more compact and
faster to store it in a more optimal way. I know you never said
otherwise, but it was a realization for me when I realized that a text
file is binary file. It's not like we have a bunch of letters etched
on the disk. We (or at least I) have been trained to think of text
files as a different thing, there's a special flag when you open them,
etc, but it's just a highly standardized binary format. There's this
sense that if you _really_ want to see what's in the file you open it
in a text editor, but I could have just as easily been trained to open
it in a graph editor and use a standard projection for the data.

I'm not terribly personally interested in the argument about
efficiency of parsing, computers are fast and loading/editing in
either world is/would be sufficiently fast, but I do know that when
XCode starts its indexing run one on my moderately sized projects one
of my cores is pegged for a couple mins. Maybe that doesn't slow down
my computer noticeably but it does delay my autocomplete things (which
XCode also sucks at btw, not holding it up as a poster child, but
maybe it wouldn't be so bad if it didn't have to re-solve all the
problems of pulling structure out of text). Though I'm not familiar
with it I'm sure Sparse is a fantastically fast tool, but I bet a lot
of hard work went in to making it so for one specific language, and
it's not really useful to me since that work apparently hasn't made it
into any of the tools I use since they're not fast.

> So yeah, feel free to point out ways in which the PL design community is
> failing. Lord knows I could list some of my own. The problem is that if you
> don't learn your history, I may be condemned to repeat it. I hate that.

Hehe, well said. I'd love to continue the conversation w/ you in
person (is it worth doing at a pdxfunc meeting? it's not really very
functional-specific) and get more perspective on the history of this
stuff. To be honest I'm certainly weaker there than I should be. I'd
also love it if you could convince me that we'll still be programming
in text editors in 20-30 years and it's really the final solution so I
can stop thinking about it so much and just get back to work ;).

Jake

Jake Brownson

unread,

Apr 17, 2012, 1:59:25 PM4/17/12

to pdx...@googlegroups.com

> 1) agreeing and then using his arguments to say that we should ditch html
> and just port the whole web to MS Word's "doc" file format

I'd argue that HTML and "doc" address fundamentally different domains.
I think it'd be great to edit HTML in a structured projectional way
too, but it's actually a different domain than "doc" is and would have
a different DSL associated with it (like HTML). There's significant
overlap in the two domains, but only in the simple parts.

> 2) agreeing with his core point that storing a complex structure as a
> series of characters necessarily/unavoidably lost information, and then
> exploring how we could save/transmit/do anything at all with programs
> without such serialization. For the second I was thinking of drifting
> into wide eyed Penrose-oid "collapse the wave function" analogies.

Not sure I completely understand this point, but it's totally true
that we can represent anything we want in streams of characters, but
it creates a lot of work that wouldn't be necessary in a structured
environment, and in practice doesn't get done for anything except for
the most widely used languages for the biggest value items. For
example it's just much more difficult to compose domains in a text
environment which is probably why in practice using text-based systems
we haven't done it much and he made the complaint he did about
codebases. Doxygen is an example where we have done it because one
domain luckily fit inside the other relatively well (because in a
comment in a program you can put whatever you want) and the value of
doing so is particularly high, but even Doxygen has an awkwardness and
set of limitations imposed by the text environment.

Jake Brownson

unread,

Apr 17, 2012, 2:18:48 PM4/17/12

to pdx...@googlegroups.com

Really excited to see folks' opinions on this topic. Thought I'd add another link to the table. I think it's difficult to picture what structured editing looks like without seeing an example of it. Here's an example somebody did of a prototype of a C# structured editor. It's a super fluid editing experience (many folks seem to think it would be more awkward). I've used two structured editors extensively (MPS and ISC's IDW) and found them to be quite nice to enter information in compared to a text editor. To me it feels way less messy.

http://blogs.msdn.com/b/kirillosenkov/archive/2009/09/08/first-videos-of-the-structured-editor-prototype.aspx

This example falls short in a couple of ways imho. It's built specifically for C# (the solution for this is a language workbench ala MPS or IDW), and it's not complete (when you get down to statements it's just a text box again, language workbenches make it easy to define structured editing for the whole language), but it gives a good taste of what it's like to edit structure.

You can check out MPS for more examples. MPS tries too hard to continue to feel like text imho, but I really enjoy working in it. I love working with the nearly perfect degree of freedom it offers. I'm not constantly checking to make sure there aren't extra whitespaces at the end of a line, or getting the brackets just the way I like them, but encountering code someone else wrote and going "grr". Also just to be clear about MPS. I think the core ideas of it are really solid, but the implementation leaves a bit to be desired. The problems I have with the implementation are not the interesting/difficult problems inherent in this approach (which they mostly do well), but are more straightforward software engineering issues that have beens solved elsewhere which is why I continue to have hope. MPS is open source btw.

MPS: http://www.jetbrains.com/mps/

I gave a half-demo of MPS awhile back at the meeting (I was focusing more on what I had done with it). If anybody is interested in seeing more about it I could do a presentation focused on it. I'm not sure how functional-specific such a presentation would be though.

In the end I don't expect just anybody to be convinced without having a fantastic example of a general purpose structure editor in front of them. The fact that there isn't one is a fantastic argument and one I don't yet have a true answer for, but I think it'd be great if more folks were thinking about this and working on it.

Jake

markus

unread,

Apr 17, 2012, 3:35:40 PM4/17/12

to pdx...@googlegroups.com

On Tue, 2012-04-17 at 10:59 -0700, Jake Brownson wrote:
> > 1) agreeing and then using his arguments to say that we should ditch html
> > and just port the whole web to MS Word's "doc" file format
>
> I'd argue that HTML and "doc" address fundamentally different domains.
> I think it'd be great to edit HTML in a structured projectional way
> too, but it's actually a different domain than "doc" is and would have
> a different DSL associated with it (like HTML). There's significant
> overlap in the two domains, but only in the simple parts.

Which was sort of my point. :)

Except, of course, that I _don't_ think it would be great to edit HTML
in a "projectional" way, any more than I think it would be great to do
geometry by eyeballing diagrams, or algebra by poking at graphs. When I
shop for sheet music to learn a new piece, I don't ever think "Oh look,
I can get a projectional form on CD and work from that!"

> > 2) agreeing with his core point that storing a complex structure as a
> > series of characters necessarily/unavoidably lost information, and then
> > exploring how we could save/transmit/do anything at all with programs
> > without such serialization. For the second I was thinking of drifting
> > into wide eyed Penrose-oid "collapse the wave function" analogies.
>
> Not sure I completely understand this point, but it's totally true
> that we can represent anything we want in streams of characters, but
> it creates a lot of work that wouldn't be necessary in a structured
> environment,

Stop right there. Can you explain to me how a "structured environment"
that can be (for example) transmitted over the wire can have properties
that a stream of characters can't?

> and in practice doesn't get done for anything except for
> the most widely used languages for the biggest value items. For
> example it's just much more difficult to compose domains in a text
> environment which is probably why in practice using text-based systems
> we haven't done it much and he made the complaint he did about
> codebases. Doxygen is an example where we have done it because one
> domain luckily fit inside the other relatively well (because in a
> comment in a program you can put whatever you want) and the value of
> doing so is particularly high, but even Doxygen has an awkwardness and
> set of limitations imposed by the text environment.

I think you are confusing levels. Specifically, if we want to do cool
thing X on some structure S which has a concrete (e.g. storable on disk,
etc.) representation R, you need the API of S to be sufficiently high
level so that the gap between X and S isn't dauntingly large.

That we agree on.

However, laying the blame on R is...mistaken. Ultimately, R is just
going to be some pattern of 0s and 1s, regardless of what you do, and
all the fun (work) is going to be in building the API for S on top of
them.

Now, some features of S's API are going to be more useful, and or less
work, so likely in any real situation they'll get implemented first, or
be more common, or be more standardized/portable, etc. And other
features will be harder to get right, less useful, etc. and their
quality will be correspondingly lower. Xs that depend on these features
are therefore going to be harder to write, regardless of R.

Look at it this way: Suppose we split S into two parts, an upper SHL
that provides all the cool functionality you want to implement X1, X2,
etc. to your heart's content, and a lower SLL that deals with the
concrete set of bits R. In between, we place the idea "structured"
representation, in all its glory.

I'm arguing that SLL is, at this point "known art" and does not present
any real difficulty to implement _except_ to the extent that there isn't
a clear consensus on what the nirvana structure should actually look
like in its grand glory. Instead, people just implement what they need,
and skip the other parts.

SHL, however, is hard to define, not to mention implement, and designing
a good, clear, stable API for it is presently beyond our collective
abilities. That level is where the problems are, and changing the
concrete representation isn't going to help.

-- Markus

markus

unread,

Apr 17, 2012, 3:51:18 PM4/17/12

to pdx...@googlegroups.com

It sounds as if XML would satisfy all of your requirements (provided you
had tooling on top of it that let you ignore the ascii underpinning)--
oops, reading down it looks like you eventually said that yourself. :)

> That's only true if you want backward compat. Like you said we have to
> rewrite all the tools (or at least the persistence layer of them), so
> folks that are working w/ text can continue to do so, and folks that
> are starting new projects can use this. Backwards compat could
> certainly be done where it was necessary, but I certainly don't see
> backwards compat as a compelling reason it can't be done. People are
> using lots of technologies today that are relatively new and weren't
> backwards compatible.

I think the real issue is backwards compatibility between version n and
version n+m of your representation. (This used to be known as "The
Animal Farm problem"* and n=3,m=1 is the traditional place for it to
become insuperable).

The standard answer is, of course, to get the representation exactly
right on the first go, or at least before anyone commits more resources
than can conveniently be trashed to it. :)

-- Markus

* From the novel, in which animals set out to reinvent civilization, but
"do it right this time" By version 3 (they aren't actually numbered in
the book) they have succeeded it reproducing civilization, including all
its faults.

Jake Brownson

unread,

Apr 17, 2012, 4:02:11 PM4/17/12

to pdx...@googlegroups.com

> I think the real issue is backwards compatibility between version n and
> version n+m of your representation. (This used to be known as "The
> Animal Farm problem"* and n=3,m=1 is the traditional place for it to
> become insuperable).

> The standard answer is, of course, to get the representation exactly
> right on the first go, or at least before anyone commits more resources
> than can conveniently be trashed to it. :)

Very true, which is why I think it's important to have the graph
format be as simple and general as possible (and as you say, as likely
to be reasonably close to "right" on the first go, though even ASCII
eventually needed Unicode though I'm not attempting to start any
ASCII/Unicode wars here), and make no assumptions about the content
(just as ASCII doesn't), just store a graph of strongly identified
nodes. Start with a data layer that has no goal other than letting you
define nodes connected together and build more meaningful things on
top of that by interpreting the meaning of certain strong identities.
It will be okay to have "invalid" structures in the graph. If need be
you can always resort to a more general projection to see exactly
what's going on and make changes, then go back to your domain specific
projection which makes it difficult/impossible to create "invalid"
structures. IDW and MPS both have data models like this (though they
both focus more on representing trees, with some graph like links).

markus

unread,

Apr 17, 2012, 4:51:57 PM4/17/12

to pdx...@googlegroups.com

> Very true, which is why I think it's important to have the graph
> format be as simple and general as possible (and as you say, as likely
> to be reasonably close to "right" on the first go, though even ASCII
> eventually needed Unicode though I'm not attempting to start any
> ASCII/Unicode wars here), and make no assumptions about the content
> (just as ASCII doesn't), just store a graph of strongly identified
> nodes.

So there are a number of formats that do what you are describing, with
all sorts of differing semantics. Some questions that arise, and which
your simple-and-general format will need to address:

* Can nodes be labeled, in addition to their identity?
* Are there different types of nodes?
* If so, can nodes have more than one type?
* Can edges be labeled?
* Do edges have identity?
* Are there different types of edges?
* Are edges directed?
* Can an edge's head be the same as its tail?
* Must edges only link nodes, or can they link edges?
* Can an edge start/end at itself?
* Must the graph be connected?
* Can there be redundant edges?
* If things have types, are types represented in the graph, or
external to it?
* What are the semantics of the labels? Are they text, typed
data, etc.?
* Can nodes be used to label things?
* Can edges be used to label things?
* Can subgraphs be used to label things?
* Can subgraphs be used as nodes?
* Can nodes have content?
* Can nodes contain subgraphs?
* If nodes can contain other parts of the graph, are there nesting
rules? If so, how are they enforced?
* What sort of operations are supported on the graphs?
* Assuming that there are operations to visit all nodes/edges, are
they stable (e.g. does adding/removing a node cause the order in
which other nodes are visited to change)?
* Does the storage format cache information about the graph to
optimize common operations (and is this information saved?)
* Are concurrent operations supported?
* Can graphs be merged, unified, extracted, etc.?

And so on and so forth. This is just a small fraction of the actual
list of issues. :)

Now, you can dodge any of these by saying "Nope, not going to do it; I
said 'simple,' remember?" but you'll likely be stepping away from
'general' if you do so. The normal way to squeak out of it is to do
something like the PL "magic comment" trick; say they have text labels
or some such and, if the format matches some pattern, than they "really"
have some extra semantics that your storage layer doesn't care about --
but you're stepping away from most of the claimed advantages of having a
rich data structure if you do so.

The text file had to make some comparable choices before becoming
reasonably standardized (and the process was not nearly as painless as
it may seem in retrospect) but had the advantage that much of the heavy
lifting had already been done by previous waves of innovation
(principally the moveable type printing press, the telegraph, and the
typewriter). Graphs haven't had that advantage yet (if anyone sees a
graph typewriter on e-bay, let me know; I'll pay a finder's fee) so the
road will likely be rockier.

-- Markus

Jake Brownson

unread,

Apr 17, 2012, 5:17:10 PM4/17/12

to pdx...@googlegroups.com

> Except, of course, that I _don't_ think it would be great to edit HTML
> in a "projectional" way, any more than I think it would be great to do
> geometry by eyeballing diagrams, or algebra by poking at graphs. When I
> shop for sheet music to learn a new piece, I don't ever think "Oh look,
> I can get a projectional form on CD and work from that!"

Don't confuse projectional/structured w/ WYSIWYG. I think it's be
great to have a projection of "Structured HTML" (which by the way is
probably semantically equivalent to HTML. HTML is a fairly well
defined and honed domain, not trying to reinvent HTML, just the way we
edit it) that is what is expected to be displayed in the browser. Not
all projections need to be fully editable, or even editable at all.
That particular projection might allow limited edits that are
reasonable to do in that view (perhaps editing blocks of text for
example), but you could switch to a projection that looks more like
the C# editor I linked when doing the bulk of the work, etc. I'm not
opposed to having text printed on the screen, just opposed to editing
giant chunks of it all at once. I do think once we have a system like
this in place we'll start to break away from simply having text on the
screen and start finding great new ways to projection the structures
we're manipulating. Jonathan Edwards has done some really interesting
thinking on what's possible once you have structured editing ex:
http://subtextual.org/subtext2.html.

> Stop right there. Can you explain to me how a "structured environment"
> that can be (for example) transmitted over the wire can have properties
> that a stream of characters can't?

I think I'm saying the opposite. You can accomplish anything you could
w/ structure with a stream of characters. You could always just parse
the string of characters to build the structure and then you're in the
same spot. It's just a lot more work because you need to define
unambiguous representations for the structure, parse it, pretty print
it (making sure you never produce invalid stuff) back and forth, and
you need to do it for each language you want to operate with. If we
had common underlying graph structure the
serialization/deserialization is done once and then we never worry
about it again, one less thing to reinvent every time we want to make
a new language.

> However, laying the blame on R is...mistaken. Ultimately, R is just
> going to be some pattern of 0s and 1s, regardless of what you do, and
> all the fun (work) is going to be in building the API for S on top of
> them.

> Now, some features of S's API are going to be more useful, and or less
> work, so likely in any real situation they'll get implemented first, or
> be more common, or be more standardized/portable, etc. And other
> features will be harder to get right, less useful, etc. and their
> quality will be correspondingly lower. Xs that depend on these features
> are therefore going to be harder to write, regardless of R.

> Look at it this way: Suppose we split S into two parts, an upper SHL
> that provides all the cool functionality you want to implement X1, X2,
> etc. to your heart's content, and a lower SLL that deals with the
> concrete set of bits R. In between, we place the idea "structured"
> representation, in all its glory.

> I'm arguing that SLL is, at this point "known art" and does not present
> any real difficulty to implement _except_ to the extent that there isn't
> a clear consensus on what the nirvana structure should actually look
> like in its grand glory. Instead, people just implement what they need,
> and skip the other parts.

> SHL, however, is hard to define, not to mention implement, and designing
> a good, clear, stable API for it is presently beyond our collective
> abilities. That level is where the problems are, and changing the
> concrete representation isn't going to help.

I think we actually agree the whole way down here unless I'm getting
the wrong message from all of this. It's very true there's no
consensus about SLL, but I'm not even seeing very many proposals
beyond MPS and IDW (if you count IDW's as a proposal since it's not
available). I think a rich set of proposals is the first step in
moving toward consensus. Most things impose a domain on the structure
(RDF) and aren't really what I'd call SLL, but SLL+some SHL. I agree
that SHL is where all the interesting stuff happens, so let's solve R
and SLL once and focus our efforts on the things that are specific to
the problem at hand in SHL.

Jake Brownson

unread,

Apr 17, 2012, 5:51:30 PM4/17/12

to pdx...@googlegroups.com

Great questions, ones I have answers for. The model I have in mind
isn't as fuzzy as you might think (I know I haven't said anything to
really indicate otherwise until now, other than pointing out that
there exist two functioning examples).

> * Can nodes be labeled, in addition to their identity?

Not at the data model level, they're just identified. The notion of
labeling is handled at the next level using the graph structure. It's
all just graph data.

> * Are there different types of nodes?

Nope, but there are different types of strong identities. The job of
an identity is to stand for an "idea"/"thing" and to identify the node
that represents the number 1 we use an int identity. To identify the
node that represents the idea of a Car we use a GUID. If we want a
node that represents a Wheel we represent it as a guid and have an
edge pointing to Car's node (as identified by its GUID) that has an
identity which is representing the idea of that relationship. To
attach a name to the node representing the idea of a Car we add an
edge from the Car node to a node representing the string "Car" which
has a string identity. We might also have an edge with the name
identity going from the node representing Car to a node representing
an icon that can be used to convey the idea of a Car.

> * If so, can nodes have more than one type?

n/a

> * Can edges be labeled?

Edges are simply keyed by identity and point at another node.

> * Do edges have identity?

No, but they're keyed by identity on their owning nodes

> * Are there different types of edges?

No

> * Are edges directed?
Yes

> * Can an edge's head be the same as its tail?

Yes

> * Must edges only link nodes, or can they link edges?

Nodes only

> * Can an edge start/end at itself?

n/a

> * Must the graph be connected?

You start processing at a root node that represents the "whole world",
so unconnected nodes are fine but won't be discoverable. Nodes might
be connected only through metadata, or things pointing at them as part
of the "editor" they're contained in.

> * Can there be redundant edges?

I think this will require some experimentation and is a question I
haven't settled on yet. I actually said it's a multi set keyed on
identity. I probably should have said multipmap keyed on identity
(meaning redundancy is okay), but I think it'd be worth keeping an eye
out to see if it's necessary to support it.

> * If things have types, are types represented in the graph, or
> external to it?

Types are handled at a higher meta level. As mentioned a schema
language should be part of the spec (that is defined using itself, and
lives on top of the graph data model and doesn't restrict what is
_possible_ to store in the graph, only what the editor will/won't
underline red).

> * What are the semantics of the labels? Are they text, typed
> data, etc.?