Sacred Cows part 2

34 views
Skip to first unread message

Jake Brownson

unread,
Apr 16, 2012, 1:34:55 PM4/16/12
to pdx...@googlegroups.com
Eric shared the Sacred Cows article with us at the last meeting. Being
a huge advocate for point #1 (don't store source as ASCII) I
immediately subscribed to the RSS feed. It seems he's gotten a lot of
controversial feedback on just that point and has done a second post
expanding on it. Worth the read IMHO.

http://joshondesign.com/2012/03/14/sacred-cows-2

Corey Hunt

unread,
Apr 16, 2012, 4:14:54 PM4/16/12
to pdx...@googlegroups.com
Agreed. That was worth the read. Subscribed as well.


--
You received this message because you are subscribed to the Google Groups "pdxfunc" group.
To post to this group, send email to pdx...@googlegroups.com.
To unsubscribe from this group, send email to pdxfunc+u...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/pdxfunc?hl=en.


Bart Massey

unread,
Apr 17, 2012, 3:29:38 AM4/17/12
to pdx...@googlegroups.com
Insert famous Spinoza quote here.

OK, I can't just let it go at that. But really: How many of these ideas do you think us old folks have not tried already? And I don't mean "kind of tried" here, I mean gave the full-out effort to?

For example, storing "source code" as some kind of data structure? Been there, done that. Lots. The classic example is Smalltalk, which Josh should know from his internship at PARC. Sure, we kept the text around: it's a good idea to do so for a variety of reasons. But all the tools worked on an internal representation very much like what the article describes. Ya know what? Most of the claimed benefits actually happened to some degree. The smart tools allowed cool custom refactoring, exposed visual elements of the code structure with a minimum of effort, etc.

The problems were real too, though. For example, turns out that non-ASCII representations are fragile--go figure. Change the language representation slightly, and everything else changes a lot. There's no limping along patching things by hand until the tools catch up, either. All the tools have to change with the representation: flag day. And the tools have to accept old representations, because the representations are hard to upgrade: backward compatibility hell.

Also turns out that writing simple tools against an ASCII representation, given a reasonable text-munging language with regexp support, is much easier than writing those tools against the structured representation. There tended to be one class of tools in structured systems: epic. Anything that wasn't worth months of effort to get built out just didn't happen at all. Go look at how much of C's structure started out as source-text-munging hacks (e.g. C++ name mangling)---and how many (e.g. CPP) still are. The good hacks got epic. The bad hacks got killed. This kind of incremental design is usually just better.

Another dirty secret was that the magic internal representation didn't turn out to be necessary. Modern PL implementations can lex, parse and structure tens or even hundreds of thousands of ASCII lines per second. If you want some "special" program representation for a tool, you can just go parse everything into it, do your business, and get out. Take a look at the Inform 7 compiler: Whole-program translation of syntactically wacky ASCII text every time you want to run your game. Happens so fast you don't even notice until the games get really, really, really big. Look at Sparse, and how fast it can do whole-program analysis of the Linux kernel. Etc.

So yeah, feel free to point out ways in which the PL design community is failing. Lord knows I could list some of my own. The problem is that if you don't learn your history, I may be condemned to repeat it. I hate that.

--Bart the Cranky Old Man

markus

unread,
Apr 17, 2012, 11:30:38 AM4/17/12
to pdx...@googlegroups.com

> Insert famous Spinoza quote here.
> [snip]

> --Bart the Cranky Old Man

Thank you. That was so much more balanced and informative than what I
was planning to write, I'm glad I procrastinated*.

-- Markus

* Mostly, I was dithering between two jujitsu attacks. Either 1)
agreeing and then using his arguments to say that we should ditch html
and just port the whole web to MS Word's "doc" file format, or 2)
agreeing with his core point that storing a complex structure as a
series of characters necessarily/unavoidably lost information, and then
exploring how we could save/transmit/do anything at all with programs
without such serialization. For the second I was thinking of drifting
into wide eyed Penrose-oid "collapse the wave function" analogies.

Bart's response was much better.

Jake Brownson

unread,
Apr 17, 2012, 1:49:53 PM4/17/12
to pdx...@googlegroups.com
> OK, I can't just let it go at that. But really: How many of these ideas do
> you think us old folks have not tried already? And I don't mean "kind of
> tried" here, I mean gave the full-out effort to?

I totally respect that, and it certainly gives me pause because I
certainly don't think I'm smarter than any of those people, and I'd
always love to read another paper I haven't read yet (I'm sure there
are a ton out there I haven't), but I'm sure you understand what that
sounds like to someone who has seen two functioning structured editing
systems, has enjoyed using both, but believes he sees where they've
gone wrong (and the biggest reasons aren't interesting/hard technical
ones). I'm sure folks applied that argument to a lot of the hard
problems that have eventually been solved over the years (if I pick
one I'll sound especially pompous :p). That certainly doesn't prove
that every time someone applies that argument it's invalid of course,
but it's one that I think shouldn't stop folks from trying to find new
approaches.

> For example, storing "source code" as some kind of data structure? Been
> there, done that. Lots. The classic example is Smalltalk, which Josh should
> know from his internship at PARC. Sure, we kept the text around: it's a good
> idea to do so for a variety of reasons. But all the tools worked on an
> internal representation very much like what the article describes. Ya know
> what? Most of the claimed benefits actually happened to some degree. The
> smart tools allowed cool custom refactoring, exposed visual elements of the
> code structure with a minimum of effort, etc.

Smalltalk fell quite short of the whole structured editing experience.
It was structured editing only down to the methods which is where it
really starts to matter. It may have had the structure available to
tools making for famously great refactoring, etc, but the user's
experience of editing the code was still textual. We already have that
capability in most editors by putting each method in a separate file
if we choose (that is to say the editing experience, not the
refactoring coolness).

As I understand it, Smalltalk's non-text data store formats were
Smalltalk specific. In order to see them or edit them in a reasonable
way you need an editor that is intimately familiar with smalltalk.
Following that lead we would need different sets of editors and tools
for each language/environment which is certainly not practical.

To make this work we need a standardize-able format that stores a
graph structure with strong identities that isn't tied to one specific
domain. ASCII/Unicode aren't tied to a specific domain which has
enabled us to use them to store so much great structure, but they
don't model the structure at all. It's a model that doesn't fit the
way we're using it for programming. When it didn't fit the way it was
being used to create paper documents we wrote structured editors
called word processors for our users, but we don't follow the same
design principles for ourselves.

Please correct me if I'm wrong, because I don't know this for a fact
at all, but my impression is that Smalltalk wasn't testing the idea of
doing a general structured editing environment, but was testing making
Smalltalk structured. It seems to have been relatively successful at
doing that. I think it goes to show that we should find a way to work
in a structured environment all the time instead of just with
Smalltalk.

> The problems were real too, though. For example, turns out that non-ASCII
> representations are fragile--go figure. Change the language representation
> slightly, and everything else changes a lot.

The basic idea of the model I'd like to see is that each node in the
graph has a strong identity (guid) and a multiset of edges keyed by a
strong identity. The specification would also include a limited set of
DSLs including a schema DSL, and a basic projection DSL. The schema
DSL would allow defining an interesting structure in the graph, and
the projection DSL would describe how a specific graph should be
displayed and edited. There would be general projections that could
display any graph (not just the bubble and line ones, but textish tree
spanning ones w/ indentation/expand collapse, etc) if a domain
specific projection isn't available, but it should be relatively easy
to make projections for even obscure domains widely available with
today's Internet. It's likely that more powerful schema and projection
languages would be developed, become popular, and be standardized as
well.

The graph could store any representation, the editor could display it,
and could test it against a schema and red-underline things that don't
fit. A type checking DSL could be used to add even more power to the
schema DSL (see MPS for a working example).

> There's no limping along
> patching things by hand until the tools catch up, either. All the tools have
> to change with the representation: flag day.

Very true, but most specific tools will be easier to write since we
don't have to deal w/ all the lexing and parsing and can go directly
to the AST. Version control can be much better as mentioned in the
article. This is certainly a problem, but not a compelling reason it
can't be done.

Though not entirely true. MPS for example still uses git for version
control. It also uses all the existing Java infrastructure for
executing the stuff you edit by generating Java source (though I never
have to deal w/ it as a user). It persists its data in xml and has a
way to read the git diffs to convert it to a nice UI for doing merges.

> And the tools have to accept
> old representations, because the representations are hard to upgrade:
> backward compatibility hell.

That's only true if you want backward compat. Like you said we have to
rewrite all the tools (or at least the persistence layer of them), so
folks that are working w/ text can continue to do so, and folks that
are starting new projects can use this. Backwards compat could
certainly be done where it was necessary, but I certainly don't see
backwards compat as a compelling reason it can't be done. People are
using lots of technologies today that are relatively new and weren't
backwards compatible.

It also depends a bit on exactly what you mean by backwards
compatibility. If you created a C domain for example you could still
link against old C code. If you integrated w/ a modern C++ compiler
that exposes the AST like LLVM you could even load old C code
relatively easily. The problem of course is all of the backwards
compat work is language-specific, but it can be done as
necessary/desired and even if it isn't I don't see it as a reason not
to move forward.

> Also turns out that writing simple tools against an ASCII representation,
> given a reasonable text-munging language with regexp support, is much easier
> than writing those tools against the structured representation. There tended
> to be one class of tools in structured systems: epic. Anything that wasn't
> worth months of effort to get built out just didn't happen at all. Go look
> at how much of C's structure started out as source-text-munging hacks (e.g.
> C++ name mangling)---and how many (e.g. CPP) still are. The good hacks got
> epic. The bad hacks got killed. This kind of incremental design is usually
> just better.

I'd suggest giving MPS a go. It might be as or a little more difficult
to go from an existing text thing to a structured thing (see backwards
compat above), but extending a structured language is massively
easier. At SPLASH conference this last year there were several
projects proposing very complicated solutions to the idea of extending
text languages. You need to worry about ambiguity while retaining ease
of use for the user, and good luck trying to compose language
extensions. In MPS it's super easy as it would be in any similar
structured system. MPS itself is of course an epic solution, but so
are most individual text editors worth programming in. Once you have
MPS (or something like it) each specific solution isn't epic. Maybe
I'm not understanding the point here but I haven't found this to be
true at all.

> Another dirty secret was that the magic internal representation didn't turn
> out to be necessary. Modern PL implementations can lex, parse and structure
> tens or even hundreds of thousands of ASCII lines per second. If you want
> some "special" program representation for a tool, you can just go parse
> everything into it, do your business, and get out. Take a look at the Inform
> 7 compiler: Whole-program translation of syntactically wacky ASCII text
> every time you want to run your game. Happens so fast you don't even notice
> until the games get really, really, really big. Look at Sparse, and how fast
> it can do whole-program analysis of the Linux kernel. Etc.

This is totally true. I wouldn't hate using a system that is totally
structured but stores the data back in source format. In fact many
IDEs are slowly slowly moving in that direction (but have only
scratched the surface). Even syntax highlighting is a small example of
it. I just find it silly to go to that extra work to do so. Since I'm
never (or rather wouldn't be) looking at the "insides" of the file on
disk in a hex editor or a text editor I couldn't care less what the
1's and 0's are, but I'm not going to come up w/ an unambiguous text
representation for every language I work with, make a parser for it
and a pretty printer for it when I'll never see the results of it. I
wouldn't even strongly object to storing the graph structure I
mentioned earlier in some XML like thing, I just don't think there's a
compelling argument to do it when it would likely be more compact and
faster to store it in a more optimal way. I know you never said
otherwise, but it was a realization for me when I realized that a text
file is binary file. It's not like we have a bunch of letters etched
on the disk. We (or at least I) have been trained to think of text
files as a different thing, there's a special flag when you open them,
etc, but it's just a highly standardized binary format. There's this
sense that if you _really_ want to see what's in the file you open it
in a text editor, but I could have just as easily been trained to open
it in a graph editor and use a standard projection for the data.

I'm not terribly personally interested in the argument about
efficiency of parsing, computers are fast and loading/editing in
either world is/would be sufficiently fast, but I do know that when
XCode starts its indexing run one on my moderately sized projects one
of my cores is pegged for a couple mins. Maybe that doesn't slow down
my computer noticeably but it does delay my autocomplete things (which
XCode also sucks at btw, not holding it up as a poster child, but
maybe it wouldn't be so bad if it didn't have to re-solve all the
problems of pulling structure out of text). Though I'm not familiar
with it I'm sure Sparse is a fantastically fast tool, but I bet a lot
of hard work went in to making it so for one specific language, and
it's not really useful to me since that work apparently hasn't made it
into any of the tools I use since they're not fast.

> So yeah, feel free to point out ways in which the PL design community is
> failing. Lord knows I could list some of my own. The problem is that if you
> don't learn your history, I may be condemned to repeat it. I hate that.

Hehe, well said. I'd love to continue the conversation w/ you in
person (is it worth doing at a pdxfunc meeting? it's not really very
functional-specific) and get more perspective on the history of this
stuff. To be honest I'm certainly weaker there than I should be. I'd
also love it if you could convince me that we'll still be programming
in text editors in 20-30 years and it's really the final solution so I
can stop thinking about it so much and just get back to work ;).

Jake

Jake Brownson

unread,
Apr 17, 2012, 1:59:25 PM4/17/12
to pdx...@googlegroups.com
> 1) agreeing and then using his arguments to say that we should ditch html
> and just port the whole web to MS Word's "doc" file format

I'd argue that HTML and "doc" address fundamentally different domains.
I think it'd be great to edit HTML in a structured projectional way
too, but it's actually a different domain than "doc" is and would have
a different DSL associated with it (like HTML). There's significant
overlap in the two domains, but only in the simple parts.

> 2) agreeing with his core point that storing a complex structure as a
> series of characters necessarily/unavoidably lost information, and then
> exploring how we could save/transmit/do anything at all with programs
> without such serialization.  For the second I was thinking of drifting
> into wide eyed Penrose-oid "collapse the wave function" analogies.

Not sure I completely understand this point, but it's totally true
that we can represent anything we want in streams of characters, but
it creates a lot of work that wouldn't be necessary in a structured
environment, and in practice doesn't get done for anything except for
the most widely used languages for the biggest value items. For
example it's just much more difficult to compose domains in a text
environment which is probably why in practice using text-based systems
we haven't done it much and he made the complaint he did about
codebases. Doxygen is an example where we have done it because one
domain luckily fit inside the other relatively well (because in a
comment in a program you can put whatever you want) and the value of
doing so is particularly high, but even Doxygen has an awkwardness and
set of limitations imposed by the text environment.

Jake Brownson

unread,
Apr 17, 2012, 2:18:48 PM4/17/12
to pdx...@googlegroups.com
Really excited to see folks' opinions on this topic. Thought I'd add another link to the table. I think it's difficult to picture what structured editing looks like without seeing an example of it. Here's an example somebody did of a prototype of a C# structured editor. It's a super fluid editing experience (many folks seem to think it would be more awkward). I've used two structured editors extensively (MPS and ISC's IDW) and found them to be quite nice to enter information in compared to a text editor. To me it feels way less messy.


This example falls short in a couple of ways imho. It's built specifically for C# (the solution for this is a language workbench ala MPS or IDW), and it's not complete (when you get down to statements it's just a text box again, language workbenches make it easy to define structured editing for the whole language), but it gives a good taste of what it's like to edit structure.

You can check out MPS for more examples. MPS tries too hard to continue to feel like text imho, but I really enjoy working in it. I love working with the nearly perfect degree of freedom it offers. I'm not constantly checking to make sure there aren't extra whitespaces at the end of a line, or getting the brackets just the way I like them, but encountering code someone else wrote and going "grr". Also just to be clear about MPS. I think the core ideas of it are really solid, but the implementation leaves a bit to be desired. The problems I have with the implementation are not the interesting/difficult problems inherent in this approach (which they mostly do well), but are more straightforward software engineering issues that have beens solved elsewhere which is why I continue to have hope. MPS is open source btw.


I gave a half-demo of MPS awhile back at the meeting (I was focusing more on what I had done with it). If anybody is interested in seeing more about it I could do a presentation focused on it. I'm not sure how functional-specific such a presentation would be though.

In the end I don't expect just anybody to be convinced without having a fantastic example of a general purpose structure editor in front of them. The fact that there isn't one is a fantastic argument and one I don't yet have a true answer for, but I think it'd be great if more folks were thinking about this and working on it.

Jake

markus

unread,
Apr 17, 2012, 3:35:40 PM4/17/12
to pdx...@googlegroups.com
On Tue, 2012-04-17 at 10:59 -0700, Jake Brownson wrote:
> > 1) agreeing and then using his arguments to say that we should ditch html
> > and just port the whole web to MS Word's "doc" file format
>
> I'd argue that HTML and "doc" address fundamentally different domains.
> I think it'd be great to edit HTML in a structured projectional way
> too, but it's actually a different domain than "doc" is and would have
> a different DSL associated with it (like HTML). There's significant
> overlap in the two domains, but only in the simple parts.

Which was sort of my point. :)

Except, of course, that I _don't_ think it would be great to edit HTML
in a "projectional" way, any more than I think it would be great to do
geometry by eyeballing diagrams, or algebra by poking at graphs. When I
shop for sheet music to learn a new piece, I don't ever think "Oh look,
I can get a projectional form on CD and work from that!"

> > 2) agreeing with his core point that storing a complex structure as a
> > series of characters necessarily/unavoidably lost information, and then
> > exploring how we could save/transmit/do anything at all with programs
> > without such serialization. For the second I was thinking of drifting
> > into wide eyed Penrose-oid "collapse the wave function" analogies.
>
> Not sure I completely understand this point, but it's totally true
> that we can represent anything we want in streams of characters, but
> it creates a lot of work that wouldn't be necessary in a structured
> environment,

Stop right there. Can you explain to me how a "structured environment"
that can be (for example) transmitted over the wire can have properties
that a stream of characters can't?

> and in practice doesn't get done for anything except for
> the most widely used languages for the biggest value items. For
> example it's just much more difficult to compose domains in a text
> environment which is probably why in practice using text-based systems
> we haven't done it much and he made the complaint he did about
> codebases. Doxygen is an example where we have done it because one
> domain luckily fit inside the other relatively well (because in a
> comment in a program you can put whatever you want) and the value of
> doing so is particularly high, but even Doxygen has an awkwardness and
> set of limitations imposed by the text environment.

I think you are confusing levels. Specifically, if we want to do cool
thing X on some structure S which has a concrete (e.g. storable on disk,
etc.) representation R, you need the API of S to be sufficiently high
level so that the gap between X and S isn't dauntingly large.

That we agree on.

However, laying the blame on R is...mistaken. Ultimately, R is just
going to be some pattern of 0s and 1s, regardless of what you do, and
all the fun (work) is going to be in building the API for S on top of
them.

Now, some features of S's API are going to be more useful, and or less
work, so likely in any real situation they'll get implemented first, or
be more common, or be more standardized/portable, etc. And other
features will be harder to get right, less useful, etc. and their
quality will be correspondingly lower. Xs that depend on these features
are therefore going to be harder to write, regardless of R.

Look at it this way: Suppose we split S into two parts, an upper SHL
that provides all the cool functionality you want to implement X1, X2,
etc. to your heart's content, and a lower SLL that deals with the
concrete set of bits R. In between, we place the idea "structured"
representation, in all its glory.

I'm arguing that SLL is, at this point "known art" and does not present
any real difficulty to implement _except_ to the extent that there isn't
a clear consensus on what the nirvana structure should actually look
like in its grand glory. Instead, people just implement what they need,
and skip the other parts.

SHL, however, is hard to define, not to mention implement, and designing
a good, clear, stable API for it is presently beyond our collective
abilities. That level is where the problems are, and changing the
concrete representation isn't going to help.

-- Markus


markus

unread,
Apr 17, 2012, 3:51:18 PM4/17/12
to pdx...@googlegroups.com

It sounds as if XML would satisfy all of your requirements (provided you
had tooling on top of it that let you ignore the ascii underpinning)--
oops, reading down it looks like you eventually said that yourself. :)

> That's only true if you want backward compat. Like you said we have to
> rewrite all the tools (or at least the persistence layer of them), so
> folks that are working w/ text can continue to do so, and folks that
> are starting new projects can use this. Backwards compat could
> certainly be done where it was necessary, but I certainly don't see
> backwards compat as a compelling reason it can't be done. People are
> using lots of technologies today that are relatively new and weren't
> backwards compatible.

I think the real issue is backwards compatibility between version n and
version n+m of your representation. (This used to be known as "The
Animal Farm problem"* and n=3,m=1 is the traditional place for it to
become insuperable).

The standard answer is, of course, to get the representation exactly
right on the first go, or at least before anyone commits more resources
than can conveniently be trashed to it. :)

-- Markus

* From the novel, in which animals set out to reinvent civilization, but
"do it right this time" By version 3 (they aren't actually numbered in
the book) they have succeeded it reproducing civilization, including all
its faults.


Jake Brownson

unread,
Apr 17, 2012, 4:02:11 PM4/17/12
to pdx...@googlegroups.com
> I think the real issue is backwards compatibility between version n and
> version n+m of your representation.  (This used to be known as "The
> Animal Farm problem"* and n=3,m=1 is the traditional place for it to
> become insuperable).

> The standard answer is, of course, to get the representation exactly
> right on the first go, or at least before anyone commits more resources
> than can conveniently be trashed to it.  :)

Very true, which is why I think it's important to have the graph
format be as simple and general as possible (and as you say, as likely
to be reasonably close to "right" on the first go, though even ASCII
eventually needed Unicode though I'm not attempting to start any
ASCII/Unicode wars here), and make no assumptions about the content
(just as ASCII doesn't), just store a graph of strongly identified
nodes. Start with a data layer that has no goal other than letting you
define nodes connected together and build more meaningful things on
top of that by interpreting the meaning of certain strong identities.
It will be okay to have "invalid" structures in the graph. If need be
you can always resort to a more general projection to see exactly
what's going on and make changes, then go back to your domain specific
projection which makes it difficult/impossible to create "invalid"
structures. IDW and MPS both have data models like this (though they
both focus more on representing trees, with some graph like links).

markus

unread,
Apr 17, 2012, 4:51:57 PM4/17/12
to pdx...@googlegroups.com

> Very true, which is why I think it's important to have the graph
> format be as simple and general as possible (and as you say, as likely
> to be reasonably close to "right" on the first go, though even ASCII
> eventually needed Unicode though I'm not attempting to start any
> ASCII/Unicode wars here), and make no assumptions about the content
> (just as ASCII doesn't), just store a graph of strongly identified
> nodes.

So there are a number of formats that do what you are describing, with
all sorts of differing semantics. Some questions that arise, and which
your simple-and-general format will need to address:

* Can nodes be labeled, in addition to their identity?
* Are there different types of nodes?
* If so, can nodes have more than one type?
* Can edges be labeled?
* Do edges have identity?
* Are there different types of edges?
* Are edges directed?
* Can an edge's head be the same as its tail?
* Must edges only link nodes, or can they link edges?
* Can an edge start/end at itself?
* Must the graph be connected?
* Can there be redundant edges?
* If things have types, are types represented in the graph, or
external to it?
* What are the semantics of the labels? Are they text, typed
data, etc.?
* Can nodes be used to label things?
* Can edges be used to label things?
* Can subgraphs be used to label things?
* Can subgraphs be used as nodes?
* Can nodes have content?
* Can nodes contain subgraphs?
* If nodes can contain other parts of the graph, are there nesting
rules? If so, how are they enforced?
* What sort of operations are supported on the graphs?
* Assuming that there are operations to visit all nodes/edges, are
they stable (e.g. does adding/removing a node cause the order in
which other nodes are visited to change)?
* Does the storage format cache information about the graph to
optimize common operations (and is this information saved?)
* Are concurrent operations supported?
* Can graphs be merged, unified, extracted, etc.?

And so on and so forth. This is just a small fraction of the actual
list of issues. :)

Now, you can dodge any of these by saying "Nope, not going to do it; I
said 'simple,' remember?" but you'll likely be stepping away from
'general' if you do so. The normal way to squeak out of it is to do
something like the PL "magic comment" trick; say they have text labels
or some such and, if the format matches some pattern, than they "really"
have some extra semantics that your storage layer doesn't care about --
but you're stepping away from most of the claimed advantages of having a
rich data structure if you do so.

The text file had to make some comparable choices before becoming
reasonably standardized (and the process was not nearly as painless as
it may seem in retrospect) but had the advantage that much of the heavy
lifting had already been done by previous waves of innovation
(principally the moveable type printing press, the telegraph, and the
typewriter). Graphs haven't had that advantage yet (if anyone sees a
graph typewriter on e-bay, let me know; I'll pay a finder's fee) so the
road will likely be rockier.

-- Markus

Jake Brownson

unread,
Apr 17, 2012, 5:17:10 PM4/17/12
to pdx...@googlegroups.com
> Except, of course, that I _don't_ think it would be great to edit HTML
> in a "projectional" way, any more than I think it would be great to do
> geometry by eyeballing diagrams, or algebra by poking at graphs.  When I
> shop for sheet music to learn a new piece, I don't ever think "Oh look,
> I can get a projectional form on CD and work from that!"

Don't confuse projectional/structured w/ WYSIWYG. I think it's be
great to have a projection of "Structured HTML" (which by the way is
probably semantically equivalent to HTML. HTML is a fairly well
defined and honed domain, not trying to reinvent HTML, just the way we
edit it) that is what is expected to be displayed in the browser. Not
all projections need to be fully editable, or even editable at all.
That particular projection might allow limited edits that are
reasonable to do in that view (perhaps editing blocks of text for
example), but you could switch to a projection that looks more like
the C# editor I linked when doing the bulk of the work, etc. I'm not
opposed to having text printed on the screen, just opposed to editing
giant chunks of it all at once. I do think once we have a system like
this in place we'll start to break away from simply having text on the
screen and start finding great new ways to projection the structures
we're manipulating. Jonathan Edwards has done some really interesting
thinking on what's possible once you have structured editing ex:
http://subtextual.org/subtext2.html.

> Stop right there.  Can you explain to me how a "structured environment"
> that can be (for example) transmitted over the wire can have properties
> that a stream of characters can't?

I think I'm saying the opposite. You can accomplish anything you could
w/ structure with a stream of characters. You could always just parse
the string of characters to build the structure and then you're in the
same spot. It's just a lot more work because you need to define
unambiguous representations for the structure, parse it, pretty print
it (making sure you never produce invalid stuff) back and forth, and
you need to do it for each language you want to operate with. If we
had common underlying graph structure the
serialization/deserialization is done once and then we never worry
about it again, one less thing to reinvent every time we want to make
a new language.

> However, laying the blame on R is...mistaken.  Ultimately, R is just
> going to be some pattern of 0s and 1s, regardless of what you do, and
> all the fun (work) is going to be in building the API for S on top of
> them.

> Now, some features of S's API are going to be more useful, and or less
> work, so likely in any real situation they'll get implemented first, or
> be more common, or be more standardized/portable, etc.  And other
> features will be harder to get right, less useful, etc. and their
> quality will be correspondingly lower.  Xs that depend on these features
> are therefore going to be harder to write, regardless of R.

> Look at it this way:  Suppose we split S into two parts, an upper SHL
> that provides all the cool functionality you want to implement X1, X2,
> etc. to your heart's content, and a lower SLL that deals with the
> concrete set of bits R.  In between, we place the idea "structured"
> representation, in all its glory.

> I'm arguing that SLL is, at this point "known art" and does not present
> any real difficulty to implement _except_ to the extent that there isn't
> a clear consensus on what the nirvana structure should actually look
> like in its grand glory.  Instead, people just implement what they need,
> and skip the other parts.

> SHL, however, is hard to define, not to mention implement, and designing
> a good, clear, stable API for it is presently beyond our collective
> abilities.  That level is where the problems are, and changing the
> concrete representation isn't going to help.

I think we actually agree the whole way down here unless I'm getting
the wrong message from all of this. It's very true there's no
consensus about SLL, but I'm not even seeing very many proposals
beyond MPS and IDW (if you count IDW's as a proposal since it's not
available). I think a rich set of proposals is the first step in
moving toward consensus. Most things impose a domain on the structure
(RDF) and aren't really what I'd call SLL, but SLL+some SHL. I agree
that SHL is where all the interesting stuff happens, so let's solve R
and SLL once and focus our efforts on the things that are specific to
the problem at hand in SHL.

Jake Brownson

unread,
Apr 17, 2012, 5:51:30 PM4/17/12
to pdx...@googlegroups.com
Great questions, ones I have answers for. The model I have in mind
isn't as fuzzy as you might think (I know I haven't said anything to
really indicate otherwise until now, other than pointing out that
there exist two functioning examples).

>      * Can nodes be labeled, in addition to their identity?

Not at the data model level, they're just identified. The notion of
labeling is handled at the next level using the graph structure. It's
all just graph data.

>      * Are there different types of nodes?

Nope, but there are different types of strong identities. The job of
an identity is to stand for an "idea"/"thing" and to identify the node
that represents the number 1 we use an int identity. To identify the
node that represents the idea of a Car we use a GUID. If we want a
node that represents a Wheel we represent it as a guid and have an
edge pointing to Car's node (as identified by its GUID) that has an
identity which is representing the idea of that relationship. To
attach a name to the node representing the idea of a Car we add an
edge from the Car node to a node representing the string "Car" which
has a string identity. We might also have an edge with the name
identity going from the node representing Car to a node representing
an icon that can be used to convey the idea of a Car.

>      * If so, can nodes have more than one type?

n/a

>      * Can edges be labeled?

Edges are simply keyed by identity and point at another node.

>      * Do edges have identity?

No, but they're keyed by identity on their owning nodes

>      * Are there different types of edges?

No

>      * Are edges directed?
Yes

>      * Can an edge's head be the same as its tail?

Yes

>      * Must edges only link nodes, or can they link edges?

Nodes only

>      * Can an edge start/end at itself?

n/a

>      * Must the graph be connected?

You start processing at a root node that represents the "whole world",
so unconnected nodes are fine but won't be discoverable. Nodes might
be connected only through metadata, or things pointing at them as part
of the "editor" they're contained in.

>      * Can there be redundant edges?

I think this will require some experimentation and is a question I
haven't settled on yet. I actually said it's a multi set keyed on
identity. I probably should have said multipmap keyed on identity
(meaning redundancy is okay), but I think it'd be worth keeping an eye
out to see if it's necessary to support it.

>      * If things have types, are types represented in the graph, or
>        external to it?

Types are handled at a higher meta level. As mentioned a schema
language should be part of the spec (that is defined using itself, and
lives on top of the graph data model and doesn't restrict what is
_possible_ to store in the graph, only what the editor will/won't
underline red).

>      * What are the semantics of the labels?  Are they text, typed
>        data, etc.?

n/a (there's no notion of a "label" in the data model)

>      * Can nodes be used to label things?

n/a (there's no notion of a "label" in the data model)

>      * Can edges be used to label things?

n/a (there's no notion of a "label" in the data model)

>      * Can subgraphs be used to label things?

n/a (there's no notion of a "label" in the data model)

>      * Can subgraphs be used as nodes?

n/a (there's no notion of a "label" in the data model)

>      * Can nodes have content?

Nope, just edges

>      * Can nodes contain subgraphs?

well I'm not sure what "Contain" means here, but they can have an edge
to another [part of the] graph, nothing at the level of the data model
treats it as a subgraph in any special way.

>      * If nodes can contain other parts of the graph, are there nesting
>        rules?  If so, how are they enforced?

No nesting, just pointing. It's up to the higher meta levels to decide
what the identities mean.

>      * What sort of operations are supported on the graphs?

Add edge, remove edge, create new node, delete node (and its edges).

>      * Assuming that there are operations to visit all nodes/edges, are
>        they stable (e.g. does adding/removing a node cause the order in
>        which other nodes are visited to change)?

I think the best answer to that is the spec of the data model or even
the Schema and Projection DSLs that would be required will be agnostic
to this. The DSLs are declarative.

>      * Does the storage format cache information about the graph to
>        optimize common operations (and is this information saved?)

I think this is a good idea, and is one I plan to experiment with.
IDW's format does this. I don't believe MPS's does. I haven't specced
the specific binary format. The more interesting thing is the
semantics of the format at this point. I think it might be worth
leaving caching out of the general format to keep it simpler.

>      * Are concurrent operations supported?

Not part of the data model spec. Different editors of the graph can
make this choice. I think there is a huge potential for it.

>      * Can graphs be merged, unified, extracted, etc.?

Not part of the data model spec. Different editors of the graph can
make this choice. I think there is a huge potential for it.

> And so on and so forth.  This is just a small fraction of the actual
> list of issues.  :)

There are two examples out there that have answered all of these
questions and more, and I have answers for many of them which are in
part drawn from the answers of the other two and in part things I
think could be done better. I think the number of questions just shows
that we should be doing _more_ experimenting with these things to see
what the best answers are than less.

I realize that you might see some of the answers saying "not handled
at this meta level" to be handwaving or a punt, but MPS and IDW show
how this can work in practice. The data model simply stores a bunch of
strongly identified nodes and how they are related to each other, and
the meaning of that is built up through a layer of DSLs. I think
that's actually one of the keys to this thing is that the data model
shouldn't be at all interested in interpreting the meaning of the
connections and nodes its storing.

> Now, you can dodge any of these by saying "Nope, not going to do it; I
> said 'simple,' remember?" but you'll likely be stepping away from
> 'general' if you do so.  The normal way to squeak out of it is to do
> something like the PL "magic comment" trick; say they have text labels
> or some such and, if the format matches some pattern, than they "really"
> have some extra semantics that your storage layer doesn't care about --
> but you're stepping away from most of the claimed advantages of having a
> rich data structure if you do so.

I don't think any of the answers I've given are dodges or stepping
away from being general. If you need subgraphs in your language you
can interpret certain edge identities as pointing to subgraphs, etc...
don't worry about it in the data model. It's just responsible for
letting you navigate and manipulate the connected nodes. No magic
labels at the level of the graph data model.

> The text file had to make some comparable choices before becoming
> reasonably standardized (and the process was not nearly as painless as
> it may seem in retrospect) but had the advantage that much of the heavy
> lifting had already been done by previous waves of innovation
> (principally the moveable type printing press, the telegraph, and the
> typewriter).  Graphs haven't had that advantage yet (if anyone sees a
> graph typewriter on e-bay, let me know; I'll pay a finder's fee) so the
> road will likely be rockier.

I'm sure the process was awfully painful. How many control characters,
how international should we be, do we put in arrows and boxes and
things, a smiley face, etc. As I understand it there were a huge
number of different text encoding formats floating around making it a
pain for everyone before ASCII was standardized. I'd like to see the
same thing for graph formats. Let's produce a disgusting number of
them with different philosophies for a couple years and then take the
best choices from each and standardize one. So far I know of two I'd
put in this category, and one isn't publicly available. If there are
more I should be looking at please inform me.

(Great discussion, this is the best I've had on the topic!)

Jake

Patrick Logan

unread,
Apr 17, 2012, 5:56:04 PM4/17/12
to pdx...@googlegroups.com
The standard Smalltalk environment edits live objects in an image. The
most common tool for editing is the class browser, where there is a
text pane for editing class definitions and methods. So it's kind of a
hybrid in this discussion. It does keep text-oriented change sets,
etc. Although I have to admit I have not been able to keep up with the
latest messages, so I am not exactly sure where this topic is headed.

The main point of the Smalltalk environment is that the editors and
other tools co-exist in the image with the objects that they edit
(which may be the editors themselves), so there is not a lot of
difference between the editors and the other things in the image which
operate on the image.

Still it does not have the feel of Interlisp's structure editor per
se, Nor does it have the feel of the "intentional software" variety of
tools (though I only barely grasp what that is about.)

-Patrick

Patrick Logan

unread,
Apr 17, 2012, 6:14:07 PM4/17/12
to pdx...@googlegroups.com
Not sure I'll be able to catch up on the main thread. One other
interesting note about Smalltalk's editing live objects...

Changing the structure of a class in Smalltalk-80 (IIRC) creates a new
class and then sends #become: to instances of the previous class to
become instances of the new class, preserving as many of the existing
values as possible.

Gemstone Smalltalk, because it is distributed, multiuser, and has
potentially very large numbers of instances, allows (again IIRC)
instances of up to two versions of a class to co-exist. The older
instances can be told to migrate en masse (which I think stops
everything until the migration is complete) or incrementally.

Pretty tame stuff compared to whatever is going on in the main thread...

-Patrick

Nathan Collins

unread,
Apr 17, 2012, 7:50:25 PM4/17/12
to pdx...@googlegroups.com
On Tue, Apr 17, 2012 at 2:17 PM, Jake Brownson <ja...@brainiumstudios.com> wrote:
>> Except, of course, that I _don't_ think it would be great to edit HTML
>> in a "projectional" way, any more than I think it would be great to do
>> geometry by eyeballing diagrams, or algebra by poking at graphs.  When I
>> shop for sheet music to learn a new piece, I don't ever think "Oh look,
>> I can get a projectional form on CD and work from that!"
>
> Don't confuse projectional/structured w/ WYSIWYG. I think it's be
> great to have a projection of "Structured HTML" (which by the way is
> probably semantically equivalent to HTML. HTML is a fairly well
> defined and honed domain, not trying to reinvent HTML, just the way we
> edit it) that is what is expected to be displayed in the browser. Not
> all projections need to be fully editable, or even editable at all.
> That particular projection might allow limited edits that are
> reasonable to do in that view (perhaps editing blocks of text for
> example), but you could switch to a projection that looks more like
> the C# editor I linked when doing the bulk of the work, etc. I'm not
> opposed to having text printed on the screen, just opposed to editing
> giant chunks of it all at once. I do think once we have a system like
> this in place we'll start to break away from simply having text on the
> screen and start finding great new ways to projection the structures
> we're manipulating. Jonathan Edwards has done some really interesting
> thinking on what's possible once you have structured editing ex:
> http://subtextual.org/subtext2.html.

Maybe you'd be interested in lenses?

I think your "editable projections" are sometimes called lenses, a
small but active area of PL research at UPenn (Nate Foster, Benjamin
Pierce, and Daniel Wagner have worked in this area). Here's a short
summary from the Boomerang page [1]:

Boomerang is a programming language for writing lenses—well-behaved
bidirectional transformations—that operate on ad-hoc, textual data
formats. Every lens program, when read from left to right, describes
a function that maps an input to an output; when read from right to
left, the very same program describes a "backwards" function that
maps a modified output, together with the original input, back to a
modified input.

Lenses have been used to solve problems across a wide range of areas
in computing including: in data converters and synchronizers, in
parsers and pretty printers, in picklers and unpicklers, in
structure editors, in constraint maintainers for user interfaces, in
software model transformations, in schema evolution, in tools for
managing system configuration files, and in databases where they
provide updatable views.

The wikipedia article on bidirectional transformations [2] has a few
more lenses examples.

Some make-Haskell-records-suck-less libraries [3] are lens based.
Such lenses are much simpler than what you'd need for your structured
editor of course.

Cheers,

-nathan

[1] http://www.seas.upenn.edu/~harmony/
[2] http://en.wikipedia.org/wiki/Bidirectional_transformation
[3] http://stackoverflow.com/questions/5767129/lenses-fclabels-data-accessor-which-library-for-structure-access-and-mutatio

Patrick Logan

unread,
Apr 17, 2012, 7:56:26 PM4/17/12
to pdx...@googlegroups.com

Lenses - there are implementations for scala and I just started using a modest one for Java too. Very neat.

Jake Brownson

unread,
Apr 17, 2012, 7:58:14 PM4/17/12
to pdx...@googlegroups.com
I have quite limited experience with lenses, so don't take anything
I'm saying here about them as authoritative, but I think they're great
for limited things like doing "property" getters and setters for data
structures in functional systems, but I don't think bidirectional
transformations are a good solution to general problems. There are a
lot of transformations that are lossy and I don't see how
bidirectional transformations would work well with them. I think a
structured editor needs two sets of transformations, one set that
transforms from the input to the projection on screen, and another set
that goes in the other direction translating editing commands back
until they are consumed and acted on (or discarded for that matter).
I'll check out the things you've linked for sure though. I should
really play w/ lenses more.

> --
> You received this message because you are subscribed to the Google Groups "pdxfunc" group.
> To post to this group, send email to pdx...@googlegroups.com.
> To unsubscribe from this group, send email to pdxfunc+u...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/pdxfunc?hl=en.
>

--
Jake Brownson
Cofounder
Brainium Studios
Cell: 503.349.4841

Nathan Collins

unread,
Apr 17, 2012, 8:54:41 PM4/17/12
to pdx...@googlegroups.com
On Tue, Apr 17, 2012 at 4:58 PM, Jake Brownson <ja...@brainiumstudios.com> wrote:
> I have quite limited experience with lenses, so don't take anything
> I'm saying here about them as authoritative, but I think they're great
> for limited things like doing "property" getters and setters for data
> structures in functional systems, but I don't think bidirectional
> transformations are a good solution to general problems. There are a
> lot of transformations that are lossy and I don't see how
> bidirectional transformations would work well with them. I think a
> structured editor needs two sets of transformations, one set that
> transforms from the input to the projection on screen, and another set
> that goes in the other direction translating editing commands back
> until they are consumed and acted on (or discarded for that matter).
> I'll check out the things you've linked for sure though. I should
> really play w/ lenses more.

Yes, the getter/setter examples are much simpler than what you'd need.
But I think dealing with *lossy* transformations is the main problem
of interest in lens research. It looks like they call such lenses
"edit lenses" or "stateful lenses" [4].

I think the idea is that when you write your "two sets of
transformations" there's a good chance that

1. the two sets are closely related, and so you're duplicating a lot
of effort and creating a maintenance problem. With a principled
approach, you may be able to write the common code only once.

2. one direction of the transformation is a projection, and so by
keeping track of the input to the projection, in addition to the
edits on the (lossy) output, you may be able to invert the
projection and get an updated version of the input. In symbols,
if x is in the rich representation and

y = project x
y' = edit y
x' = unproject x y'

then

project x' == y'

and the functions project and unproject are closely related.

In the case of projections (2), one direction may be completely
derivable from the other. More generally, each direction of the
transformation may perform some unique work which you must specify,
and each direction takes two arguments, an input and a "complement" of
lost data [4].

There's a recent paper about "edit lenses" with many references [4].
That paper is probably too technical for your needs -- it's about a
mathematical unification of earlier work on edit and state lenses --
but some of the references may be more practical. For example, they
reference [5]

This paper presents an application of bidirectional transformation
to the design and implementation of a novel editor supporting
interactive refinement in the development of structured
documents. The user performs a sequence of editing operations on a
view of the document, and the editor automatically derives an
efficient and reliable document source and a transformation that
produces the document view. The editor is unique in its
programmability, in the sense that transformations can be obtained
through editing operations. It uses the viewupdating technique
developed in the database community and a new bidirectional
transformation language that can describe not only the relationship
between the document source and its view, but also the data
dependency in the view.

I think this is exactly what you're trying to do. The question is,
how much of the work is practical enough that you can apply it?

Cheers,

-nathan

[4] /Edit Lenses/ http://dmwit.com/papers/201107EL.pdf

[5] /A Programmable Editor for Developing Structured Documents Based
on Bidirectional Transformations/
http://research.nii.ac.jp/~hu/pub/hosc07.pdf

Leif

unread,
Apr 17, 2012, 10:03:33 PM4/17/12
to pdxfunc
It's great to see such activity on pdxfunc!
I just wanted to add, that a problem with this is: unless everyone
else is working at the same level as you, they're going to be annoyed
by your reformatting of the surface syntax when you reserialize the
structure. Think formatting, diffs, VCS, etc.
-Leif

Leif

unread,
Apr 17, 2012, 10:09:45 PM4/17/12
to pdxfunc
If you're working with graphs, there's a variety of formats out there
already. One I'm familiar with is RDF, which answers "Yes" to most of
the above, though most all but node identity are implemented by
reifying into the graph. There's a lot of tools out there with
support for it, and a variety of serializations. There's databases,
query languages, reasoning engine, prolog-type rules to transform
structure, the whole shebang. Granted, a special-purpose form might
be more appropriate for a specific use case, but it'd be handy to have
an RDF view of your graph for re-use with existing tools.
-Leif

Leif

unread,
Apr 17, 2012, 10:11:29 PM4/17/12
to pdxfunc

On Apr 17, 5:54 pm, Nathan Collins <nathan.coll...@gmail.com> wrote:
> [4] /Edit Lenses/http://dmwit.com/papers/201107EL.pdf
>
> [5] /A Programmable Editor for Developing Structured Documents Based
>     on Bidirectional Transformations/
>    http://research.nii.ac.jp/~hu/pub/hosc07.pdf
>

I'd be extremely interested in hearing more about this at a future
PDXFunc!
>
>
>
>
>
>
> > On Tue, Apr 17, 2012 at 4:50 PM, Nathan Collins
> > <nathan.coll...@gmail.com> wrote:
> >> [3]http://stackoverflow.com/questions/5767129/lenses-fclabels-data-acces...
>
> >> --
> >> You received this message because you are subscribed to the Google Groups "pdxfunc" group.
> >> To post to this group, send email to pdx...@googlegroups.com.
> >> To unsubscribe from this group, send email to pdxfunc+u...@googlegroups.com.
> >> For more options, visit this group athttp://groups.google.com/group/pdxfunc?hl=en.

Arthur Peters

unread,
Apr 17, 2012, 10:16:26 PM4/17/12
to pdx...@googlegroups.com
If I understand lenses correctly they can actually address this. The formatting is a kind of lost information just like any other so it's part of the complement Nathan talked about. And in theory could be reintroduced into the new version after editing.

However I agree with you in general. Automated tools can really mess up formatting. But I guess this just adds a new requirement to a structured system. It has to add a way to encode all the useful information we usually encode using adhock code formatting conventions. It doesn't seem impossible but I suspect it would be hard to match the ease of use of text formatting.
 
-Arthur


--
You received this message because you are subscribed to the Google Groups "pdxfunc" group.

Bart Massey

unread,
Apr 18, 2012, 1:43:03 AM4/18/12
to pdx...@googlegroups.com
Sadly, I can't make PDXFunc for the foreseeable future because of my teaching schedule. Maybe there will be another Zissou meeting soon we can poke at things at.

I'd estimate my response to your comments is about eight hours of thinking and  typing :-), and sadly I'm about to fly off to Germany for 10 days and have to get ready to go. I pretty clearly need to write something more; I'll try to get it onto my blog sometime soon, and will make sure to let people here know if / when I manage it.

Peace and joy.

--Bart

Jake Brownson

unread,
Apr 18, 2012, 1:47:04 PM4/18/12
to pdx...@googlegroups.com
>> [5] /A Programmable Editor for Developing Structured Documents Based
>>     on Bidirectional Transformations/
>>    http://research.nii.ac.jp/~hu/pub/hosc07.pdf

> I'd be extremely interested in hearing more about this at a future
> PDXFunc!

Yeah this is a really interesting paper and has a _lot_ in common with
what I'm talking about. I don't quite have my head wrapped around the
lens part of it though. Their idea of dealing with the edits instead
of the data sounds pretty similar to the second transformation I
mentioned that transforms edit commands from the projection back
through the system until they're consumed, but the difference is they
are thinking of the projection as a series of edits as well which is
very interesting. With my limited understand of the paper so far I
think the types of "edits" that would be required in a projection are
going to be quite different than the set of edits that could be
initiated by a user. For example in the projection you might project
some "sugar" like brackets and parens, and in this system those would
show up as insert edits going to the output, but the user shouldn't be
able to do those insert edits, and they shouldn't be able to delete
them either. There would however be a set of edits that would be
common to the projection and the set of user edits so it could still
be interesting. It's quite different than how I was thinking about it.

Reply all
Reply to author
Forward
0 new messages