Sacred Cows


Josh Marinacci

May 1, 2012, 8:25:04 PM
to pdx...@googlegroups.com
Hi guys. I'm the author of the Sacred Cows posts that were recently discussed at your meetings. Jake pointed me to your mailing list. I just read through the threads and realized I can't respond to everything, but there is so much good stuff here that I decided to start a new thread.

My goal with the essays is ultimately not to provide a solution but to provoke discussion.  I feel like programming languages have stagnated. We keep polishing our pointy corners until they are perfectly round rather than going off to explore new worlds. Reading through the threads here I'm happy to see people engaged in actual discussion of the merits of my ideas, rather than just saying "it won't work".

One of the people on the other thread mentioned that many of these ideas have been tried before.  This is very true. I've recently been reading through all of the papers I can find on structured and visual programming and finding interesting stuff.  However, the end result is rather depressing. Virtually everything in today's "new languages" is at least 30 years old.  It feels like we continue to recycle the past.  My essays are an attempt to spark discussion on unexplored possibilities, but the vast majority of the reaction has been to essentially say "don't fix what ain't broke". (I could diverge here into a larger essay on how the western world can no longer dream and do big things, but that might be a bit too off topic).

Regarding my specific suggestion that we don't store code as plain text, my point is not to advocate a database or some other technology in particular. It is to highlight how stagnant we have become.  Just during my professional career (and I graduated in 97) we've had about 8 cycles of Moore's law, and yet I still see programmers arguing about whether the overhead of garbage collection is worth it.  I have a *phone* with a 1.4GHz processor and a gig of RAM!!  The world has changed drastically but the way we write programs hasn't. In fact we fall further and further behind each year.

So..., from the postulate that surely we won't store our code as plain text 100 years from now, let's go exploring.  If we are thinking of our code as a graph rather than a plain text syntax, what does this open up? Inline resources? Non-ascii characters as part of the language? Draw part of your code instead of writing it?  I feel like there's a lot of stuff in here waiting to be explored.

Thanks,
 Josh


BTW: I live in Eugene. I'd love to come up for a pdxfunc meeting sometime.
BTW: is the Zissou society in reference to the Life Aquatic?

Matt Youell

May 1, 2012, 10:52:10 PM
to pdx...@googlegroups.com
On 5/1/12 5:25 PM, Josh Marinacci wrote:
> BTW: is the Zissou society in reference to the Life Aquatic?
Allegedly. :)


--
-/matt/-
http://youell.com/matt


Jake Brownson

May 2, 2012, 12:18:14 AM
to pdx...@googlegroups.com
Josh, we'd love to have you at our meetings. I think you'd enjoy it too. Everyone there is smarter than me, it's great.

I'm not sure exactly what started it, but there seems to be a lot of discussion happening all around in this area recently. Jonathan Edwards (no not that one) has done some great work in this area with Subtext and there's a pretty interesting discussion brewing on his latest blog post: http://alarmingdevelopment.org/?p=680

markus

May 2, 2012, 2:03:15 PM
to pdx...@googlegroups.com
Hi Josh!

> My goal with the essays is ultimately not to provide a solution but to
> provoke discussion.

You succeeded. :)

I feel I was one of the two most vocal critics of your position (or,
better, make finer distinctions and claim that I was the most vocal
critic of your position and Bart was the most articulate) so I'd like to
start by delineating where I agree and where I disagree.

> I feel like programming languages have stagnated. We keep polishing
> our pointy corners until they are perfectly round rather than going
> off to explore new worlds.

100% agree.

> Reading through the threads here I'm happy to see people engaged in
> actual discussion of the merits of my ideas, rather than just saying
> "it won't work".

100% agree.
>
> One of the people on the other thread mentioned that many of these
> ideas have been tried before. This is very true. I've been recently
> reading through all of the papers I can find in structured and visual
> programming and finding interesting stuff. However, the end result is
> rather depressing. Virtually everything in today's "new languages" is
> at least 30 years old. It feels like we continue to recycle the past.

95% agree.

> My essays are an attempt to spark discussion on unexplored
> possibilities but the vast majority of the reaction has been to
> essentially say "don't fix what ain't broke"

That would characterize my position as well. Or, to borrow a phrase from
you a few paragraphs above, I would say that if your concerns are as you
state them, we oughtn't waste effort "polishing our pointy corners until
they are perfectly round." Textual storage of source code is a mostly
rounded corner.

It is, however, a very attractive trap for would-be language designers
-- so much so that it's the very first item on the checklist
http://colinm.org/language_checklist.html :

You appear to believe that:
[ ] Syntax is what makes programming difficult


> . (I could diverge here into a larger essay on how the western world
> can no longer dream and do big things, but that might be a bit too off
> topic).

Agree, except that I'm not sure it would be off topic.

> Regarding my specific suggestion that we don't store code as plain
> text, my point is not to advocate a database or some other technology
> in particular.

Understood.

> It is to highlight how stagnant we have become. Just during my
> professional career (and I graduated in 97) we've had about 8 cycles
> of Moore's law, and yet I still see programmers arguing about whether
> the overhead of garbage collection is worth it. I have a *phone* with
> a 1.4GHz processor and a gig of RAM!! The world has changed
> drastically but the way we write programs hasn't. In fact we fall
> further and further behind each year.

Agree, but with a significantly longer baseline and a slightly different
take on the social and technological mechanisms.

> So..., from the postulate that surely we won't store our code as plain
> text 100 years from now,

I strongly doubt the truth of this postulate.

> let's go exploring. If we are thinking of our code as a graph rather
> than a plain text syntax, what does this open up?

Worms. Cans and cans of them. :)

More specifically, language is the serialization of annotated graphs
(e.g. Chomsky). That's the whole point of having language in the first
place. When we "think of code" we of course think of it as a richly
annotated graph, but when we communicate/store it we must use a language
to serialize it. And from your previous statements I'll take the
liberty of inferring that this mandatory serialization is really what
you object to, so you are asking, in effect, either

"What would happen if our canonical serialization wasn't serial?"

or

"What would it be like if we could write programs that weren't
expressible in any language?"

> Inline resources? Non-ascii characters as part of the language? Draw
> part of your code instead of writing it? I feel like there's a lot of
> stuff in here waiting to be explored.

And, as Bart pointed out and you are discovering, much of it has been
explored. Repeatedly. That's no reason not to explore it again, but it
may be worthwhile to look into why / how previous efforts floundered
too.


I'd like to offer a counter proposal: if your concerns are as you state
them, look at the areas where programming has ratcheted forward. In
almost every case, the pattern has been remarkably similar:

  * Existing programs either don't deal with some problem space X,
    or they do so in an ad hoc manner and an inordinate amount of
    the program's complexity is tied up in dealing with it (though
    this is almost never evident except in hindsight).
  * Some ways of dealing with X are better and eventually become
    enshrined in common practice.
  * Some of the ad hoc code to do X "as you do" gets extracted into
    libraries, and more programs that "didn't need X" discover that
    they do.
  * Some languages start to provide first class support for X
    (either directly or as ubiquitous, low-friction libraries) and
    suddenly languages that don't support X seem fusty and hard to
    work with.

Some examples of this, in very rough chronological order: integers,
subroutines, arrays, source code, structs, variables, looping structures,
recursion, floats, characters, fixed size strings, I/O, files,
indexing/dbs, booleans, quasi-variable length strings, namespaces,
garbage collection, objects, regular expressions, hash dictionaries,
networking, etc.

Some things that seem to keep getting reinvented but not sticking in any
widely adopted final form: trees, graphs, sets, pure functions, fuzzy
values/confidence intervals, units, exception handling, parallelism,
first class serialization, enumerations, baked in security, validity
testing/proving, rationals, complex numbers, vectors, structural pattern
matching, speculative execution, separation of concerns, etc. These for
the most part seem to be mired in the first half of the process, though
there have been languages that gave first class status to most of them.

I would posit that the reason things take 30 years to make it into
languages is that these steps take a lot of brain power and most people
who are in a position to address them are more concerned with other
goals. If you want to move the state of the art forward, I'd suggest a
more productive approach would be to pay attention to the unwanted
complexity in current code written in modern languages and try to
abstract some facility that would turn that into a solved problem.

Going from:

    int found = 0;
    for (int i = 0; i < length(s); i++) {
        if (s[i] == 'a') {
            int j = i + 1;
            while (s[j] >= '0' && s[j] <= '9') { j++; }
            if (s[j] == 'b') { found = 1; break; }
        }
    }
    if (found) { ...

to

if s =~ /a(\d*)b/ ...

was a major step forward.
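
For readers who want to run the comparison, here is a minimal Python sketch of the same two approaches. The function names are made up for illustration, and the pattern uses `[0-9]` rather than `\d` because Python's `\d` also matches non-ASCII digits, which the hand-written loop does not.

```python
import re


def found_by_hand(s: str) -> bool:
    """Hand-rolled scan: an 'a', then zero or more ASCII digits, then a 'b'."""
    for i, ch in enumerate(s):
        if ch != "a":
            continue
        j = i + 1
        # Skip over any run of ASCII digits that follows the 'a'.
        while j < len(s) and "0" <= s[j] <= "9":
            j += 1
        # Unlike the C-style version, the bounds must be checked explicitly.
        if j < len(s) and s[j] == "b":
            return True
    return False


def found_by_regex(s: str) -> bool:
    """The same search, stated declaratively."""
    return re.search(r"a[0-9]*b", s) is not None
```

The two functions should agree on any input; the regex version simply moves the bookkeeping (indices, bounds, early exit) out of the program and into the language.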

If you can find and abstract even one similar pattern you'll advance the
art more (IMHO) than anything you'll get by mucking about with the
serialization syntax.

> BTW: I live in Eugene. I'd love to come up for a pdxfunc meeting
> sometime.

I get down to Eugene occasionally, and would love to get together for
lunch some time.

-- Markus



Jake Brownson

May 2, 2012, 3:58:00 PM
to pdx...@googlegroups.com
> It is, however, a very attractive trap for would be language designers
> -- so much so that it's the very first item on the checklist
> http://colinm.org/language_checklist.html :
>
>        You appear to believe that:
>        [ ] Syntax is what makes programming difficult

I think it's really interesting that you bring this up to presumably
poke a bit of fun at the idea, but then later in the email give an
example in which a new syntax radically improved a chunk of code. You
could argue that it is more than syntax and I might not disagree, but
new syntax was required to solve that problem in that way.

That said I think even if we used projections that looked exactly like
the text, but were in fact structured we would see huge benefits. I
base this on my experience using MPS which does its best to project
things that look like text. I also think if we move past this
restriction there are even more benefits to be gained which I base on
my experience at ISC and on work like this:
http://sites.google.com/site/larchenv/ and Jonathan Edwards' Subtext
demos.

> More specifically, language is the serialization of annotated graphs
> (e.g. Chomsky).  That's the whole point of having language in the first
> place.  When we "think of code" we of course think of it as a richly
> annotated graph, but when we communicate/store it we must use a language
> to serialize it.  And from your previous statements I'll take the
> liberty of inferring that this mandatory serialization is really what
> you object to, so you are asking, in effect, either
>
>    "What would happen if our canonical serialization wasn't serial?"
>
> or
>
>        "What would it be like if we could write programs that weren't
>        expressible in any language?"
>

This is a really interesting paragraph. I should probably study
Chomsky more and hadn't heard this idea before. If you have any more
gems like this, or pointers to resources with them I'd love to hear
them.

I think we need to put the point that we go from a representation of
the code to a communication of the code as far out as possible. Right
now we store the communication of the code on disk just as we display
it on screen. Let's store the code as a graph on disk (a serialized
communication of the graph is conceptually different than a direct
serialized communication of the program itself) and convert it to a
serialized communication just before we turn it into pixels. This way
we can have different communications/projections of the program and
our edits can more directly affect the program rather than modifying
the communication of it.
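
A rough sketch of what "store the graph, project at the last moment" could look like, in Python. Everything here is an assumption made for illustration: the JSON layout, the node kinds and the `project` function are invented and are not how MPS or any other tool actually stores programs. The point it tries to show is that a variable reference is an edge to its declaration, so a rename is one edit to one node.

```python
import json

# Hypothetical on-disk form: nodes keyed by id, references stored as ids, not names.
STORED = json.dumps({
    "root": "n1",
    "nodes": {
        "n1": {"kind": "let", "name": "total", "value": "n2", "body": "n3"},
        "n2": {"kind": "num", "value": 40},
        "n3": {"kind": "add", "left": "n4", "right": "n5"},
        "n4": {"kind": "ref", "target": "n1"},
        "n5": {"kind": "num", "value": 2},
    },
})


def project(graph, node_id=None):
    """Render one textual view of the graph; other views could coexist."""
    nodes = graph["nodes"]
    node = nodes[node_id or graph["root"]]
    kind = node["kind"]
    if kind == "num":
        return str(node["value"])
    if kind == "ref":
        # A reference follows an edge to its declaration, so it cannot go stale.
        return nodes[node["target"]]["name"]
    if kind == "add":
        return f"{project(graph, node['left'])} + {project(graph, node['right'])}"
    if kind == "let":
        return (f"let {node['name']} = {project(graph, node['value'])} "
                f"in {project(graph, node['body'])}")
    raise ValueError(f"unknown node kind: {kind}")


graph = json.loads(STORED)
before = project(graph)
graph["nodes"]["n1"]["name"] = "sum_so_far"  # one edit, on the declaration node
after = project(graph)
print(before)
print(after)
```

The serialized JSON is still a linear string on disk, which is the distinction drawn above: it is a serialization of the graph, not of one particular textual rendering of the program.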

> And, as Bart pointed out and you are discovering, much of it has been
> explored.  Repeatedly.  That's no reason not to explore it again, but it
> may be worthwhile to look into why / how previous efforts floundered
> too.

Agreed, but there are some examples of things that have found/are
finding success in this area that I'd encourage detractors to explore.
The most notable and accessible being MPS.

> Some examples of this, in very rough chronological order: integers,
> subroutines, arrays, source code, structs, variables, looping structures,
> recursion, floats, characters, fixed size strings, I/O, files,
> indexing/dbs, booleans, quasi-variable length strings, namespaces,
> garbage collection, objects, regular expressions, hash dictionaries,
> networking, etc.

It's really interesting that none of the things you've mentioned are
directly related to the experience of entering programs (though it's
a reasonable argument to say the UI a programmer uses includes the
semantics of the language). Why not include find/replace, IDEs, code
completion, syntax highlighting, source control, etc.? This is the
category of stuff I think the things we're talking about fit. I'm not
really interested in language design, I'm interested in how we use our
languages. Initially it'd be fine w/ me if we use the same languages,
but I think such a tool will enable language designers to accelerate
their work and open new possibilities that text makes difficult.

> Some things that seem to keep getting reinvented but not sticking in any
> widely adopted final form: trees, graphs, sets, pure functions, fuzzy
> values/confidence intervals, units, exception handling, parallelism,
> first class serialization, enumerations, baked in security, validity
> testing/proving, rationals, complex numbers, vectors, structural pattern
> matching, speculative execution, separation of concerns, etc.  These for
> the most part seem to be mired in the first half of the process, though
> there have been languages that gave first class status to most of them.

I realized something that I maybe didn't make clear in my earlier
emails. I'm not trying to make a graph to be applied whenever someone
is doing a graph algorithm. This is why I don't need to worry about
subgraphs, etc to make it complete. One could define a domain to talk
about one's specific version of a "graph" and then store it in the
underlying data model I'm talking about. The domain could talk about
subgraphs, etc, but the underlying data model doesn't need to have
those things to be able to represent them at the higher metalevel. In
other words I'm not trying to define the end all be all graph library
for implementing all graph algorithms, but a data model that has a
graph structure.

> If you can find and abstract even one similar pattern you'll advance the
> art more (IMHO) than anything you'll get by mucking about with the
> serialization syntax.

If we can solve the structured editor/language workbench problems
experimenting with and deploying these abstractions becomes way
easier, and will likely accelerate that process which I agree is where
a huge amount of value is.

I'd be interested to hear your criticisms of MPS. An existence proof
is usually a pretty strong one. (I certainly have my own criticisms of
it, but as I've stated before they're boring software engineering
ones, not critical ones of the underlying model)

Bart Massey

May 2, 2012, 5:09:43 PM
to pdx...@googlegroups.com
As usual, I'd concur with Markus on most of this. (I have to say, though, that that PL Checklist is kind of crap IMHO. It's a mixture of things that I'd vehemently agree with [syntax doesn't matter] and ones I'd vehemently disagree with [GC is not free]. The only thing worse than anti-advice is a mixture of advice and anti-advice.)

I guess what I would add is that the PL situation mostly hasn't improved over the past 30 years by adding clever new ideas, paradigms and representations. It has mostly improved by offering a larger range of engineering tradeoffs that are easier to take advantage of. I could choose to program in Nickle or Python, Perl or AWK or sed, SML or OCAML or Haskell, C or C++ or Objective C, Java or C#, etc. Each of these languages has viable use cases and is reasonable to program in. Each has reasonably high-quality implementations and support available free of charge and open source for the most popular platforms. Once you know a few of these, learning the rest of them is not a huge deal. I've written code in many of them in the past year and not regretted it. This is as dramatically different from the early 1980s as one can imagine; arguably it's been the single biggest contributor to developer productivity.

This is why the "we have too many languages, don't invent new ones" crowd drives me nuts. I'm perfectly happy to see Coq and Agda and Guru coexist and tackle the problems of their target domains in only slightly different ways. As we have built Nickle, we have made some deliberate choices of things to include and not to include from the menu of PL options. One of our rules has been to avoid inventing anything whenever feasible within our design constraints. We've had to invent a few things to make Nickle nice, but there are large areas where we just haven't gone because we have no models in the existing language space and I can't see how to advance the state of the art.

To me, the biggest problem in PL development today is that three communities that should be at the heart of PL design--the academic PL community, the academic SE community and the extra-academic developer base--not only don't talk to each other anymore but barely can talk to each other. I would actually claim that PL design isn't "stuck" at all--it's just that the academic PL folks who continue to advance it are so poor at communicating what they're doing and why they're doing it, and so poor at producing things that the other two communities can use, that they almost might as well not bother. Coq and Agda and Guru are actually pretty cool, and solve some real problems that you care about. Too bad "not just any Ph.D. in Programming Languages" (to borrow my friend Jim Larsen's phrase) can figure out how to use them.

I think that the biggest advance of the last 20 years has been sound and flexible polymorphic typing (parametric or template polymorphism). I say 20 years as a compromise between 30 years ago, when most of the work was done in academia, and 10 years ago, when you started seeing the results of this work in mainstream PLs. Right now "correct-by-construction" is the academic big thing; I expect to see real-world languages that support this concept reasonably about 2030 if I live that long.

To close where Markus opened: Underlying the idea that PLs need to be improved is the assumption that much of the difficulty and complexity of programming today is still what Brooks famously called "accidental": due to poor tools or processes rather than inherent in the problem to be solved. I don't know how true I believe this to be. More and more as I get older, I find that my bugs are from not understanding what I was trying to do, and my low productivity is because I'm solving hard problems. YMMV.

--Bart

Cosmin Stejerean

May 2, 2012, 10:14:03 PM
to pdx...@googlegroups.com
On Wed, May 2, 2012 at 12:58 PM, Jake Brownson <ja...@brainiumstudios.com> wrote:
> Right now we store the communication of the code on disk just as we display it on screen.

That's not entirely true. We typically display code using syntax
highlighting. In some editors code might be displayed in sections that
can be collapsed or expanded. Projects like Light Table are also
trying to explore more useful ways to represent code on the screen.
Serializing code into text files has the major advantages of working
with existing tools, but we certainly don't need to limit ourselves to
displaying things on the screen as flat, boring files.
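
As a small illustration of display differing from storage while the file stays plain text, here is a sketch of a toy highlighter in Python. It uses the standard `tokenize` and `keyword` modules; the `highlight` function and the choice of ANSI bold are assumptions for the example, not how any particular editor works.

```python
import io
import keyword
import tokenize

BOLD, RESET = "\033[1m", "\033[0m"


def highlight(source: str) -> str:
    """Return the source with Python keywords wrapped in ANSI bold codes."""
    lines = source.splitlines(keepends=True)
    spans = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and keyword.iskeyword(tok.string):
            # Token rows are 1-based; keyword tokens never span lines.
            spans.append((tok.start[0] - 1, tok.start[1], tok.end[1]))
    # Apply right-to-left so earlier column offsets stay valid.
    for row, start, end in sorted(spans, reverse=True):
        line = lines[row]
        lines[row] = line[:start] + BOLD + line[start:end] + RESET + line[end:]
    return "".join(lines)


stored = "def f(x):\n    return x if x else 0\n"
shown = highlight(stored)
print(shown)
```

Stripping the escape codes from `shown` gives back `stored` unchanged, so the on-disk text and every existing tool that reads it are unaffected by the richer display.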

--
Cosmin Stejerean
http://offbytwo.com

Josh Marinacci

May 7, 2012, 12:32:22 PM
to pdx...@googlegroups.com
I do think that 100 years from now we probably will still program largely using language and symbols, much as we do today. Language is hard wired into the human brain, so I don't think we will stray too far from it.  However, I do think what we see on screen and what is stored on disk should be split.  I also think that how we store our code on disk is an irrelevant implementation detail.  At first glance these would appear to be contradictory statements, but I don't think so.

A lot of design in new programming languages revolves around what is stored on disk.  For example, the new Go programming language, which is arguably the nicest widely used general purpose systems language to come along in a while, has specific rules about whitespace and semicolons.   To me, this is a symptom of the problem that we are conflating storage and machine interface with the human interface, and this causes us to solve the wrong problems. Let's explore these two examples for a second. I think they are a useful self-contained exercise.

Semicolon restrictions: The semicolon issue is about making it possible for the compiler to parse your code with no ambiguity.  If the computer is supposed to serve the human, then why should I adjust how I write my code to the limits of the compiler?  Either the compiler needs a better parser, or else we need to move disambiguation from the place where the human isn't (inside the parser) to the place where the human is (the code editor). A good IDE will know what I meant. And if it doesn't then it can ask me right then. The disambiguation happens once, right where the human is, and then never becomes an issue again.  This fails of course if you edit the code outside an IDE, much like you could easily break a PNG by editing it outside a graphic editor.  This is why storing code as plain text is an issue.  Restricting the way the language deals with semicolons is solving the wrong problem. It forces more rules upon the human just to make life easier for the machine.

Whitespace:  The Go language has specific rules for how you use whitespace in your program. This makes for cleaner code since you can remove boilerplate syntax like curly braces. It also means code written by one person can easily be read by another.  But again, we are solving the wrong problems.  The problems are: displaying code in a clear and concise manner on screen, and letting programmers communicate unambiguously across time and space.   Modifying the language syntax feels like the wrong solution.  Instead we should be relying on the computer to do this.  An IDE could display the code in the manner most pleasing to the human who is reading it, even if the person who wrote it uses different settings.  Removing the curly braces is again making the human help out the parser.  If we don't store code as plain text then it's a non issue.  I can type in braces if it helps me to think. Another programmer can use indentation to indicate where a block begins and ends.  The compiler never has to care because it's not plain text by the time it gets to the compiler.  The other programmer doesn't have to care what whitespace form I use because he never sees it. He sees what is appropriate for him.
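
A minimal sketch of the "each reader sees their own layout" idea, in Python. The `Block` type, the two renderers and the toy statements are all invented for illustration and do not correspond to any real language's syntax; the stored structure is a tree, and braces or indentation are chosen only at display time.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """A header plus a body of statements (str) or nested Blocks."""
    header: str
    body: list = field(default_factory=list)


def render_braces(node, depth=0):
    """Project the tree with braces and semicolons."""
    pad = "    " * depth
    if isinstance(node, str):
        return [f"{pad}{node};"]
    lines = [f"{pad}{node.header} {{"]
    for child in node.body:
        lines += render_braces(child, depth + 1)
    lines.append(f"{pad}}}")
    return lines


def render_offside(node, depth=0):
    """Project the same tree using indentation alone."""
    pad = "  " * depth
    if isinstance(node, str):
        return [f"{pad}{node}"]
    lines = [f"{pad}{node.header}:"]
    for child in node.body:
        lines += render_offside(child, depth + 1)
    return lines


program = Block("while running", [
    "step()",
    Block("if done", ["running = false"]),
])

print("\n".join(render_braces(program)))
print("\n".join(render_offside(program)))
```

Neither rendering is stored, so a compiler working from the tree never sees either one. The sketch leaves out the hard part, which is editing: turning keystrokes in either view back into changes to the tree.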

Of course, lots of new complexity is introduced by going to non-plain text. All of our tools are built around it.  But I think these are solvable if the benefits are worth it.  Plus, once we jump in to the deep end of the non-ascii pool lots of other things become possible, or even trivial: more use of math symbols, visual editors for particular data types,  and my personal favorite: multiline string literals.

- Josh


-- 
Josh Marinacci

--
You received this message because you are subscribed to the Google Groups "pdxfunc" group.
To view this discussion on the web visit https://groups.google.com/d/msg/pdxfunc/-/bg91w0mpOBYJ.
To post to this group, send email to pdx...@googlegroups.com.
To unsubscribe from this group, send email to pdxfunc+u...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/pdxfunc?hl=en.

markus

May 7, 2012, 1:19:44 PM
to pdx...@googlegroups.com

> Plus, once we jump in to the deep end of the non-ascii pool lots of
> other things become possible, or even trivial [...] and my personal
> favorite: multiline string literals.

I believe with a little research we could come up with some way of
storing multiline strings in text files. :)

-- Markus


Cosmin Stejerean

May 7, 2012, 1:33:36 PM
to pdx...@googlegroups.com, pdx...@googlegroups.com
Given that the benefits that have been discussed tend to be things that can be accomplished by IDEs today (if only someone cared enough to do the work), and still serialize to the same grammar in plaintext files, I don't understand the need for some opaque and poorly supported serialization format.


Cosmin Stejerean

Jesse Hallett

May 7, 2012, 4:03:03 PM
to pdx...@googlegroups.com
It sounds to me like the ideas that you describe could be implemented
by storing code as an AST and transforming it to a more friendly
grammar for viewing and editing. Perhaps each programmer could choose
and customize a different frontend syntax to suit their particular
preferences. This idea makes me think of the proposed M-expressions
for LISP. Maybe someone has tried something similar at some point in
LISP history.
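
To make the M-expression comparison concrete, here is a small Python sketch that prints one tree in two surface syntaxes. It assumes a tree is a non-empty tuple whose first element is the operator; the function names are made up, and the M-expression form follows the `f[x; y]` convention from McCarthy's notation.

```python
def to_sexpr(node):
    """Render a tree as an S-expression, e.g. (cons (car x) (cdr y))."""
    if isinstance(node, tuple):
        return "(" + " ".join(to_sexpr(n) for n in node) + ")"
    return str(node)


def to_mexpr(node):
    """Render the same tree as an M-expression, e.g. cons[car[x]; cdr[y]]."""
    if isinstance(node, tuple):
        head, *args = node  # assumes a non-empty tuple: (operator, arg, ...)
        args_text = "; ".join(to_mexpr(a) for a in args)
        return f"{to_mexpr(head)}[{args_text}]"
    return str(node)


tree = ("cons", ("car", "x"), ("cdr", "y"))
print(to_sexpr(tree))
print(to_mexpr(tree))
```

Printing is the easy direction. A per-programmer front end would also need a parser for each syntax that produces the same tree.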

Patrick Logan

May 7, 2012, 6:11:07 PM
to pdx...@googlegroups.com
>
> It sounds to me like the ideas that you describe could be implemented
> by storing code as an AST and transforming it to a more friendly
> grammar for viewing and editing.  Perhaps each programmer could choose
> and customize a different frontend syntax to suit their particular
> preferences.  This idea makes me think of the proposed M-expressions
> for LISP.  Maybe someone has tried something similar at some point in
> LISP history.

There have been several "alternate syntaxes" for various lisps over the years.

One current example is Gambit Scheme's "SIX" syntax, which stands for
"Scheme Infix syntaX".

I've never quite been able to get my head around the "Intentional
Programming" ideas enough, though, to figure out what's different
between alternate syntax like this and the graph/tree model of
Intentional Programming.

-Patrick

Bart Massey

May 8, 2012, 2:17:44 AM
to pdx...@googlegroups.com
I think talking about semicolons and the offside rule is a great way to get at why I'm skeptical of this whole plan of improving syntax of representations of programs.

Speaking just for myself, as a guy who writes a reasonable amount of code in a console text editor each week in a large variety of different languages, the amount of time and trouble I experience with these kinds of syntax nits is nigh zero. My bugs are almost all semantic. Of these semantic bugs, many are caught by static checking---the more static checking my language implements, the more likely my code is to work properly the first time. The bugs that get past the typechecker and such mostly appear prominently at runtime and are easily dispatched. The remaining bugs are horrible, and cost me maybe 50% of my implementation time.

The other 50% of my implementation time is...just implementation. Arguably, much more expressive languages (e.g. Haskell) reduce this time substantially relative to much less expressive languages (e.g. C). But it never goes to zero, and there's a certain amount of conservation of programming difficulty: I spend way more time thinking than typing, and the thinking is about what algorithms and data structures are representationally best. Again, higher-level languages may admit implementation of better algorithms and use of better data structures, but it's not a world of difference.

At the end of the day, for the kinds of things I write, probably only 40% of my time is spent implementing anyway. The other 60% is spent on research, figuring out what to build, documenting, testing, deploying for use. Most of these steps are affected only marginally by my choice of language.

The recurring theme here is that Amdahl's Law starts to eat you alive. Even if your programming language or software engineering improvement is really epic---HLLs and structured programming in the 1970's, automated memory management and decent static type systems in the 1980's, pick your favorite for the 1990's and beyond---it can really only knock down the overall effort of development by a factor of 2 or so, and these effects don't tend to be too cumulative.
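
The arithmetic behind that claim can be sketched in a few lines of Python. The 40% figure is the implementation share quoted above; the speedup factors are arbitrary values chosen for illustration.

```python
def overall_speedup(fraction_improved: float, factor: float) -> float:
    """Amdahl's Law: speedup of the whole when only a fraction gets faster."""
    return 1.0 / ((1.0 - fraction_improved) + fraction_improved / factor)


implementation_share = 0.40  # share of total effort spent implementing

for factor in (2, 10, 1000):
    speedup = overall_speedup(implementation_share, factor)
    print(f"implementation {factor}x faster -> overall {speedup:.2f}x")
```

With these inputs the overall gain is about 1.25x, 1.56x and 1.67x. Even an infinitely better language is bounded by 1 / 0.6, or roughly 1.67x, as long as the other 60% of the work is untouched.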

What are the possible epic improvements pending for the new millennium? I can think of three areas. First, the attempt to permit very smart programmers to write code in such a way that it needs very little testing or debugging through the intense application of constructive formal methods: think Coq and Agda. Second, the increasing quality of "AI" methods that remove most of the burden of figuring out what program to write from the programmer: think state space search and machine learning. Finally, the attempt to develop ways of specifying domain-specific computation that are accessible to non-programmers or semi-programmers: spreadsheets and databases are the early examples, but I think there are no obvious and well-known examples in the current generation.

Assuming all of these efforts are astounding successes, I think the problem of producing software will still be with us for a long time.

"Be comforted that in the face of all aridity and disillusionment,
 and despite the changing fortunes of time,
 There is always a big future in computer maintenance."
 --National Lampoon, Deteriorata

Josh Marinacci

May 8, 2012, 9:48:49 AM
to pdx...@googlegroups.com
It is certainly possible to keep adding graph-based features onto our existing system, but it's just hacks upon hacks. Eventually it will be harder and harder to add new things and the system will collapse upon itself. Imagine if we only stored graphics as JSON files. It would technically work and might be convenient for certain use cases, but it won't hold up long term.

-- 
Josh Marinacci

Josh Marinacci

May 8, 2012, 9:53:43 AM
to pdx...@googlegroups.com
I don't think formatting and syntax changes will make us productive. That's why it bothers me that a new language like Go takes it into consideration. My hope is that by going to serialized graphs we remove semicolons, whitespace, bracing, etc. from the discussion entirely so that we can concentrate on the things that do make a difference.

Could you go into more detail about the AI methods you mention? That sounds interesting.
- josh

-- 
Josh Marinacci

--
You received this message because you are subscribed to the Google Groups "pdxfunc" group.
To view this discussion on the web visit https://groups.google.com/d/msg/pdxfunc/-/JaTZSRbFaN0J.

Patrick Logan

May 8, 2012, 10:06:26 AM
to pdx...@googlegroups.com

I am too busy to ask anyone else to give up time for my wishes, but I would love to see a low-fidelity storyboard of how you envision a programmer writing a program in this new system. I am trying not to get hung up on the discussion points about serializing things to disk... I believe you are trying to convey a different way for programmers to construct programs, and I want to see that part of the story. Disks be damned...

Josh Marinacci

May 8, 2012, 10:39:19 AM
to pdx...@googlegroups.com
I don't know if I'm to that point yet. This is all still percolating in my brain.  This is the ultimate question: if we were unconcerned with parsing issues (whitespace, semicolons, etc), what could we do?  That's the real discussion.  Once we have an idea of where we want to be then we can find a way to get there.


Here's a few brainstorms:

* Use more math-like notation for graphics programming. If an IDE can auto-complete a method then why can't it auto-complete a math symbol like theta or alpha? It would be nice to see some real math symbols in the middle of my normal-looking code, if it would help me understand the algorithm better. For example, Porter-Duff compositing can be expressed much more clearly and concisely when written using mathematical notation.

* Any graphical resources should be visible inline and I shouldn't worry about their storage and details of loading them.  Essentially they are part of the code and are managed at compile time.  This could also do cool things like pre-scale images for different screen resolutions.

* Unit tests shouldn't be in a separate package or file. They are just extra code which calls the method I'm working on with different arguments. Could a unit test be expressed as a table of test data floating next to the current method, rather than the typical infrastructure of a JUnit test? As I code it could tell me if any of those tests would fail by turning the appropriate data red.

* Regex. There has *got* to be a better way to visualize these. I'm not sure what it is yet, but this seems ripe for disruption.

* Web app security. The inside of a web service feels symbolic. Some data comes in, I do calculations, I return data. It's a fancy subroutine. Expressing it using traditional code makes sense. The outside of a web service does not feel symbolic. Maybe that's why JavaEE moved towards assembling them with non-code tools like XML. When I handle security I hate dealing with a half-symbolic, half-declarative XML structure that is extremely verbose. When I think about the security of my web app I imagine it as concentric circles. Requests come from outside. They reach a web service. If that web service is public then it is outside the circle. If it requires some authentication then it is inside the circle. The request must go through the circle boundary (some authentication mechanism) to reach the service within. If there are multiple levels of authentication (none, user, admin) then there are concentric circles with extra auth required for the inner ones. This feels like the natural way to visualize the problem, yet in practice I must mentally convert it into an XML blob. I would prefer to handle the web security visually. Anything outside the circle is public. Anything inside is protected.

* Once the previous item is done, we have a natural place to put performance monitoring: in the interface where we visualize how the services are organized. I could see lines for the incoming requests, with darker/redder colors as the frequency increases. Colorize each service as it is accessed. This gives me a heat map of my running application so I can easily see where the hotspots are. Is it my database or the caches that are getting slammed? Combined with a nice deployment system I could even manipulate my deployed app in real time, moving a process from one server to another for example. Once we decide to address a particular problem visually, lots of interesting things open up.
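
As a rough illustration of the test-table idea from the list above, here is how a table of inputs and expected results can sit right next to a function in today's plain-text tools. This is only a sketch in Python: `clamp`, `CLAMP_CASES`, and `failing_rows` are made-up names for illustration, not part of any existing tool, and an editor built around the idea would presumably paint the failing rows red instead of printing them.

```python
def clamp(value, low, high):
    """Limit value to the inclusive range [low, high]."""
    return max(low, min(value, high))


# The "table floating next to the method": (arguments, expected result).
CLAMP_CASES = [
    ((5, 0, 10), 5),
    ((-3, 0, 10), 0),
    ((42, 0, 10), 10),
    ((0, 0, 0), 0),
]


def failing_rows(func, cases):
    """Return the rows an editor would highlight: (args, expected, actual)."""
    failures = []
    for args, expected in cases:
        actual = func(*args)
        if actual != expected:
            failures.append((args, expected, actual))
    return failures


if __name__ == "__main__":
    # Prints nothing when every row of the table passes.
    for row in failing_rows(clamp, CLAMP_CASES):
        print("FAIL", row)
```

The table is ordinary data, so a tool could re-run `failing_rows` on every keystroke and only the presentation would need to change.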




- josh

-- 
Josh Marinacci

Dan Colish

May 8, 2012, 11:12:54 AM
to pdx...@googlegroups.com
Does syntactic visualization really help that much? If you compress data so it can be represented visually you are compromising the expressiveness of the data for the conciseness of the visual language. I have doubts that it is possible to develop a generalized language which does this well. Really it seems that the idea of sacred cows and throwing them away is a little bogus. I do not see any way that changing the representation of the problem will change the actual problem itself. This is no magic bullet. I think the improvements we will see in coding will not come from tooling but from better practices. For example, instead of asking for a better way to manage overly complicated web app security, look for ways to simplify that problem or break it into pieces that are understandable. Bart's right, you really don't spend a lot of time dealing with implementation issues, and those implementation issues which you do run into, even in the most difficult languages, are solvable by a clear path. I think we're talking about the wrong cows.

-- Dan



Jake Brownson

May 8, 2012, 11:55:12 AM
to pdx...@googlegroups.com
I'm sure Bart and probably most folks in this conversation have gotten
so good at using text editors and the languages and tools that go with
them that it's become a negligible part of their work day. I don't
think that is evidence that there isn't a problem. If we had enough
time to practice we could probably be pretty productive writing code
with an abacus.

3. Though I'm well practiced myself at editing text I very much enjoy
the process of working in a projectional editor like MPS. Whenever I
start typing in the boilerplate for a C++ class without it I just
shake my head and wonder why the heck I'm doing this, whereas editing a
class in MPS I have to do just the right amount of things to express
what I want to express. It certainly doesn't make me write twice as
much code in the same amount of time (though I do think there's a
speed up in raw entry time and I don't dispute the Amdahl's law
argument for someone who's been programming in text for a decade or
more), but it makes the process much more pleasurable. It also opens
up projections that are especially difficult to achieve in text that
can better represent the abstract ideas one is manipulating. See
Subtext 2 for a great example of this
http://subtextual.org/subtext2.html.

2. Code written in a projectional editor is going to be much easier
and more pleasurable to read. As Josh has rightly pointed out, the
presentation of the data (i.e. code) is separate from the data itself.
A practice we would all follow when designing for our "end users" but
don't follow for ourselves. We can all see the code in the way that
best suits our tastes and our objective at any given time. Sometimes I
want to see different projections of the same code. This is difficult
to achieve with text, but trivial with a proper projectional editor.
Ex: When I first look at a class I want to see the state and the
methods. In C++ I look at the header. Sometimes the header has a bunch
of things defined inline, sometimes it doesn't. Even if it doesn't the
programmer had to maintain the header separately from the .cpp.
Remember that a projection doesn't need to project all of the data,
nor does every projection need to be completely editable, or editable
at all. You could have a projection that shows and even runs unit
tests next to methods, and one that is specifically designed for
debugging that shows lines to references from the current line of
code. The possibilities are huge and underexplored.

1. Consider the benefits to someone that hasn't gotten used to all of
the design tradeoffs (and I mean this as a euphemism) we've made to
continue working with text. Someone presented with a blank text editor
has to do a lot of reading before she can know what to do. In MPS the
projection helps you understand what the degrees of freedom are, and
what choices you have to make. You don't need to develop all of the
little keyboard gymnastics we all have to wrap things in brackets,
keep the whitespace off the end of my lines (yes I'm a bit OCD), make
sure all the whitespace I want is where I want it, put semicolons at
the end of lines, spew out the boilerplate for a
class/method/whatever, keep your .h and your .cpp in sync, etc etc.
We've all mastered these things to a degree we barely even see them
anymore, but it's incredibly obvious to someone new (I bet most of you
were new to programming at one point). The less time a new programmer
has to spend mastering archaic tools and the more direction
experienced programmers can give through designing good projections
the better.

john melesky

May 8, 2012, 12:56:54 PM
to pdx...@googlegroups.com
On Tue, May 08, 2012 at 06:48:49AM -0700, Josh Marinacci wrote:
> It is certainly possible to keep adding graph-based features onto
> our existing system, but it's just hacks upon hacks. Eventually it
> will be harder and harder to add new things and the system will
> collapse upon itself. Imagine if we only stored graphics as JSON
> files. It would technically work and might be convenient for certain
> use cases, but it won't hold up long term.

Let's assume that we'll need to serialize somehow. That seems a safe
assumption, since all files that aren't dumps of the heap are somehow
serialized (including png files, json files, and native photoshop
files).

To serialize a graph, you need, minimally, a node representation, and
a representation of connections between nodes.

There are a few ways to serialize those:
- All nodes, followed by all connections. This has a benefit in
  verification -- you can read the node serialization, and then
  validate that each connection points between valid nodes, all in
  one pass. It's debatable whether single-pass is a worthwhile
  concern with today's computing power, but it's there.
- All connections, followed by all nodes. This has the benefit of
  being able to edit the connections without needing to read in the
  nodes.
- Each node, accompanied by its connections. This has the benefit of
  allowing per-node editing without needing to load in the rest of
  the graph.

There are other serializations possible (e.g. random ordering of nodes
and connections), but those three seem the most broadly useful.

They also seem to be equally expressive. I may be mistaken, and i've
certainly not made a formal case for it, but intuition suggests
they're equivalent. If anyone can make a case otherwise (even
facetiously), i'm eager to hear it.
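
The equivalence claim can at least be checked informally on a toy graph. The sketch below writes the same graph in the three orderings described above and reads each one back with a single loader. It is an illustration, not a proof: the record format (`["node", name, payload]` and `["edge", from, to]` in a JSON list) and all function names are assumptions made up for this example.

```python
import json

# A tiny graph: node name -> payload, plus directed connections (caller, callee).
NODES = {"main": "entry point", "parse": "read input", "emit": "write output"}
EDGES = {("main", "parse"), ("main", "emit"), ("parse", "emit")}


def nodes_then_edges(nodes, edges):
    """Ordering 1: all nodes, followed by all connections."""
    records = [["node", n, p] for n, p in sorted(nodes.items())]
    records += [["edge", a, b] for a, b in sorted(edges)]
    return json.dumps(records)


def edges_then_nodes(nodes, edges):
    """Ordering 2: all connections, followed by all nodes."""
    records = [["edge", a, b] for a, b in sorted(edges)]
    records += [["node", n, p] for n, p in sorted(nodes.items())]
    return json.dumps(records)


def node_with_outbound(nodes, edges):
    """Ordering 3: each node, immediately followed by its outbound connections."""
    records = []
    for n, p in sorted(nodes.items()):
        records.append(["node", n, p])
        records += [["edge", a, b] for a, b in sorted(edges) if a == n]
    return json.dumps(records)


def load(text):
    """Rebuild (nodes, edges) from any of the three orderings."""
    nodes, edges = {}, set()
    for kind, first, second in json.loads(text):
        if kind == "node":
            nodes[first] = second
        else:
            edges.add((first, second))
    # Validation waits until everything is read, because a connection may
    # appear before the nodes it joins.
    for a, b in edges:
        if a not in nodes or b not in nodes:
            raise ValueError(f"dangling connection {a} -> {b}")
    return nodes, edges
```

All three writers round-trip through `load` to the same `(NODES, EDGES)` pair, so on this example the orderings differ only in what can be read or checked early, not in what they can express.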

Let's clarify what the nodes and connections are. My understanding is
that a node represents some program logic, and a connection would be a
call out from that logic to a function (or method, or whatnot), or a
call into this logic from elsewhere.

Presumably we'd only want to serialize each connection once. If we're
using one of the first two serializations, there's no additional
thought needed. If we're using the third serialization (each node,
accompanied by connections), we need to make a decision. With each
node, do we store outbound connections (calls out), inbound
connections, or both (duplicating data)?

If we store outbound connections, then we've just described most
source serializations that are used nowadays: logic is stored in
functions, and outbound function calls are declared via import
statements of some variety.

There's some optimization that's been performed, of course. Groups of
functions that are likely to refer to each other can be bundled
together in single files such that their connections can be easily
inferred. Aside from that, unless i'm missing something, that's the
sort of graph structure you're looking for.
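
To make the "node plus outbound connections" reading of ordinary source concrete, here is a small sketch that recovers such a graph from plain Python text using the standard `ast` module. The sample program in `SOURCE` and the helper name `outbound_calls` are invented for illustration, and the sketch only sees direct calls by simple name; method calls and imports are deliberately ignored.

```python
import ast

SOURCE = '''
def parse(text):
    return text.split()

def emit(words):
    print(len(words))

def main(text):
    emit(parse(text))
'''


def outbound_calls(source):
    """Map each top-level function to the sorted names it calls directly."""
    tree = ast.parse(source)
    graph = {}
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            calls = {
                call.func.id
                for call in ast.walk(node)
                # Keep only calls written as a bare name, e.g. emit(...).
                if isinstance(call, ast.Call) and isinstance(call.func, ast.Name)
            }
            graph[node.name] = sorted(calls)
    return graph


if __name__ == "__main__":
    for name, callees in outbound_calls(SOURCE).items():
        print(name, "->", callees)
```

Each function body is a node, and the names it calls are its outbound connections, which is the third serialization with outbound edges stored alongside each node.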

-john



> On Monday, May 7, 2012 at 10:33 AM, Cosmin Stejerean wrote:
>
> > Given that the benefits that have been discussed tend to be things that can be accomplished by IDEs today (if only someone cared enough to do the work), and still serialize to the same grammar in plaintext files, I don't understand the need for some opaque and poorly supported serialization format.
> >
> >
> > Cosmin Stejerean
> >
> > On May 7, 2012, at 11:32, Josh Marinacci <joshma...@gmail.com (mailto:joshma...@gmail.com)> wrote:
> >
> > > I do think that 100 years from now we probably will still program largely using language and symbols, much as we do today. Language is hard wired into the human brain, so I don't think we will stray too far from it. However, I do think what we see on screen and what is stored on disk should be split. I also think that how we store our code on disk is an irrelevant implementation detail. At first glance these would appear to be contradictory statements, but I don't think so.
> > >
> > > A lot of design in new programming languages revolves around what is stored on disk. For example, the new Go programming language, which is arguably the nicest widely used general purpose systems language to come along in a while, has specific rules about whitespace and semicolons. To me, this is a symptom of the problem that we are conflating storage and machine interface with the human interface, and this causes us to solve the wrong problems. Let's explore these two examples for a second. I think they are a useful self-contained exercise.
> > >
> > > Semicolon restrictions: The semicolon issue is about making it possible for the compiler to parse your code with no ambiguity. If the computer is supposed to serve the human, then why should I adjust how I write my code to the limits of the compiler? Either the compiler needs a better parser, or else we need to move disambiguation from the place where the human isn't (inside the parser) to the place where the human is (the code editor). A good IDE will know what I meant. And if it doesn't then it can ask me right then. The disambiguation happens once, right where the human is, and then never becomes an issue again. This fails of course if you edit the code outside an IDE, much like you could easily break a PNG by editing it outside a graphic editor. This is why storing code as plain text is an issue. Restricting the way the language deals with semicolons is solving the wrong problem. It forces more rules upon the human just to make life easier for the machine.
> > >
> > > Whitespace: The Go language has specific rules for how you use whitespace in your program. This makes for cleaner code since you can remove boilerplate syntax like curly braces. It also means code written by one person can easily be read by another. But again, we are solving the wrong problems. The problems are: displaying code in a clear and concise manner on screen, and letting programmers communicate unambiguously across time and space. Modifying the language syntax feels like the wrong solution. Instead we should be relying on the computer to do this. An IDE could display the code in the manner most pleasing to the human who is reading it, even if the person who wrote it uses different settings. Removing the curly braces is again making the human help out the parser. If we don't store code as plain text then it's a non issue. I can type in braces if it helps me to think. Another programmer can use indentation to indicate where a block begins and ends. The compiler never has to care because it's not plain text by the time it gets to the compiler. The other programmer doesn't have to care what whitespace form I use because he never sees it. He sees what is appropriate for him.
> > >
> > > Of course, lots of new complexity is introduced by going to non-plain text. All of our tools are built around it. But I think these are solvable if the benefits are worth it. Plus, once we jump in to the deep end of the non-ascii pool lots of other things become possible, or even trivial: more use of math symbols, visual editors for particular data types, and my personal favorite: multiline string literals.
> > >
> > > - Josh
> > >
> > >
> > > --
> > > Josh Marinacci
> > > joshondesign.com (http://joshondesign.com)

Patrick Logan

May 8, 2012, 1:20:37 PM
to pdx...@googlegroups.com
On Tue, May 8, 2012 at 9:56 AM, john melesky <li...@phaedrusdeinus.org> wrote:
> Aside from that, unless i'm missing something, that's the
> sort of graph structure you're looking for.

And again, the serialization discussion seems to me a distraction from
the interesting problem: how to improve a programmer's time spent
programming. I'm assuming whatever tools support this new way of
expressing programs have sufficient access paths into the graph. As a
programmer I am having a very hard time generating concern for how the
graph may or may not be serialized. I'd suggest setting that aside and
determining if there's something more substantial to be considered. Or
maybe I am just missing the point... I really have not figured that
out yet.

-Patrick

Matt Youell

May 8, 2012, 1:37:06 PM
to pdx...@googlegroups.com
I was watching a cowboy movie the other day. A young couple got married and the bride promptly jumped in front of a bullet and died. She was buried next to her sister. At which point my inner graph theorist (we all have one) screamed "Where the hell will her husband be buried?"

And of course her husband is young and handsome and in the 1800s, so he'll get married again, have kids, some of whom will die from some awful disease. Where will all of these people be buried?

So it seems that graphs and serialization issues are not unique to programming. :)

I've been following this conversation and promising myself that I'd catch up before making comment, but that hasn't happened yet, and when has knowing what I'm talking about ever stopped me anyway?


A hodge-podge of assumptions I've noticed in this thread:

* There will be an AST / Hierarchy is necessary

The world is in constant motion. Does our program representation have to stay fixed? Even with macros you are still in handcuffs.

* Serialization is always flat

We're bound by a file metaphor from 60+ years ago. Maybe the problems we face are bigger than just a programming language or environment alone.

* Programming languages take center stage in programming problems

How many people are *awesome* at jQuery but are terrible JavaScript programmers? (Trick question: the answer is "all of them".) Seriously though, there are levels of abstraction forming up above us. How much does a single programming language matter? Is CSS a DSL? Do web designers comprise the largest population of declarative programmers in history?


Btw, someone mentioned regexes: Regexes fit three dimensions (array over time) into two, so it's hard to beat that kind of information density. I'd still love to see people try.


Oh, and Markus referenced Chomsky, language, and graphs at some point. Googling got me lots of references to Chomsky-like things, but not any clear thing by Chomsky to peruse. A pointer to a book or paper would be greatly appreciated. Like Jake, I am under-Chomsky'd.


-/matt/-

Jake Brownson

May 8, 2012, 1:44:00 PM
to pdx...@googlegroups.com
I think Patrick is right. The exact serialization is the less
interesting discussion, unless it turns out to be impossible. I am
fairly confident that's not the case however. I think even a naive
serialization like the one you describe would be performant. It isn't
too hard to add some indexing to make random node accesses much more
efficient and not require loading the whole file.
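
A minimal sketch of what "some indexing" could look like: each node is written as one line, and a side table remembers the byte offset and length of every line so a single node can be fetched with one seek. The one-JSON-line-per-node format and the names `write_graph` and `read_node` are assumptions for illustration only; a real store would also need to persist the index and cope with edits that change record lengths.

```python
import io
import json


def write_graph(stream, nodes):
    """Write one JSON line per node and return {name: (offset, length)}."""
    index = {}
    for name, record in nodes.items():
        line = (json.dumps({"name": name, **record}) + "\n").encode("utf-8")
        index[name] = (stream.tell(), len(line))
        stream.write(line)
    return index


def read_node(stream, index, name):
    """Fetch a single node by seeking to its offset; nothing else is parsed."""
    offset, length = index[name]
    stream.seek(offset)
    return json.loads(stream.read(length).decode("utf-8"))


if __name__ == "__main__":
    nodes = {
        "main": {"body": "entry point", "calls": ["parse", "emit"]},
        "parse": {"body": "read input", "calls": ["emit"]},
        "emit": {"body": "write output", "calls": []},
    }
    stream = io.BytesIO()  # stands in for a file opened in binary mode
    index = write_graph(stream, nodes)
    print(read_node(stream, index, "parse"))
```

The same two functions work unchanged on a real file opened with `open(path, "w+b")`, since they rely only on `tell`, `seek`, `read`, and `write`.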

On Tue, May 8, 2012 at 9:56 AM, john melesky <li...@phaedrusdeinus.org> wrote:
--
Jake Brownson
Cofounder
Brainium Studios
Cell: 503.349.4841

Jake Brownson

May 8, 2012, 2:03:46 PM
to pdx...@googlegroups.com
In addition to the discussion here, I've been chatting via email with
a group of folks who are interested in this topic. Though we all have
a different take on the idea, one thing we agree on is that we should
have a place to continue this discussion, which seems to have taken
off recently. So I created a Google group focused on this topic, with
a temporary name suggested by one of the members:

https://groups.google.com/forum/?fromgroups#!forum/augmented-programming

I think most of this thread is a bit off the core topic of pdxfunc,
and there are probably lots of folks on here that aren't really
interested, so for those of you that are interested I think it'd be
good to continue the discussion over in the new group. Hopefully it
can serve as a place for folks to continue the discussion, share work
they're doing and work they're discovering with others interested.

markus

May 8, 2012, 5:24:04 PM
to pdx...@googlegroups.com
M --

> I was watching a cowboy movie the other day.

Where in the heck do you find cowboy movies these days?

> I've been following this conversation and promising myself that I'd
> catch up before making comment, but that hasn't happened yet, and when
> has knowing what I'm talking about ever stopped me anyway?

Did you drop a "not" or are you playing with us?

>
> * Serialization is always flat

80% chance you're trolling here, but I'm a sucker.

Not only is serialization always flat, you could go even further and say
that serialization is always linear. By definition. Because that's
what serialization means.

And no pulling out the time cube, 'cause I called it and jinxed it
first.
>
> Btw, someone mentioned regexes: Regexes fit three dimensions (array
> over time) into two, so it's hard to beat that kind of information
> density. I'd still love to see people try.

Agreed. My standard response to people who bemoan the opacity of
standard regular expressions is to give them a handful of disparate
examples and ask them for a notation that works for all of them and is
clearer on average.
>
> Oh, and Markus referenced Chomsky, language, and graphs at some point.
> Googling got me lots of references to Chomsky-like things, but not any
> clear thing by Chomsky to peruse. A pointer to a book or paper would
> be greatly appreciated. Like Jake, I am under-Chomsky'd.

I think it's a case of {clear-things} ^ {things-by-Chomsky} = {}. He's
clearly brilliant, but I would never ask him for directions to the
mall.

The basic model I was referring to (deep structure = a graph, surface
structure = a tree, production/serialization an in-order traversal of
the tree, parsing/grammar a set of rules that allow reconstruction of
the tree from the serialization) mostly originated with Syntactic
Structures though it has gone all over the place since.

Wikipedia has some good overview (e.g.
http://en.wikipedia.org/wiki/Transformational_grammar ) and there are
some books (Mark Baker's "The Atoms of Language" comes to mind) that are
also worth reading.

But the tl;dr is simple: we process data in graphs/networks/relational
dbs/etc., but for external transmission/storage we need a way to
serialize the graphs. Language accomplishes this by mapping the graphs
to trees and the trees to sequences in a (if all goes well) reversible
way.
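
A minimal Python sketch of the tree-to-sequence half of that model, under simplifying assumptions: the tree is a binary expression tree written as nested tuples, the serialization is a fully parenthesized in-order traversal, and the "grammar" is a tiny recursive-descent parser. It does not cover the graph-to-tree step, and all names are made up for illustration.

```python
def serialize(tree):
    """In-order traversal: left subtree, operator, right subtree."""
    if not isinstance(tree, tuple):
        return [str(tree)]
    op, left, right = tree
    return ["("] + serialize(left) + [op] + serialize(right) + [")"]


def parse(tokens):
    """Rebuild the tree from the flat token sequence."""
    tree, rest = _parse_expr(list(tokens))
    if rest:
        raise ValueError("unexpected trailing tokens: %r" % rest)
    return tree


def _parse_expr(tokens):
    head, rest = tokens[0], tokens[1:]
    if head != "(":
        return int(head), rest          # a leaf (integer literal)
    left, rest = _parse_expr(rest)
    op, rest = rest[0], rest[1:]
    right, rest = _parse_expr(rest)
    if not rest or rest[0] != ")":
        raise ValueError("missing closing parenthesis")
    return (op, left, right), rest[1:]


tree = ("+", 4, ("*", 5, 6))
tokens = serialize(tree)       # a flat, linear sequence
round_trip = parse(tokens)     # the same tree, reconstructed
```

The parentheses are what make the mapping reversible here; natural languages (and most programming languages) get by with fewer markers, which is where the "if all goes well" caveat comes in.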

-- M





Phil Tomson

May 8, 2012, 5:53:34 PM
to pdx...@googlegroups.com
On Tue, May 8, 2012 at 10:37 AM, Matt Youell <ma...@newmoniclabs.com> wrote:
> I was watching a cowboy movie the other day. A young couple got married and
> the bride promptly jumped in front of a bullet and died. She was buried next
> to her sister. At which point my inner graph theorist (we all have one)
> screamed "Where the hell will her husband be buried?"
>
> And of course her husband is young and handsome and in the 1800s, so he'll
> get married again, have kids, some of whom will die from some awful disease.
> Where will all of these people be buried?
>
> So it seems that graphs and serialization issues are not unique to
> programming. :)

Cremation & spreading the ashes to the four winds is the answer here:
atoms of the body spread out all over the place where they get
incorporated into new stuff. Come to think of it, maybe that's not a
bad serialization strategy. :) Isn't this where Wheeler comes in?

Phil

Matt Youell

May 8, 2012, 6:01:15 PM
to pdx...@googlegroups.com
On Tue, May 8, 2012 at 2:53 PM, Phil Tomson <philt...@gmail.com> wrote:

> Isn't this where Wheeler comes in?

 
Didn't want to hijack the thread with my crazy.

Matt Youell

May 8, 2012, 6:16:45 PM
to pdx...@googlegroups.com
On Tue, May 8, 2012 at 2:24 PM, markus <mar...@reality.com> wrote:
> Where in the heck do you find cowboy movies these days?


Ah, I didn't cite my sources[1]. Apologies.


 
> > I've been following this conversation and promising myself that I'd
> > catch up before making comment, but that hasn't happened yet, and when
> > has knowing what I'm talking about ever stopped me anyway?
>
> Did you drop a "not" or are you playing with us?


This is one of those times in English where the meaning is clear either way with sufficient context. Optionally, interpret as you'd like. :)
 
> >
> > * Serialization is always flat
>
> 80% chance you're trolling here, but I'm a sucker.


No trolling or copyright intended.
 
> Not only is serialization always flat, you could go even further and say
> that serialization is always linear.  By definition.  Because that's
> what serialization means.


You see? Trapped by the bonds of language! Why do we call it serialization? Because it has to be serial. This reminds me of trying to explain to friends that time doesn't exist. Which is a nearly impossible concept to communicate in any time-dependent language such as English.

But I will decloak long enough to admit that I was conflating serialization and persistence. "Serialization" is still a biased word though and shouldn't be used in polite company.

 
> And no pulling out the time cube, 'cause I called it and jinxed it
> first.

Nice paradox! :)

 


> Wikipedia has some good overview (e.g.
> http://en.wikipedia.org/wiki/Transformational_grammar ) and there are
> some books (Mark Baker's "The Atoms of Language" comes to mind) that are
> also worth reading.
>
> But the tl;dr is simple: we process data in graphs/networks/relational
> dbs/etc., but for external transmission/storage we need a way to
> serialize the graphs.  Language accomplishes this by mapping the graphs
> to trees and the trees to sequences in a (if all goes well) reversible
> way.

It's the "if all goes well" part that seems to fall apart. 

Thanks for the pointers.

-/matt/-

Josh Marinacci

May 8, 2012, 10:56:53 PM
to pdx...@googlegroups.com
I don't think that we will find a single generalized visual language that could be used for all use cases.  But then, we don't have a single symbolic language that is used for all use cases either :)  There will always be a need for a variety of tools.

I do think certain problems lend themselves to a visual representation.  Web security is a trivial example of what I would call programming by containment.  Certain problems lend themselves to being organized by what things are inside of other things.  I think of these visually.

- Josh

-- 
Josh Marinacci


Josh Marinacci

May 8, 2012, 11:02:14 PM
to pdx...@googlegroups.com
I do find the serialization aspect interesting since it matters for interoperability with other systems. However, once we make the leap to "the graph will be serialized to something standard" it becomes separable from the language itself, and might not be the right topic for this forum.

-- 
Josh Marinacci

Josh Marinacci

May 8, 2012, 11:06:48 PM
to pdx...@googlegroups.com
> A hodge-podge of assumptions I've noticed in this thread:
>
> * There will be an AST / Hierarchy is necessary
>
> The world is in constant motion. Does our program representation have to stay fixed? Even with macros you are still in handcuffs.

I would say no, which is really the beauty of having some standard serialization that is easy to manipulate.  We should be able to up-convert it in 20 years when our awesome holographic language is finally released.
 
> * Serialization is always flat
>
> We're bound by a file metaphor from 60+ years ago. Maybe the problems we face are bigger than just a programming language or environment alone.

Some have suggested storing it in a database.  In fact, I don't think I've linked to it here yet, but I found this page with a ton of awesome ideas called Source Code In Database.


It starts with this gem:

We have been teaching our customers to regard their data as a precious resource that should be milked and reused by finding many possible ways of summarising, viewing and updating it. However, we programmers have not yet learned to treat our source code as a similar structured data resource
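
As a toy illustration of treating source as queryable data (a sketch only, not the design from that page): the Python below parses a few hypothetical functions with the standard `ast` module, stores them in an in-memory SQLite database, and answers "who calls `area`?" with a query rather than a text search. The table layout and sample source are assumptions made up for this example.

```python
import ast
import sqlite3

SOURCE = '''
def area(w, h):
    return w * h

def perimeter(w, h):
    return 2 * (w + h)

def describe(w, h):
    return "area %d, perimeter %d" % (area(w, h), perimeter(w, h))
'''

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE functions (name TEXT PRIMARY KEY, arg_count INTEGER)")
db.execute("CREATE TABLE calls (caller TEXT, callee TEXT)")

for node in ast.parse(SOURCE).body:
    if isinstance(node, ast.FunctionDef):
        db.execute("INSERT INTO functions VALUES (?, ?)",
                   (node.name, len(node.args.args)))
        # Record every direct call to a plain name inside this function.
        for inner in ast.walk(node):
            if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                db.execute("INSERT INTO calls VALUES (?, ?)",
                           (node.name, inner.func.id))

callers_of_area = [row[0] for row in db.execute(
    "SELECT DISTINCT caller FROM calls WHERE callee = ? ORDER BY caller",
    ("area",))]
function_count = db.execute("SELECT COUNT(*) FROM functions").fetchone()[0]
```

Here the text file is still the source of truth and the database is derived from it; the more radical version of the idea would flip that around.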

 
> Btw, someone mentioned regexes: Regexes fit three dimensions (array over time) into two, so it's hard to beat that kind of information density. I'd still love to see people try.

They are incredibly dense, but they also are hard to write and even harder to read.  I keep feeling like there must be a way to improve them without ruining the density benefits, but I haven't found one yet.
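
One partial step that exists today is a "verbose" regex mode, such as Python's `re.VERBOSE` flag, which ignores whitespace in the pattern and allows comments. The sketch below writes the same (made-up) date pattern both ways. It improves readability by giving up density, so it sidesteps the challenge rather than answering it.

```python
import re

# Dense form: everything on one line.
DENSE = re.compile(r"^(\d{4})-(\d{2})-(\d{2})$")

# Same pattern with re.VERBOSE: whitespace is ignored, comments are allowed.
SPACED = re.compile(r"""
    ^
    (\d{4})   # year
    -
    (\d{2})   # month
    -
    (\d{2})   # day
    $
""", re.VERBOSE)

sample = "2012-05-08"
dense_groups = DENSE.match(sample).groups()
spaced_groups = SPACED.match(sample).groups()
```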

- Josh
 



-/matt/-

Lyle Kopnicky

May 9, 2012, 8:21:31 PM
to pdx...@googlegroups.com
On Tue, May 1, 2012 at 5:25 PM, Josh Marinacci <joshma...@gmail.com> wrote:
> So..., from the postulate that surely we won't store our code as plain text 100 years from now, let's go exploring.

I don't think we'll still be coding 100 years from now. In perhaps 20-30 years we'll hit the Singularity, and the robots will do all the coding. We just have to code well enough to design them. :)

- Lyle 

Josh Marinacci

May 9, 2012, 11:19:22 PM
to pdx...@googlegroups.com
True, but we will have to explain to the robots in great detail what we want them to code. This explanation will involve logic and symbols, hopefully not stored as an ASCII text file. :)

-- 
Josh Marinacci

--

Nathan Collins

May 9, 2012, 11:31:15 PM
to pdx...@googlegroups.com

Lyle Kopnicky

May 10, 2012, 12:31:26 AM
to pdx...@googlegroups.com
On Wed, May 9, 2012 at 8:19 PM, Josh Marinacci <joshma...@gmail.com> wrote:
> True, but we will have to explain to the robots in great detail what we want them to code. This explanation will involve logic and symbols, hopefully not stored as an ASCII text file. :)

No, they will just intuit what we need and code it for us. Or else they will kill us off.

- Lyle 

Lyle Kopnicky

May 10, 2012, 2:07:37 AM
to pdx...@googlegroups.com
One of the stated advantages of representing code using graph structures is that it is neutral with respect to display. I think this is true up to a point.

We keep talking about graphs that must be serialized, and distinguish them from textual source code, which, I submit, is also just a serialized graph. So what is the distinction we're trying to get at?

It's a difference in abstraction - the graphs that some people are pushing for are slightly more abstract than the source code we have now. Specifically, they are abstracted by creating equivalence classes - distinctions of line structure and spacing are ignored.

There are, however, equivalence classes over which our graph structures won't easily admit abstraction. Here's an example: Suppose you want to represent music. The music model Euterpea, presented by Paul Hudak in The Haskell School of Music, uses :+: to indicate serial composition, and :=: to indicate parallel composition. Thus, the two following formulas are equivalent:

mel1 = (note (1/4) (Ef, 4) :=: note (1/4) (C, 4)) :+:
       (note (1/4) (F, 4) :=: note (1/4) (D, 4)) :+:
       (note (1/4) (G, 4) :=: note (1/4) (E, 4))

mel2 = (note (1/4) (Ef, 4) :+: note (1/4) (F, 4) :+: note (1/4) (G, 4)) :=:
       (note (1/4) (C, 4) :+: note (1/4) (D, 4) :+: note (1/4) (E, 4))

That is, they sound the same when played. Likewise, 4 * (5 + 6) represents the same value as 4*5 + 4*6. Yet it is hard to imagine a "neutral graph structure" that wouldn't be biased toward one or the other.
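
A small Python sketch of the point about the two melodies. This is not Euterpea: `Seq` and `Par` are hypothetical stand-ins for :+: and :=:, and "sounds the same" is approximated by flattening each structure into a set of (start time, duration, pitch) events. The two trees differ structurally yet flatten to the same events.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Note:
    duration: Fraction
    pitch: str


@dataclass(frozen=True)
class Seq:  # plays `first`, then `second` (stand-in for :+:)
    first: object
    second: object


@dataclass(frozen=True)
class Par:  # plays `left` and `right` together (stand-in for :=:)
    left: object
    right: object


def duration(music):
    if isinstance(music, Note):
        return music.duration
    if isinstance(music, Seq):
        return duration(music.first) + duration(music.second)
    return max(duration(music.left), duration(music.right))


def events(music, start=Fraction(0)):
    """Flatten to a set of (start time, duration, pitch) triples."""
    if isinstance(music, Note):
        return {(start, music.duration, music.pitch)}
    if isinstance(music, Seq):
        return events(music.first, start) | events(
            music.second, start + duration(music.first))
    return events(music.left, start) | events(music.right, start)


def q(pitch):
    """A quarter note at the given pitch."""
    return Note(Fraction(1, 4), pitch)


mel1 = Seq(Par(q("Ef4"), q("C4")),
           Seq(Par(q("F4"), q("D4")), Par(q("G4"), q("E4"))))
mel2 = Par(Seq(q("Ef4"), Seq(q("F4"), q("G4"))),
           Seq(q("C4"), Seq(q("D4"), q("E4"))))

same_tree = mel1 == mel2                    # structural comparison
same_sound = events(mel1) == events(mel2)   # comparison of what gets played
```

The event set is itself a kind of "neutral" form, but it throws away the grouping the author chose, which is arguably the information an editor would want to keep.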

The "serialized graph" representation, if I understand it correctly, can be implemented simply by pretty-printing the AST to a file. It will look like a legal source program, just with canonical spacing. When it has been loaded into the editor, you can choose to show it with different spacing.
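
For what it's worth, Python's standard `ast` module can demonstrate this round trip (assuming Python 3.9 or later, where `ast.unparse` exists). In the sketch below, two differently spaced versions of the same made-up function parse to the same tree and pretty-print to the same canonical text.

```python
import ast

# Two spellings of the same program, differing only in spacing.
messy = "def  f( x,y ):\n        return x+ y   *2\n"
tidy = "def f(x, y):\n    return x + y * 2\n"

# Parse to an AST, then pretty-print with canonical spacing.
canonical_messy = ast.unparse(ast.parse(messy))
canonical_tidy = ast.unparse(ast.parse(tidy))

# The canonical text is still legal source for the same tree.
same_text = canonical_messy == canonical_tidy
same_tree = ast.dump(ast.parse(canonical_messy)) == ast.dump(ast.parse(messy))
```

One caveat: this particular AST drops comments, so the round trip only preserves what the tree chooses to represent.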

- Lyle


Josh Marinacci

May 11, 2012, 1:13:55 PM
to pdx...@googlegroups.com
In theory, yes. Pretty-printed source code can be the same as a serialized AST.  They start to diverge, however, if you allow editing that pretty-printed source directly instead of editing the AST through an appropriate editor. The source also probably can't store non-textual information like images and fonts. I suspect this will be a matter of degrees.  By going whole hog to the persisted graph metaphor we can stop caring about the entire parsing / whitespace / encoding side of things and focus on what really matters: the developer experience.

- josh

-- 
Josh Marinacci

Patrick Logan

May 11, 2012, 1:29:59 PM
to pdx...@googlegroups.com
cf. Smalltalk.