Google Colaboratory: Literate programming for data analysis


Pascal Bergmann

Oct 25, 2017, 9:43:27 AM
to Eve talk
https://colab.research.google.com/

Text + code + output. Reminds me of the current Eve editor and Jupyter notebooks. It seems literate programming is becoming more and more interesting in fields like data analysis.

So, maybe don't drop the current version of the editor completely, as it might become a more widely accepted programming approach and therefore an easy entry point to Eve.

magicmo...@gmail.com

Oct 26, 2017, 5:39:13 AM
to Eve talk
Sorry, but Colab is basically Mathematica's notebooks opened up to more languages. If Mathematica were so great, it would have conquered the world by now, but even after my estimate of 5,000 man-years of effort it is still a pretty ugly system. The great advantage of their language is that the internals are completely symbolic. However, it has hardly any of the next-gen language features that people are begging for. The Google research project is a massive waste of time in my opinion. Having your comments appear in varied font sizes and on a different background color could have been generated automatically by a pretty-printer from the original code, and building an entire system around the idea that people write tiny chunks of code and then execute them covers an extremely narrow sliver of the total space of what people are programming. Almost every task I can think of, down to a tic-tac-toe program, has more lines of code than can fit on a screen, so if their examples only show 10 lines of code and then a result, the results just aren't that significant if you think about it. I can tell you that nobody I know would toss out their copy of Excel to use the Google product.

I find the whole literate code fad, which was started by the perverse Donald Knuth, has not paid off in a significant way. I am not saying it isn't nice; it is. But I will never forgive that great genius Knuth for writing his algorithms books using an assembly language that doesn't exist. Out of some idiotic notion that he shouldn't endorse any one company's assembly language he invented his own nutty assembly language that doesn't match any real hardware, rendering all of his sample code basically useless and his book a paperweight. When people need an algorithm they search Stack Overflow and don't even think about his incredibly well-thought-out and debugged algorithms, because who wants to translate MIX into the language you need?

The world needs a simple but powerful language, a notation that allows us to end this Tower of Babel we are trapped in and enter the age of interchangeable parts, where we make better progress. The past 200 years of mathematics were basically devoted to algebraizing everything mathematicians could get their hands on, yet people still long for some geometric representation that is more understandable. No doubt interactive computer graphics can make algebra more fun and understandable, but since all of geometry can be represented in algebra, and not the other way around, people need to stop and look at where we are in math today. Nobody can get ahead of the math we have now, and we certainly don't want to go back to the 1600s...

ruben.n...@gmail.com

Oct 26, 2017, 8:29:41 AM
to Eve talk
Magic,

I find your statements pretty unnecessarily insulting.

I generally agree with your last paragraph. (Less on math getting too algebra-heavy, more on the need for a more geometric representation.) I would like to point you to Dynamicland: https://twitter.com/Dynamicland1, which is work being done by Bret Victor's HARC group on making a physical space part of the computation.

But I cannot agree with your general sentiment on notebooks, Mathematica, Jupyter, etc. These were not created to write whole programs in or to hold all of your data. They are made to quickly play with ideas, see if something is promising, and then share it. They fit a different, more macro-level use case. Tell me this: do you enjoy REPLs when you program? A notebook is just a REPL that is focused on data. Yes, Google is copying someone else, but don't we all copy? Software folks are notorious thieves. We shouldn't put people down for their work; maybe Google will do one little piece of the notebook better and thereby push all notebooks forward.

As for your thoughts on literate programming, I do not believe it has been practiced enough to discredit it yet, because literate programming in its true form requires that code order be less important, something most of our programming environments do not support.

Finally, Knuth himself and his book. His choice of MIX was fine for the time. This was before C or ML; yes, ALGOL and FORTRAN existed, but most programming was still done in proprietary assembly languages. His use of MIX made sense then, and considering the tome that it is now, it makes sense that he won't switch it. Others have translated his pseudocode into more modern languages, so his choice of MIX hardly makes his work moot. Yes, there are more approachable algorithms books now, but his book was still pivotal to the field of Computer Science. (Can you imagine what might have happened if Knuth had endorsed an assembly language? It might have dried up language experimentation as programmers standardized on one because it was now the "industry and textbook standard".)

There is no reason to put people down. Don't like what they are doing? Then don't help them. You seem to know what you want in a programming language; find similar people who want that too and go collaborate. Nothing gets better through mere criticism; someone still has to do the work.

magicmo...@gmail.com

Oct 26, 2017, 12:56:01 PM
to Eve talk
Knuth came up with the idea of literate programming as a sequel to structured programming. Unfortunately his approach was completely wrong. I am not the only one who has pointed this out (http://akkartik.name/post/literate-programming). I am a nobody compared to Knuth, so stating that the emperor has no clothes may seem offensive. I don't mean any personal disrespect to Knuth, but his TeX product has left me cold from day 1. Having your code comments presented in nicer typography is all well and good, but honestly, do you consider Knuth's TeX system any good? It is a disaster in my book: a bizarre, complex curiosity that he squandered a big chunk of his career on. Knuth was so ridiculous that he refused to use TrueType for encoding fonts; he invented his own font format, Metafont. He was so famous and influential that nobody pushed back on him, but who else on earth would refuse to use one of the commercial type formats? It's like someone building a motorcycle and deciding that neither English nor metric units are good enough, and that you need an incompatible set of nuts and bolts. The history of computers is full of non-agreement on basic standards: ASCII vs. EBCDIC, Mac vs. PC, iOS vs. Android, etc. But why, when 99.9% of the world is in one of two camps, would you invent your own third format with no substantial advantages?

The problem with Wolfram Mathematica's notebook approach is that you are limited to that single way of working with Mathematica; there is only one modality you can use. That makes it very hard to enter the world of software interchangeable parts. The invention of interchangeable parts goes back to the 1800s, and in America the great inventor Eli Whitney was one of the pioneers. It was later used to devastating effect in the American Civil War, the first time armaments had been mass-produced with that technology. I want to see software enter this era, and Mathematica's notebook approach did not facilitate it, so I consider it an impediment. You are entirely correct that interchangeable parts means small chunks of code that snap together like Lego.

Knuth's choice of MIX was never fine at any time. At every moment from 1965 onward, a single company has owned the lion's share of the CPU market: first IBM, then DEC, and for at least 36 years since the IBM PC, the Intel instruction set has held 99% of the desktop market. Nowadays the ARM instruction set has 99% of the mobile market, but server and desktop are 99% Intel architecture. If Knuth had, for example, picked the Intel instruction set, not only would most of the code still run fine, because the Intel architecture has been phenomenally backwards compatible, but there are many commercial cross-assembler tools that efficiently convert the Intel instruction set to other chips like the Motorola 68000, MIPS, ARM, etc. By using MIX he doomed his unbelievably meticulous work to be basically unused today. What company can make a living selling and maintaining a MIX cross-assembler, when only one human in history ever used MIX? I argue that Knuth was being perversely manufacturer-neutral. I can't tell you how many programmers I have seen with his books on their shelf who never actually used them.

Josh Cole

Oct 26, 2017, 1:46:57 PM
to Eve talk
@magicmouse:

We want to build a community where people can openly discuss their ideas here. Phrasing such as "Out of some idiotic notion that he shouldn't endorse any one company's assembly language he invented his own nutty assembly language that doesn't match any real hardware" obscures the intended message rather than enhancing it. You raise some interesting points, but you'll have a hard time finding quality debate about them if your message comes across as hostile. Please take your tone into account so that we can continue to have meaningful discussions.

That said, let's discuss those interesting points:

> ...could have been automatically generated by a pretty-printer from original code

I'd actually argue that this is the tangle/weave problem from the original literate programming spec all over again. By requiring external tooling for basic rich-text operations, you virtually guarantee that they just won't be done. Humans rarely accept even small amounts of additional friction for long-term gain when there are no short-term benefits. If you can reduce that small amount of friction to zero, though, you stand a chance at changing behavior. That said, I don't know that this is significantly better than markdown comments for programmers, but I could see it being very useful to less digitally-minded notebook users such as scientists.

> I will never forgive that great genius Knuth for writing his algorithms books using an assembly language that doesn't exist. 

Knuth was in a very different world then. The programmer by necessity had to do a lot more of the computer's work. Even today, the programmer is still tasked with wholly understanding the system. From that perspective, the implementation of an algorithm was a lot less interesting than its description. By refusing to pick a side, he traded drop-in usability for universality: he could guarantee that no external feelings about Intel or AT&T assembly were coloring perceptions of his work. Was the tradeoff worth it? *shrug*. It seems unfair, though, to treat it as a decision with no benefits.

> Almost every task i can think of, down to the tic tac toe program, has more lines of code than can fit on a screen ... enter the age of interchangeable parts, where we make better progress

I think these are two sides of the same coin. If you genuinely believe that a program can't be distilled into small components, it's pretty hard to imagine the world of interchangeable parts. By necessity, those parts have to be small to be interchangeable; the larger the part, the more complicated and specific it becomes. Far more M8 machine screws are used than grandfather-clock springs, because they assume far less about your goals.

magicmo...@gmail.com

Oct 26, 2017, 3:51:36 PM
to Eve talk
I do not expect that programs will devolve into very small pieces of a few lines. There is a principle in systems theory called Ashby's Law of Requisite Variety, which dictates that there is a minimum complexity for each particular task and that it is impossible to reduce a task beyond a certain point. My own intuition about interchangeable parts says that medium-sized pieces will be more commonly used. The key factor blocking interchangeable parts in computers is the irregularity of data structures: each system uses a different encoding mechanism, and JSON, which has superseded the awful XML, doesn't cut it as the universal interchange format. JSON does not work well for multimedia assets and is not a database. Eve is among the very few next-gen languages that correctly recognized that without a database you aren't really a next-gen development language/system. There are an awful lot of language projects recently done or underway that don't have a database system inside them, and I consider them doomed; obviously their proponents are optimistically assuming that data interchange is not a major factor. Since a computer's entire purpose is to input, massage, and output data, the ability to store and structure data flexibly is rather crucial, and yet project after project gets fired up without any mention of a database. (Red team, are you listening?)
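The multimedia point can be made concrete: JSON has no binary type, so embedding an asset means base64-encoding it, which inflates the payload by roughly a third before any parsing cost. A minimal sketch (the 1 MiB blob below is a stand-in for a real image or audio file):

```python
import base64
import json

# Stand-in for a real binary asset (e.g. a 1 MiB image payload).
blob = bytes(range(256)) * 4096  # 1,048,576 bytes

# JSON has no binary type, so the asset must be base64-encoded first.
doc = json.dumps({"asset": base64.b64encode(blob).decode("ascii")})

# Base64 emits 4 output characters for every 3 input bytes,
# so the JSON document is ~33% larger than the raw asset.
print(len(blob))  # 1048576
print(len(doc))   # roughly 1.4 million characters
```

The overhead compounds when such documents are re-parsed and re-serialized at every hop, which is part of why binary assets are usually stored outside the interchange format and referenced by URL or ID.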

I think that Eve's approach using markdown gets the typographical benefits, which are the big payoff, with minimal additional typing cost, and avoids the awful tangle/weave approach that was dead on arrival for most of the programming community. At the time Knuth did his original work there were no IDE environments outside of the LISP community, which worked in isolation. LISP was interpreted in an era when computers were slow and compiled code was king. Times have changed; we now have surplus CPU power that we can burn recomputing massive amounts between keystrokes. So now we are in the instant-feedback era. Knuth's approach was a dead end, and by acknowledging its failure we can try something else.

Not to beat a dead horse, but if Knuth had picked Intel assembler or AT&T assembler for his book, there are simple tools that will convert between the two, because the difference is really minor: Intel is mov dest, src, and AT&T is mov src, dest. BFD. Since the Intel machine had fewer registers than any other computer, the first and most popular CPU was actually a great least common denominator for a reference book. The Motorola 68000 had 16 general-purpose registers (8 data, 8 address), and the Intel only had 4 (at best), so his sample code using only 4 registers could have been mapped to a better CPU trivially. I have tried to use his work in the past, and wouldn't have so much heat on this issue if I hadn't wrestled with MIX. In my career I have had to learn OS/360 assembler, Motorola 68000, and Intel assembler to a rock-solid level, and when I hit MIX and found that it didn't map well to any of the three machine languages I had learned, it was upsetting, to say the least. You pay $50 for a hardbound book hoping for value in return, and there wasn't any, because it was too much work. His algorithm implementations were so cleverly optimized that simple errors in mapping them to Intel assembler were fatal to their reliability, and I gave up trying to convert MIX.
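To make the "minor difference" claim concrete, here is a hypothetical toy converter for bare register-to-register instructions. It is only a sketch of the idea behind such translation tools: real converters must also handle immediates, memory operands, labels, and the full suffix rules, but for the simple case the core transformation really is just an operand swap plus a suffix strip:

```python
def att_to_intel(line: str) -> str:
    """Convert a bare two-operand AT&T instruction such as
    'movq %rax, %rbx' into Intel operand order: 'mov rbx, rax'.
    Toy sketch only: no immediates, memory operands, or labels."""
    mnemonic, operands = line.split(None, 1)
    # Strip the '%' register sigils that AT&T syntax requires.
    src, dest = [op.strip().lstrip("%") for op in operands.split(",")]
    # AT&T mnemonics carry an operand-size suffix (movq, addl, ...);
    # Intel syntax infers the size from the register names instead.
    if mnemonic[-1] in "bwlq":
        mnemonic = mnemonic[:-1]
    # The only other change is swapping source and destination.
    return f"{mnemonic} {dest}, {src}"

print(att_to_intel("movq %rax, %rbx"))  # mov rbx, rax
print(att_to_intel("addl %ecx, %edx"))  # add edx, ecx
```

Translating MIX, by contrast, means re-deriving register allocation and addressing for a machine model that matches no real hardware, which is why no such mechanical tool exists for it.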

There is no greater genius in the history of computers who has made so much valuable work less usable than Knuth. The breadth and depth of the material he covers is mind-boggling. But I find it personally offensive to see genius wasted. There is an old story I read a long time ago about a wealthy nobleman who hired Leonardo da Vinci to design a fantastic castle out of marzipan. He built this incredible model of a castle, then they wheeled it out to a party of aristocrats, who proceeded to eat his masterpiece, much to his chagrin. Maybe Knuth was channeling Leonardo, who wrote his notebooks in mirror writing so nobody could read them. Leibniz was another super genius, and evidently he spent most days researching family trees for wannabe nobles. Super geniuses are rare, and instead of being in every programmer's back pocket, the Knuth material rots away.