Choosing a parser for Mathematica input


David Kirkby
Nov 7, 2010, 9:26:05 PM
I'd like to write a program that uses a Mathematica-like syntax. Not a
100% clone, but as close to Mathematica input as reasonably practical.
Can anyone suggest a suitable front end parser? I was thinking of
using LLVM for the back end.

Here's a description of the syntax

http://reference.wolfram.com/mathematica/guide/Syntax.html
http://reference.wolfram.com/mathematica/tutorial/TheSyntaxOfTheMathematicaLanguage.html

One thing to note in particular is that whitespace can often mean
multiplication, e.g.

In[1]:= 12.1 2

Out[1]= 24.2
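
For illustration, here is a rough Python sketch of one way a tokenizer
could handle this, inserting an explicit multiplication between adjacent
operands (purely illustrative; the names and the approach are my own
guess, not how Mathematica actually does it):

import re

# Illustrative sketch only: insert an explicit Times between adjacent
# operands, so "12.1 2" reads as 12.1 * 2 and "2 Pi x" as 2 * Pi * x.
TOKEN = re.compile(r"\s*(\d+\.\d+|\d+|[A-Za-z][A-Za-z0-9]*|\S)")

def tokenize(src):
    tokens, pos = [], 0
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if not m:
            break
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def insert_implicit_times(tokens):
    def operand(tok):
        return tok[0].isalnum() or tok == ")"
    out = []
    for tok in tokens:
        if out and operand(out[-1]) and (tok[0].isalnum() or tok == "("):
            out.append("*")
        out.append(tok)
    return out

print(insert_implicit_times(tokenize("12.1 2")))   # ['12.1', '*', '2']
print(insert_implicit_times(tokenize("2 Pi x")))   # ['2', '*', 'Pi', '*', 'x']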

Mathematica supports many programming styles - procedural, functional
and rule-based programming.

A Lisp to Mathematica translator was written by Prof Fateman

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.51.4310&rep=rep1&type=pdf

In the paper Fateman says the language appears to be ad-hoc, so he had
limited success with a common parser and used a hand-written one. But
I'm not entirely convinced of his objectivity - he tends to mock the
creators of Mathematica - his program is called MockMMA. He was one of
the creators of the computer algebra system Maxima, which he always
compares favorably to Mathematica, though few others do.

I've never written a compiler, beyond noddy calculators in
introductory books, so I suspect this language is not an ideal one to
learn with. But I'd be interested in what tools, if any, would be
capable of handling such a complex language.

Dave

Jonathan Thornburg [remove -animal to reply]
Nov 9, 2010, 10:46:20 AM
David Kirkby <drki...@gmail.com> wrote:
> A Lisp to Mathematica translator was written by Prof Fateman
^^^^^^^^^^^^^^^^^^^
>
> http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.51.4310&rep=rep1&type=pdf

Fateman's work was a *Mathematica-to-Lisp* translator: quoting from that paper,
he wrote
"a Common Lisp program that can read (from a file or a keyboard) virtually
any Mathematica program or command, and will produce a Lisp data structure
closely resembling the FullForm printout of Mathematica."

It's also worth noting that Wolfram Research claims copyright over the
Mathematica language, and asserts that Fateman's translator infringed
that copyright. (Google 'Fateman Mathematica "Brown & Bain"' for a
letter from WRI's lawyers to Fateman claiming this.) I don't know the
exact boundary of how much of and/or how closely you can clone Mathematica
without getting into legal trouble.

--
-- "Jonathan Thornburg [remove -animal to reply]" <jth...@astro.indiana-zebra.edu>
Dept of Astronomy, Indiana University, Bloomington, Indiana, USA
"C++ is to programming as sex is to reproduction. Better ways might
technically exist but they're not nearly as much fun." -- Nikolai Irgens

Hans Aberg
Nov 9, 2010, 11:49:54 AM
On 2010/11/09 16:46, Jonathan Thornburg [remove -animal to reply] wrote:
>> http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.51.4310&rep=rep1&type=pdf
>
> Fateman's work was a *Mathematica-to-Lisp* translator: quoting from that paper,
> he wrote
> "a Common Lisp program that can read (from a file or a keyboard) virtually
> any Mathematica program or command, and will produce a Lisp data structure
> closely resembling the FullForm printout of Mathematica."
>
> It's also worth noting that Wolfram Research claims copyright over the
> Mathematica language, and asserts that Fateman's translator infringed
> that copyright. (Google 'Fateman Mathematica "Brown & Bain"' for a
> letter from WRI's lawyers to Fateman claiming this.)

A computer language cannot be copyrighted, ...

> I don't know the exact boundary of how much of and/or how closely
> you can clone Mathematica without getting into legal trouble.

... only actual code. So there might be a problem only if actual code
has been used.
[This agrees with my understanding as well, but none of us are lawyers,
so I'll end the legal discussion here. -John]

David Kirkby
Nov 9, 2010, 7:35:59 PM
> > It's also worth noting that Wolfram Research claims copyright over
> > the Mathematica language, and asserts that Fateman's translator
> > infringed that copyright. (Google 'Fateman Mathematica "Brown &
> > Bain"' for a letter from WRI's lawyers to Fateman claiming this.)

Yes, but I think they gave up. His code is still available.

> A computer language cannot be copyrighted, ...
>
> > I don't know the exact boundary of how much of and/or how closely
> > you can clone Mathematica without getting into legal trouble.
>
> ... only actual code. So there might be a problem only if actual code
> has been used.
> [This agrees with my understanding as well, but none of us are lawyers,
> so I'll end the legal discussion here. -John]

I'm more interested in the technical issues.

Does anyone have any comments about the most suitable tool for a
parser?

Dave

Wink Zhang
Nov 13, 2010, 4:08:11 AM
> I'm more interested in the technical issues.
>
> Does anyone have any comments about the most suitable tool for a
> parser?

Hey Dave,

May I suggest taking a look at yapp and also this tutorial:
http://alfarrabio.di.uminho.pt/~albie/publications/perlflex.pdf
Perl should be suitable in your case. Moreover, LLVM is far more than
just a parser.

-Qirun

Ira Baxter
Nov 26, 2010, 12:54:08 PM
"David Kirkby" <drki...@gmail.com> wrote in message

> I'd like to write a program that uses a Mathematica-like syntax. Not a
> 100% clone, but as close to Mathematica input as reasonably practical.
> Can anyone suggest a suitable front end parser? I was thinking of
> using LLVM for the back end.

Others have noted that LLVM is much more than a parser. Are you
trying to generate *code* from MMa syntax? Are you trying to generate
code from pure MMa equations or from actual MMa programs?

> Here's a description of the syntax
>
> http://reference.wolfram.com/mathematica/guide/Syntax.html
> http://reference.wolfram.com/mathematica/tutorial/TheSyntaxOfTheMathematicaLanguage.html
>
> Things to note in particular is that whitespace can often mean
> multiplication. i.e.

> I've never written a compiler, beyond noddy calculators in
> introductory books, so I suspect this language is not an ideal one to
> learn with. But I'd be interested in what tools, if any, would be
> capable of handling such a complex language.

MMa (equations or programs) isn't a particularly complex language,
either conceptually or from the point of view of a parser.
Mostly it is Lisp S-expressions using xyz[...] instead of (xyz ...).
*Executing* MMa code is a bit messier; you need a program
transformation system to do pattern-match/rewrites to implement
much of the semantics. And of course, if your code contains any
complicated formulas needing simplification, you might need all of
MMa proper to provide the necessary set of rewrites that encode all
that math knowledge.
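
To make that concrete, here is a toy sketch in Python (purely
illustrative; this is neither DMS nor MMa's actual implementation) of
expressions held as head/argument tuples with one pattern-match/rewrite
rule:

def rewrite(expr, rules):
    """Apply the first matching rule top-down, then recurse into arguments."""
    if isinstance(expr, tuple):
        for pattern, action in rules:
            bindings = match(pattern, expr)
            if bindings is not None:
                return rewrite(action(bindings), rules)
        head, *args = expr
        return (head, *[rewrite(a, rules) for a in args])
    return expr

def match(pattern, expr):
    """Match pattern against expr; a string starting with '_' binds anything."""
    if isinstance(pattern, str) and pattern.startswith("_"):
        return {pattern[1:]: expr}
    if isinstance(pattern, tuple) and isinstance(expr, tuple) \
            and len(pattern) == len(expr):
        bindings = {}
        for p, e in zip(pattern, expr):
            b = match(p, e)
            if b is None:
                return None
            bindings.update(b)
        return bindings
    return {} if pattern == expr else None

# One rule: Times[1, x] -> x, written here as ("Times", 1, x) -> x.
rules = [(("Times", 1, "_x"), lambda b: b["x"])]

print(rewrite(("Plus", ("Times", 1, "y"), 2), rules))   # ('Plus', 'y', 2)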

FWIW, our DMS Software Reengineering Toolkit (a program
transformation system) has a working MMa language parser
as an available option (see website for more details).
We use GLR parsing for all of our parsers;
I suspect that's overkill for the MMa grammar but it seems to work fine.

--
Ira Baxter, CTO
www.semanticdesigns.com

David Kirkby
Nov 26, 2010, 1:48:14 PM
On Nov 13, 9:08 am, Wink Zhang <winkzh...@gmail.com> wrote:
> > I'm more interested in the technical issues.
>
> > Does anyone have any comments about the most suitable tool for a
> > parser?
>
> Hey Dave,
>
> May I suggest taking a look at yapp and also this tutorial:
> http://alfarrabio.di.uminho.pt/~albie/publications/perlflex.pdf
> Perl should be suitable in your case.

Thank you very much. Sorry I did not reply earlier.

> Moreover, LLVM is far more than just a parser.

Is it a parser at all? I got the impression it took the abstract
syntax tree as input, and not the source language.

I've not played with LLVM at all. I suspect such a project is beyond
me, but I'd like to be aware of what tools might be able to do this.

Dave

David Kirkby
Nov 27, 2010, 2:11:11 PM
On Nov 26, 5:54 pm, "Ira Baxter" <idbax...@semdesigns.com> wrote:
> "David Kirkby" <drkir...@gmail.com> wrote in message

> > I'd like to write a program that uses a Mathematica-like syntax. Not a
> > 100% clone, but as close to Mathematica input as reasonably practical.
> > Can anyone suggest a suitable front end parser? I was thinking of
> > using LLVM for the back end.
>
> Others have noted that LLVM is much more than a parser.

As I noted, I had not looked at LLVM much, but I was not aware it
could be used for parsing the code. I was under the impression that,
to make a C compiler for example, it uses the gcc front end rather
than parsing the C code directly.

LLVM seemed to be an ideal tool for the back end though.

> Are you trying to generate *code* from MMa syntax? Are you trying
> to generate code from pure MMa equations or from actual MMa
> programs?

I don't fully understand the question, which is no doubt due to my
lack of knowledge.

I'm thinking of basically making an MMA clone, which accepts
Mathematica input and acts as an interpreter. However, unlike Octave,
which is a MATLAB clone, I was not looking to make a 100% clone. If
part of the Mathematica language was particularly difficult to parse,
then it would be ignored.

If for example, it was found to be very difficult to parse input like

In[1]:= Pi //N

Out[1]= 3.14159

then I would be happy to accept that this could easily be written as:

In[2]:= N[Pi]

Out[2]= 3.14159

and not worry too much about it. I'm sure the above will not be an
issue, but there are some other complex parts of Mathematica which I
thought might be very challenging.
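
For what it's worth, a rough sketch suggests // really is easy if it is
treated as a loose-binding postfix operator in a precedence-climbing
parser (illustrative Python only, not anyone's real implementation):

# Illustrative only: "expr // f" treated as the application f[expr] in a
# tiny precedence-climbing parser (tokens are a pre-split list of strings).
def parse(tokens, min_prec=0):
    left = tokens.pop(0)                      # atom: a number or a symbol
    while tokens:
        op = tokens[0]
        if op == "//" and min_prec <= 1:      # // binds more loosely than +
            tokens.pop(0)
            func = parse(tokens, 2)
            left = [func, left]               # Pi // N  ->  ['N', 'Pi'], i.e. N[Pi]
        elif op == "+" and min_prec <= 2:
            tokens.pop(0)
            left = ["Plus", left, parse(tokens, 3)]
        else:
            break
    return left

print(parse(["Pi", "//", "N"]))               # ['N', 'Pi']
print(parse(["1", "+", "2", "//", "N"]))      # ['N', ['Plus', '1', '2']], i.e. N[1 + 2]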

Likewise, if there are serious flaws in the way the language works, it
might be better to not worry about being compatible, but do it better.
Richard Fateman's paper points out what he considers a huge number of
flaws, but I take a lot of what he says with a pinch of salt. He is
clearly no fool, but is very negative about almost anything unless it
is Lisp. For example, he thinks use of a space for multiplication is a
bad idea, yet that's how most people write maths. We write

2 Pi x

rather than

2*Pi*x

> > Here's a description of the syntax
>
> >http://reference.wolfram.com/mathematica/guide/Syntax.html

> >http://reference.wolfram.com/mathematica/tutorial/TheSyntaxOfTheMathe...

> MMa (equations or programs) isn't a particularly complex language,
> either conceptually or from the point of view of a parser.
> Mostly it is Lisp S-expressions using xyz[...] instead of (xyz ...).

It always struck me as a complex language to use, with such obscure
syntax in places that I thought it would be very difficult to parse.
Some Mathematica code is similar in obscurity to entries for the
obfuscated C contest!

In the paper "A Lisp-Language Mathematica-to-Lisp translator" by Prof.
Fateman

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.51.4310&rep=rep1&type=pdf

he says in the section "Lexical Analysis and Parsing"

"After trying (with only modest success) various mostly-automatic
parsing techniques, we ended up with a basically hand-coded parser.
Mathematica unlike some computer algebra systems does not feature an
extensible syntax: this suggested the implementation was somewhat ad-
hoc."

I'm not sure how objective Richard's comments are though. I don't know
if

* He just wanted to have a dig at Wolfram Research.
* Richard is right, and most automatic parsing techniques are not
appropriate.
* He chose inappropriate mostly-automatic parsing techniques. (He
does not list what he tried.)
* He chose the right parsing tools, but did a poor implementation

Much of the paper suggests the first may be the case, but it may be
the second, which is more worrying from my point of view.

> *Executing* MMa code is a bit messier; you need a program
> transformation system to do pattern-match/rewrites to implement
> much of the semantics. And of course, if your code contains any
> complicated formulas needing simplification, you might need all of
> MMa proper to provide the necessary set of rewrites that encode all
> that math knowledge.

Yes, I am aware of that. My thoughts are that if one could get to the
point of being able to parse the input, making it open-source, then
others would be able to improve it by encoding at least a subset of
the maths knowledge. Realistically, it is not going to be possible to
make a fully functional Mathematica clone.

> FWIW, our DMS Software Reengineering Toolkit (a program
> transformation system)

I was looking for an open-source solution, to make an open-source
alternative to Mathematica.

> Ira Baxter, CTO, www.semanticdesigns.com

Thank you Ira

Dave

fat...@gmail.com
Feb 5, 2015, 1:29:33 PM
Hi, comp.compiler guys.
I came across this thread while googling for something I wrote and
found stuff written about me. It's a bit stale, but
if David Kirkby wants a Mathematica clone, he should look at Mathics.
Regarding the difficulty of parsing the Mathematica language, he misses
the point, I think. By the way, after conducting a worldwide search
for a better name for the language-minus-the-math-library, Stephen
Wolfram chose, tada, "The Wolfram Language".

Anyway the difficulty isn't parsing x //f to get f(x). It is in the
separation of lexical and syntactic uses of characters like "." which have
multiple uses.
a.3 is Dot[a,3]. a .3 is Times[0.3, a]. And the difficulty with using
"space"
or merely adjacency for multiplication is not resolved by asserting that
mathematicians use it all the time.
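
To illustrate the kind of whitespace-sensitive rule involved, here is a
rough Python sketch (purely illustrative; this is not Mathematica's
actual tokenizer, and the details are simplified):

import re

# Sketch of the rule above: "a.3" lexes as  a . 3  (Dot), while "a .3"
# lexes as  a  .3  (a fractional number, hence an implicit Times).
def lex(src):
    tokens, i = [], 0
    while i < len(src):
        ch = src[i]
        if ch.isspace():
            i += 1
        elif ch == "." and src[i + 1:i + 2].isdigit():
            # "." followed by a digit: Dot if glued to a preceding operand,
            # otherwise the start of a fractional number.
            glued = (tokens and tokens[-1][0] in ("symbol", "number")
                     and not src[i - 1].isspace())
            if glued:
                tokens.append(("dot", "."))
                i += 1
            else:
                m = re.match(r"\.\d+", src[i:])
                tokens.append(("number", m.group()))
                i += m.end()
        elif ch == ".":
            tokens.append(("dot", "."))
            i += 1
        elif ch.isdigit():
            m = re.match(r"\d+(\.\d+)?", src[i:])
            tokens.append(("number", m.group()))
            i += m.end()
        elif ch.isalpha():
            m = re.match(r"[A-Za-z][A-Za-z0-9]*", src[i:])
            tokens.append(("symbol", m.group()))
            i += m.end()
        else:
            tokens.append(("op", ch))
            i += 1
    return tokens

print(lex("a.3"))    # [('symbol', 'a'), ('dot', '.'), ('number', '3')]
print(lex("a .3"))   # [('symbol', 'a'), ('number', '.3')]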

I have written papers and email in which I am critical of Mathematica.
I have also written programs that are partial implementations of that program
but with illustrative changes showing (in my view) better design decisions.
Some of these decisions are mathematical rather than language issues.

Being (threatened or actually) sued by Wolfram is not so unusual. It does add
some perspective to the fellow. See for example,

http://vserver1.cscs.lsa.umich.edu/~crshalizi/reviews/wolfram/

oh, about the name for the language, see
https://groups.google.com/forum/#!searchin/sage-devel/egofart/sage-devel/VEVWbpDLc_g/cI0iIhIk7-AJ
for my own acronymic juggling, which explains the name EGOFART.

glen herrmannsfeldt
Feb 6, 2015, 10:15:44 PM
fat...@gmail.com wrote:
> Hi, comp.compiler guys.
> I came across this thread while googling for something I wrote and
> found stuff written about me. It's a bit stale, but
> if David Kirkby wants a Mathematica clone, he should look at Mathics.
> Regarding the difficulty of parsing the Mathematica language, he misses
> the point, I think. By the way, after conducting a worldwide search
> for a better name for the language-minus-the-math-library, Stephen
> Wolfram chose, tada, "The Wolfram Language".

It seems that most computer languages are designed to be easy to parse
by programs, but not for ease of use by people.

When computers were smaller and slower, that might have made sense,
but for the computers we have today, making it easier for people
should come first.

I went to a talk, not so long ago, by someone actually studying people
using computer languages. It seems that many people who write papers
about how easy or hard they are to use don't actually do any tests
with real people.

> Anyway the difficulty isn't parsing x //f to get f(x). It is in the
> separation of lexical and syntactic uses of characters like "." which have
> multiple uses.

> a.3 is Dot[a,3]. a .3 is Times[0.3, a]. And the difficulty with using
> "space"
> or merely adjacency for multiplication is not resolved by asserting that
> mathematicians use it all the time.

Interesting that when we learn multiplication in 3rd grade, we learn
the x operator for it, but then when we learn algebra we expect
adjacent terms to be multiplied without any operator. Mathematicians
have been doing that for centuries.

Though there is the complication of multiple-character symbolic
names, which mathematicians seem still not to have adopted.

> I have written papers and email in which I am critical of Mathematica.
> I have also written programs that are partial implementations of that program
> but with illustrative changes showing (in my view) better design decisions.
> Some of these decisions are mathematical rather than language issues.

But did you do any tests comparing your design to others? That is,
using actual people, both experienced and not so experienced?

The person I mentioned above, though I didn't find the reference,
compared currently popular languages taught in introductory classes
against ones with random ASCII characters in place of the keywords,
and the random one did better!

-- glen
[I can believe it, but I believe there is also some evidence that once
you get beyond examples of trivial size, languages that are hard for
computers to parse are also hard for people to use. For example, with
implicit multiplication, what does this mean:

a b c + d e f + g

Rules like APL's everything associates to the right, or languages where
you have to parenthesize everything, may seem "unnatural" but also make
it a lot easier to get things written correctly. -John]

Robert Jacobson
Feb 6, 2015, 10:16:13 PM
I am among those who are trying to write a Mathematica parser. There are a few
Mathematica parsers out there.

* Richard Fateman's old MockMMA of course (written in Lisp) which is mentioned
in the first post of this thread from 2010.

* This MIT-licensed parser written in Scala is notable. Written by Mateusz
Paprocki "and contributors" a couple of years ago.
https://github.com/mattpap/mathematica-parser

* Alex Gittens's basicCAS is a Python Mathematica parser, but it appears to
have disappeared from the author's website. It's still available elsewhere on
the net for those interested in looking for it. This project is interesting
because it includes Alex's notes regarding implementation.
https://pypi.python.org/pypi/basicCAS/1.0

* Mathics, a Mathematica clone written in Python and created by Jan Pöschko,
is particularly interesting. You can use it online. GPL licensed, copyright
2007.
http://www.mathics.org

* omath is similar in spirit to Mathics but is written in Java and appears to
have been abandoned at version 10^(-16) in 2005. The parser is a generated
parser using JavaCC and JJTree. The source code is distributed without a
license.
http://omath.org/w/index.php?title=Main_Page

* symja contains a Mathematica parser for a reasonable subset of Mathematica.
https://code.google.com/p/symja/source/browse/#svn/trunk/matheclipse-parser

SymPy and Sage both have rudimentary Mathematica parsers that work for basic
mathematical expressions. There are also some (rumors of) proprietary
Mathematica parsers. Jon Harrop of Flying Frog Consultancy claims to have
written a Mathematica parser in 300 lines of OCaml under contract for Wolfram
Research. The code is obviously not publicly available. Semantic Designs, a
company specializing in compiler technologies, lists Mathematica among the
many languages for which it has "front ends." (Both mentioned here:
http://stackoverflow.com/questions/1608380/parser-for-the-mathematica-syntax.
)

Wolfram Research may be grotesquely litigious, but given the existence of so
many of these projects it seems to me they are no longer interested in suing
everyone who even thinks the word Mathematica as they once were.

My own project is to construct an ANTLR4 grammar for Mathematica. Such a
grammar seems to me to be the easiest way to benefit the most people with
current compiler technologies. ANTLR4 has C#, Python, Java, and JavaScript
targets (some better than others), and a C/C++ target is allegedly in the
works. Several projects have expressed interest in such a thing, but it
appears nobody has succeeded in producing it. I am very happy with how far I
was able to get with very little effort, but my grammar isn't yet ready for
public consumption (read: doesn't work). See my top-level post to this
community for an example of one of the problems I'm still trying to solve with
my project.

Best,

Robert

Derek M. Jones
Feb 8, 2015, 3:49:03 AM
Glen,

> I went to a talk, not so long ago, by someone actually studying people
> using computer languages. It seems that many people who write papers
> about how easy or hard they are to use don't actually do any tests
> with real people.

Yes, plenty of arm waving and personal opinions abound.

One experiment suggests that developers are good at doing
what they do most:
http://www.knosof.co.uk/dev-experiment/accu06.html
[Write buggy code? -John]

glen herrmannsfeldt
Feb 8, 2015, 11:24:23 AM
Derek M. Jones <derek@_nospam_knosof.co.uk> wrote:

(snip, I wrote)
>> I went to a talk, not so long ago, by someone actually studying people
>> using computer languages. It seems that many people who write papers
>> about how easy or hard they are to use don't actually do any tests
>> with real people.

> Yes, plenty of arm waving and personal opinions abound.

OK, here it is:

The talk was titled "The Programming Language Wars".

http://web.cs.unlv.edu/stefika/Papers.php

is a list of some of the papers by the author, and

paper number 7 is one people might find interesting.

The authorization system doesn't allow one to copy the link, but if
you go to the page you should be able to read it, even without an ACM
account.

There is very little research into how people actually use features in
programming languages, though all designers seem to already know
without studying.

He did actual tests with both experienced and new programmers.

-- glen

Robin Vowels
Feb 8, 2015, 11:32:27 AM
From: "glen herrmannsfeldt" <g...@ugcs.caltech.edu>
Sent: Friday, February 06, 2015 11:05 AM

> It seems that most computer languages are designed to be easy to parse
> by programs, but not for ease of use by people.

An interesting thought, but not really the case, except for a few languages like C.

> When computers were smaller and slower, that might have made sense,

It was in that era that great efforts were made to make programming easier for people.
And even though computer memories were measured in hundreds and thousands of words,
considerable progress was made.
In the early 1950s, simple languages were developed to simplify the task of programming,
such as Alphacode, TIP, Easicode, GIP, etc. Some were tagged "autocodes".
FORTRAN and GEORGE came along in the middle 1950s.
And COBOL and Algol arrived in the late 1950s.
BASIC was developed for the same purpose.

None of these (with the exception of small parts of FORTRAN) was
developed to make it easy for any parser.

Even PL/I was developed to make it easier for the programmer, though
some parts were not simple and probably not easy to parse (such as the
absence of reserved keywords).

Some features of FORTRAN were not exactly easy to parse (blanks not
having any significance), but they probably simplified the keying of
programs on punch cards, though they brought with them an unwitting
source of programmer errors.

> but for the computers we have today, making it easier for people
> should come first.

Indeed.

>> Anyway the difficulty isn't parsing x //f to get f(x). It is in the
>> separation of lexical and syntactic uses of characters like "." which have
>> multiple uses.

A number of special characters have traditionally acquired multiple uses.
The asterisk, parentheses, apostrophe, colon, period, feature heavily
on account of the limited number of characters available in equipment
of the era (48-character set, 60 character set, etc.) and continue to do so
for historical reasons.

Algol perhaps fared best with preparation equipment that provided the
mathematical symbols, though implementations were bugged by line
printers that could deal only with upper-case characters and a
restricted set of special characters.
---
This email contains an annoying advertisement because avast! Antivirus protection is active.

Hans-Peter Diettrich
Feb 8, 2015, 7:11:57 PM
Robin Vowels wrote:

> A number of special characters have traditionally acquired multiple uses.
> The asterisk, parentheses, apostrophe, colon, period, feature heavily
> on account of the limited number of characters available in equipment
> of the era (48-character set, 60 character set, etc.) and continue to do so
> for historical reasons.

IMO the real and persistent limit is the keyboard, with a limited number
of keys. Increasing the number of keys is not a solution, because then
it may be more time-consuming to find a special key, instead of reusing
a character or typing more characters. Why didn't APL succeed in the
long run?

OTOH there exist languages with bigger character sets, like Japanese or
Chinese, and speakers used to the corresponding input methods (character
composition...). What if such people started to invent and use dedicated
glyphs for keywords and operators in new programming languages?
Wouldn't this prevent coders of other *natural* languages
from using such a *programming* language, for its unreadability and
complicated keying?

DoDi

News Subsystem
Feb 10, 2015, 1:49:10 AM
From: "Hans-Peter Diettrich" <DrDiet...@netscape.net>
Sent: Monday, February 09, 2015 6:49 AM
> Robin Vowels wrote:
>
>> A number of special characters have traditionally acquired multiple uses.
>> The asterisk, parentheses, apostrophe, colon, period, feature heavily
>> on account of the limited number of characters available in equipment
>> of the era (48-character set, 60 character set, etc.) and continue to do so
>> for historical reasons.
>
> IMO the real and persistent limit is the keyboard, with a limited number
> of keys.

Present keyboards for the PC have keys for 33 special characters
(in addition to upper and lower-case and digits).
That's twice as many as are in use in languages.

> Increasing the number of keys is not a solution, because then
> it may be more time-consuming to find a special key, instead of reusing
> a character or typing more characters.

No need to increase the number of keys. Recall old typewriters?
Some had provision for three shifts (lower-case, upper-case, and special characters).
The same could be done with computer keyboards (well, it's sort of done with the normal, caps, and
CTRL keys).
A proper third shift key with corresponding glyphs inscribed on the keys
could extend the range of special characters. After all, there is provision for 256 characters
encoded in a byte, and we use only half of them.

>Why didn't APL succeed in the long run?

Because it was a write-once, throw-away language.
Programs were largely unintelligible to all except the writer.

This email probably contains viruses and malware despite what avast! Antivirus protection says.
[It was perfectly possible to write readable APL programs, but almost
nobody did. -John]



Derek M. Jones
Feb 10, 2015, 8:46:16 PM
Glen,

> The talk was titled "The Programming Language Wars".
>
> http://web.cs.unlv.edu/stefika/Papers.php

and the answer is to invent another language!
http://www.quorumlanguage.com/

> is a list of some of the papers by the author, and
>
> paper number 7 is one people might find interesting.

Most of this paper is based on two self-reporting surveys,
the sort of thing pollsters use to rate public opinion.

The experiment that was run for the second part of the paper
looks very sloppy.

> There is very little research into how people actually use features in
> programming languages, though all designers seem to already know
> without studying.

Some work is analyzed here:
http://shape-of-code.coding-guidelines.com/2014/08/27/evidence-for-the-benefits-of-strong-typing-where-is-it/

There is a growing body of measurements of the use of programming
language features. Here is a very good one on the use of eval in
Javascript:
http://the.gregor.institute/papers/ecoop2011-richards-eval.pdf

It is now common to encounter papers that measure stuff and arm
wave about finding power laws and how interesting this must be.
Sometimes they even check to see whether the data does in fact
follow a power law:
http://vserver1.cscs.lsa.umich.edu/~crshalizi/weblog/491.html

What is generally missing is any theory for the patterns seen.
A step up from arm waving here:
http://shape-of-code.coding-guidelines.com/2013/05/17/preferential-attachment-applied-to-frequency-of-accessing-a-variable/
Of course developer decisions on what variables to use are not based
on probabilities; this is an emergent behavior of higher-level, as
yet unknown, processes.

> He did actual tests with both experienced and new programmers.

The academic definition of experienced is often very different from
that used by industry. It is common to hear third-year undergraduates
labeled as experts and first-year students as novices; industry would
regard them both as novices. As far as I can see, in this paper
experienced subjects are the ones enrolled in computer science courses
and inexperienced subjects are those not enrolled in such courses.

Thanks for the pointer, but I have deleted my copy of this paper.