OBSCURA ET TRIVIA: 12 Random research notes by Quentin R. Skrabec, Ph.D
Verne and the Rudiments of the Statistical Student T Test
The Student's t-test is a statistical tool used to compare the means of two data sets and determine whether they plausibly come from the same population. It tests whether the difference between two groups' data is statistically significant by comparing the data’s frequency distributions. That is, it asks whether the two groups “statistically” match with some stated degree of confidence. For example, a dietician may want to know whether two different diets lead to different mean weight loss. Note that a statistical significance test does not look for an exact match. At the core of such analysis is the normal distribution, which was first derived by Abraham de Moivre in 1733 to predict the outcome of games of chance. In 1809, the German mathematician Carl Friedrich Gauss (1777–1855) popularized the normal distribution (also called the Gaussian), using it in astronomy to analyze errors in observations of planetary orbits. Around the same time, Pierre-Simon Laplace, another French mathematician, was working with probability and frequency distributions to see whether they were significantly different.
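As a rough illustration of the diet example above, the sketch below computes the classic equal-variance (pooled) two-sample t statistic using only Python's standard library. The function name `student_t` and the weight-loss figures are made up for illustration; they do not come from any study.

```python
import math
from statistics import mean, variance

def student_t(sample_a, sample_b):
    """Return (t, degrees_of_freedom) for a pooled two-sample t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Pooled variance: the two sample variances weighted by their degrees of freedom.
    pooled = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / (n_a + n_b - 2)
    # Standard error of the difference between the two sample means.
    std_err = math.sqrt(pooled * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / std_err, n_a + n_b - 2

# Hypothetical weight losses in kilograms for two small groups.
diet_a = [2.0, 3.0, 4.0, 5.0, 6.0]
diet_b = [1.0, 2.0, 3.0, 4.0, 5.0]

t, df = student_t(diet_a, diet_b)
print(f"t = {t:.2f} with {df} degrees of freedom")  # prints "t = 1.00 with 8 degrees of freedom"
```

For 8 degrees of freedom the two-sided 5% critical value is about 2.306, so a t of 1.00 would not count as a significant difference between these two made-up groups.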
Verne’s 800 Leagues On The Amazon (1881) foreshadows the Student's t-test that William Sealy Gosset defined mathematically in 1908. Gosset worked at the Guinness Brewery in Dublin, Ireland, and was interested in the problem of comparing the chemical properties of small barley samples to guide brewing improvements. By the 1920s, the test was being applied in the physical sciences, and in the 1930s, it became the basis for modern cryptology in military applications.
Verne’s interest in cryptology led him to the importance of comparing letter distributions in secret documents and ciphers. The three stories Verne wrote that each hinged on a cryptogram that had to be broken were:
* A Journey To The Center of the Earth (1864)
* The Giant Raft or 800 Leagues On The Amazon (1881)
* Mathias Sandorf (1885).
Not surprisingly, Verne’s literary muse, Edgar Allan Poe, was also fascinated by cryptography and was considered one of the most skilled cryptologists of his time. Poe's 1843 short story The Gold-Bug introduced several innovative cryptographic techniques, including frequency distribution analysis. Verne and Poe clearly drew on Carl Gauss’s earlier work applying the normal distribution. Verne used Poe’s analytical approach and referenced Poe in his book 800 Leagues On The Amazon (1881).
The key and first step to solving secret messages is letter frequency analysis, the basis of all codebreaking. Each language has its own typical distribution of letters in a text. While Verne applied Poe's analytical approach, he also expanded and clarified the use of frequency distribution comparison to solve codes. Verne’s approach was precise, unlike Poe’s.
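The counting step described above can be sketched in a few lines of Python. This is only an illustration: the helper name `letter_frequencies` and the sample sentence are assumptions, not anything taken from Verne or Poe.

```python
from collections import Counter

def letter_frequencies(text):
    """Count how often each letter occurs, ignoring case, digits, spaces, and punctuation."""
    return Counter(ch for ch in text.lower() if ch.isalpha())

sample = "The gold bug and the giant raft"  # hypothetical sample text
table = letter_frequencies(sample)
total = sum(table.values())

# Show the five most frequent letters with their share of all letters counted.
for letter, count in table.most_common(5):
    print(f"{letter} = {count:2d}  ({count / total:.1%})")
```

Because `str.isalpha` also accepts accented letters, the same helper should work on Portuguese or French text without changes.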
William Frederick Friedman, head of the American Signal Intelligence Service and Code Breaking Division during World War II, declared: “When we look into the types of cryptograms other writers of romantic tales and detective stories have employed, we must recognize that he [Verne] stands head and shoulders above them all, not excluding even Poe. ... Verne's genius calls for admiration and respect – even on the part of professional cryptographers”[i]
In 1940, before Pearl Harbor, Friedman published “Jules Verne as Cryptographer.” After the attack, the government classified Friedman’s article along with all of his documents and papers (they were declassified in 1973). The article is readily available on the internet today.
Verne produced a perfect verbal description of a Student's t-test before Gosset applied statistical formulae to the methodology in 1908. Of course, alphabetical significance testing was only the first step in breaking the code, and Poe and Verne would also apply their own substitution algorithms for the final solution. These algorithms can be difficult and complex. There are several scholarly works on Poe and Verne’s use of these substitution algorithms (see Gass, Frederick. “Solving a Jules Verne Cryptogram.” Mathematics Magazine, vol. 59, no. 1, 1986, pp. 3–11). However, the first step of code-breaking relates to the Student's t-test, which Verne applies in simplified form in his novel 800 Leagues On The Amazon (1881).
Verne’s cipher in 800 Leagues On The Amazon is a rudimentary example of statistical significance testing, written long before such testing came into everyday use in analysis. Verne’s analytical approach, like Poe's, was amazing for its time. I can share a personal experience of Verne’s approach. On my Ph.D. exam (in 1997), there was what appeared to be a straightforward question, one that seemed easy to bullshit my way through, showing my genius 😊, until I found out how it was graded. I was asked to lay out a methodology to compare two data sets. A fellow student who was more knowledgeable than I was answered the question in 4 to 5 pages (as did I), but he applied more detailed formulae. However, he failed the question because it required setting up a simple frequency table before diving into the sophisticated formulae, even though it could be solved with the formulae alone. The point is that you should always conceptualize the data first. Starting with a simple frequency table was precisely how Jules Verne approached a similar problem in his novel.
This first step of a frequency table allowed Verne to give a perfect verbal description of significance testing in 1881, before Gosset developed the mathematical formulae in 1908.
Verne first laid out a frequency table of the letters used, as follows:
a = 3 times
b = 4 times
c = 3 times
d = 16 times
e = 9 times
f = 10 times
g = 13 times
h = 23 times
and so on
Total ... 276 times.
Then Verne notes:
“One thing strikes me at once, and that is that in this paragraph all the letters of the alphabet are not used. That is very strange. If we take up a book and open it by chance it will be very seldom that we shall hit upon two hundred and seventy-six letters without all the signs of the alphabet figuring among them. After all, it may be chance,” and then he passed to a different train of thought. “One important point is to see if the vowels and consonants are in their normal proportion.”
“And so he seized his pen, counted up the vowels, ...”
“And thus there are in this paragraph after we have done our subtraction, sixty-four vowels and two hundred and twelve consonants. Good! that is the normal proportion. That is about a fifth, as in the alphabet, where there are six vowels among twenty-six letters. It is possible, therefore, that the document is written in the language of our country [Portuguese].”
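The arithmetic behind this passage is easy to check. The short sketch below uses only the counts quoted above (64 vowels, 212 consonants) and the six-in-twenty-six alphabet ratio; the variable names are illustrative, and the assumption that the six vowels include y is mine.

```python
vowels, consonants = 64, 212      # counts quoted in the passage
total = vowels + consonants       # 276 letters in the paragraph

document_share = vowels / total   # share of vowels in the cryptogram
alphabet_share = 6 / 26           # six vowels (presumably a, e, i, o, u, y) among 26 letters

print(f"document: {document_share:.3f}, alphabet: {alphabet_share:.3f}")
# prints "document: 0.232, alphabet: 0.231"
```

Both shares come to roughly 23 percent, which is why the character treats the vowel proportion as normal.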
Verne then compares the letter distribution in the document with the proportions in which letters appear on a normal page of a book, concluding:
“I see that that is h, for it is met with twenty-three times. This enormous proportion shows, to begin with, that h does not stand for h, but, on the contrary, that it represents the letter which recurs most frequently in our language, for I suppose the document is written in Portuguese. In English or French it would certainly be e, in Italian it would be i or a, in Portuguese it will be a or o. Now let us say that it signifies a or o.”
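The reasoning in this passage amounts to finding the most frequent cipher letter and pairing it with the most common letters of each candidate language. Here is a minimal sketch, assuming only the eight counts listed in the frequency table earlier (the rest of the table is elided there) and the language guesses stated in the quotation; the dictionary names are hypothetical.

```python
# Partial counts from the table quoted earlier; the remaining letters are elided in the source.
partial_counts = {"a": 3, "b": 4, "c": 3, "d": 16, "e": 9, "f": 10, "g": 13, "h": 23}

# Candidate plaintext letters per language, as given in the quoted passage.
likely_plain = {
    "English": ["e"],
    "French": ["e"],
    "Italian": ["i", "a"],
    "Portuguese": ["a", "o"],
}

# The cipher letter with the highest count is the first candidate for substitution.
most_frequent = max(partial_counts, key=partial_counts.get)
print(f"most frequent cipher letter: {most_frequent} ({partial_counts[most_frequent]} times)")

for guess in likely_plain["Portuguese"]:
    print(f"hypothesis: cipher {most_frequent!r} stands for plain {guess!r}")
```

This yields only a starting hypothesis, which then has to be tested against the rest of the text.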
Of course, much more is still needed to break the code, and a substitution algorithm would need to be applied, as Verne did.
Although the Student's t-test tolerates small samples, a larger sample gives greater confidence. Verne used a page of a book as his reference, but a better reference distribution would be drawn from a current dictionary of the language in question. Here’s an example of a frequency table in English.
Reading Verne’s novel should be part of any introductory course in statistics.
[i] William Friedman, “Jules Verne as Cryptographer,” The Signal Corps Bulletin, April-June 1940, pp. 70-107.
--
You received this message because you are subscribed to the Google Groups "Jules Verne Forum" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jules-verne-fo...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/jules-verne-forum/CAFuNt7eWbhBWKS5rD%2B3bcWZUtnThGCNwCE%2BVhg%2B_%2B57ZGmK3iQ%40mail.gmail.com.
This was why William Frederick Friedman, head of the American Signal Intelligence Service and Code Breaking Division during World War II, called Verne a genius in the passage quoted above.
Friedman also noticed the incorrect letter distribution in the actual novel.
I'll certainly read your - is there an English translation?
THANK you, I love these discussions!!!
quentin
Regarding the topic of Verne and mathematics, I am sharing a link to an article (in Spanish) from the magazine La linterna del traductor that discusses the mathematical problems (and errors) presented in Sans dessus dessous. The article is written by Eugenio Manuel Fernández Aguilar, a physicist and science communicator. The translation, published two years ago, was done by Elena Bernardo Gil. Both of them worked closely together on this project.
https://lalinternadeltraductor.org/n22/capitulo-suplementario-del-reves.html
KR
Alicia