
Verne's math


quentin skrabec

Oct 15, 2024, 11:27:02 AM
to Jules Verne Forum

OBSCURA ET TRIVIA: 12                           Random research notes by Quentin R. Skrabec, Ph.D

                        Verne and the Rudiments of the Statistical Student's t-Test

            The Student's t-test is a statistical tool used to compare the means of two data sets and determine whether they plausibly come from the same population. It tests whether the difference between two groups' data is statistically significant. In the loose sense used in this note, that means asking whether the proportions of data in each group “statistically” match with some degree of confidence. For example, a dietician wants to know if two different diets lead to different mean weight loss amounts. Note that with a statistical significance test, you are not looking for an exact match. The core of such analysis is the normal distribution, which was first derived by de Moivre in 1733 to predict the outcome of games of chance. German mathematician Carl Gauss (1777–1855) later popularized the normal distribution (also called Gaussian), applying it in 1809 to the analysis of errors in astronomical observations of orbits. In the same period, the French mathematician Pierre-Simon Laplace was also working with probability and frequency distributions.
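To make the dietician example concrete, here is a minimal sketch in Python of the classic two-sample (pooled-variance) t statistic. The weight-loss numbers and the function name `pooled_t_statistic` are hypothetical, invented only for illustration; nothing here comes from Verne or Gosset.

```python
import math
import statistics


def pooled_t_statistic(sample_a, sample_b):
    """Return (t, degrees_of_freedom) for Student's two-sample t-test.

    Assumes both samples come from populations with equal variance.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    # Pool the two sample variances, weighted by their degrees of freedom.
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / std_err, n_a + n_b - 2


# Hypothetical weight loss (kg) for two small groups of dieters.
diet_a = [2.1, 3.4, 1.9, 2.8, 3.0]
diet_b = [1.2, 1.8, 2.0, 1.1, 1.6]

t, df = pooled_t_statistic(diet_a, diet_b)
print(f"t = {t:.2f} with {df} degrees of freedom")
```

The statistic is then compared with a critical value from the t distribution; for 8 degrees of freedom and a two-sided 5% test that value is about 2.306, so a larger |t| would be read as a significant difference in means.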

            Verne’s 800 Leagues on the Amazon (1881) foreshadows the Student's t-test that William Sealy Gosset defined mathematically in 1908. Gosset worked at the Guinness Brewery in Dublin, Ireland, and was interested in the problem of comparing the chemical properties of small barley samples to improve brewing. By the 1920s, the test would be applied to the physical sciences, and in the 1930s, it became the basis for modern cryptology in military applications.

Verne’s interest in cryptology led him to the importance of comparing letter distributions in secret documents and ciphers. The three stories Verne wrote that each hinged on a cryptogram that had to be broken were:

* A Journey To The Center of the Earth (1864)

* The Giant Raft or 800 Leagues On The Amazon (1881)

* Mathias Sandorf (1885).

            Not surprisingly, Verne’s literary muse, Edgar Allan Poe, was also fascinated by cryptography and was considered one of the most skilled cryptologists of his time. Poe's 1843 short story The Gold-Bug brought several cryptographic techniques, including letter frequency analysis, to a wide readership. Verne and Poe may have drawn on Carl Gauss’s earlier work applying the normal distribution. Verne used Poe’s analytical approach and referenced Poe in his book 800 Leagues on the Amazon (1881).

            The key and first step to solving secret messages is letter frequency analysis, the basis of all codebreaking. Each language has its own typical distribution of letters in a text. While Verne applied Poe's analytical approach, he also expanded and clarified the use of frequency distribution comparison to solve codes. Verne’s approach was precise, unlike Poe’s.

           

            William Frederick Friedman, head of the American Signal Intelligence Service and Code Breaking Division during World War II, declared: “When we look into the types of cryptograms other writers of romantic tales and detective stories have employed, we must recognize that he [Verne] stands head and shoulders above them all, not excluding even Poe ... Verne's genius calls for admiration and respect – even on the part of professional cryptographers.”[i]

            In 1940, well before Pearl Harbor, William Friedman published “Jules Verne as Cryptographer.” After the attack, the government classified Friedman’s article and all of Friedman’s documents and papers (they were declassified in 1973). The article is readily available on the internet today.

 

            Verne produced a perfect verbal description of a Student's t-test before Gosset applied statistical formulae to the methodology in 1908. Of course, alphabetical significance testing was only the first step in breaking the code, and Poe and Verne would also apply their own substitution algorithms for the final solution. These algorithms can be difficult and complex. There are several scholarly works on Poe and Verne’s use of these substitution algorithms (see Gass, Frederick. “Solving a Jules Verne Cryptogram.” Mathematics Magazine, vol. 59, no. 1, 1986, pp. 3–11). However, the first step of code-breaking relates to the Student's t-test, which Verne applies in simple form in his novel 800 Leagues on the Amazon (1881).

            Verne’s cipher in 800 Leagues on the Amazon is a rudimentary example of statistical significance testing, written far earlier than its everyday use in data analysis. Verne’s analytical approach, like Poe's, was amazing at the time. I can share a personal experience of Verne’s approach. On my Ph.D. exam (in 1997), there was what appeared to be a straightforward question, one that seemed easy to bullshit my way through, showing my genius 😊 until I found out how it was graded. I was asked to lay out a methodology to compare two data sets. A fellow student who was more knowledgeable than I was answered the question in 4 to 5 pages (as did I), but he applied more detailed formulae. However, he failed the question because it required that you set up a simple frequency table before diving into the sophisticated formulae, even though it could be solved with the formulae alone. The point is that you should always conceptualize the data first. Starting with a simple frequency table was precisely how Jules Verne approached a similar problem in his novel.

 

             This first step of a frequency table allowed Verne to give a perfect verbal description of significance testing in 1881, before Gosset developed the mathematical formulae in 1908.

 

Verne first laid out a frequency table of the letters used, as follows:

    a =  3 times
    b =  4  —
    c =  3  —
    d = 16  —
    e =  9  —
    f = 10  —
    g = 13  —
    h = 23  —
    and so on

    Total = 276 times

        Then Verne notes: 

“One thing strikes me at once, and that is that in this paragraph all the letters of the alphabet are not used. That is very strange. If we take up a book and open it by chance it will be very seldom that we shall hit upon two hundred and seventy-six letters without all the signs of the alphabet figuring among them. After all, it may be chance,” and then he passed to a different train of thought. “One important point is to see if the vowels and consonants are in their normal proportion.”

 

“And so he seized his pen, counted up the vowels,

 

“And thus there are in this paragraph after we have done our subtraction, sixty-four vowels and two hundred and twelve consonants. Good! that is the normal proportion. That is about a fifth, as in the alphabet, where there are six vowels among twenty-six letters. It is possible, therefore, that the document is written in the language of our country [Portuguese].”

            Now Verne compares the letter distribution in the document to the proportions in which letters appear on a normal page of a book, concluding:

“I see that that is h, for it is met with twenty-three times. This enormous proportion shows, to begin with, that h does not stand for h, but, on the contrary, that it represents the letter which recurs most frequently in our language, for I suppose the document is written in Portuguese. In English or French it would certainly be e, in Italian it would be i or a, in Portuguese it will be a or o. Now let us say that it signifies a or o.”

Of course, there is still much more needed to break the code, and a substitution algorithm would need to be applied, as Verne did.

           

            While the Student's t-test allows a small sample size, a bigger sample gives better confidence. Verne used a book page, but a better reference distribution is based on the current dictionary for a country. Here’s an example of a frequency table in English.
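A table of this kind can be built from any sample of text. The following is a minimal sketch in Python; the function name `letter_frequency_table` is hypothetical, and the sample is simply one sentence from the English translation quoted above, far too short to stand in for a real reference distribution.

```python
from collections import Counter


def letter_frequency_table(text):
    """Return (letter, count, percent) rows, most frequent letter first.

    Only the unaccented letters a-z are counted; case is ignored.
    """
    letters = [ch for ch in text.lower() if "a" <= ch <= "z"]
    counts = Counter(letters)
    total = len(letters)
    return [(letter, n, 100 * n / total) for letter, n in counts.most_common()]


# A sentence from the quoted translation, used here only as sample input.
sample = ("One important point is to see if the vowels and consonants "
          "are in their normal proportion.")

for letter, n, pct in letter_frequency_table(sample):
    print(f"{letter} = {n:2d}  ({pct:4.1f}%)")
```

Feeding the function a much longer text, such as a whole chapter, would give proportions closer to the language's typical distribution, which is the comparison Verne's judge is reaching for.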

 

Reading Verne’s novel should be part of any introductory course in statistics.


[i] William F. Friedman, “Jules Verne as Cryptographer,” The Signal Corps Bulletin, April-June 1940, pp. 70-107.

Garmt de Vries-Uiterweerd

Oct 15, 2024, 1:01:04 PM
to jules-ve...@googlegroups.com
Hi Quentin,

Always nice to see Verne get credit for understanding maths and science! However, in his analysis of the Jangada cryptogram he also frequently misses the point. To complicate matters, the English translation you cite severely misrepresents Verne's own writing. Two examples:

“One thing strikes me at once, and that is that in this paragraph all the letters of the alphabet are not used. That is very strange. If we take up a book and open it by chance it will be very seldom that we shall hit upon two hundred and seventy-six letters without all the signs of the alphabet figuring among them. After all, it may be chance,”


The original reads:

«Ah! ah! fit le juge Jarriquez, une première observation me frappe: c’est que, rien que dans ce paragraphe, toutes les lettres de l’alphabet ont été employées! C’est assez étrange! En effet, que l’on prenne, au hasard, dans un livre, ce qu’il faut de lignes pour contenir deux cent soixante-seize lettres, et ce sera bien rare si chacun des signes de l’alphabet y figure! Après tout, ce peut être un simple effet du hasard.»

Verne gets it right here. The letters K, Y and Z have a relative frequency of about 0.2% in French, so for each of these the probability of occurring at least once in a 276 letter sequence is ( 1 – (1–0.002)^276 ) = 0.42. For X: ( 1 – (1–0.004)^276 ) = 0.67. For J: ( 1 – (1–0.007)^276 ) = 0.86. Multiplying these probabilities for all the letters in the alphabet yields a total probability of seeing all letters at least once of about 1.4%. Not a high probability, but not vanishingly small either.
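The per-letter figures above can be reproduced with a few lines of Python. This is only a sketch: the function name `p_at_least_once` is hypothetical, each letter is treated as an independent draw (the same simplification the multiplication above relies on), and only the five rare letters whose frequencies are quoted here are included.

```python
def p_at_least_once(rel_freq, n_letters=276):
    """Probability that a letter with relative frequency rel_freq
    shows up at least once among n_letters independent letters."""
    return 1 - (1 - rel_freq) ** n_letters


# Approximate French relative frequencies quoted in this post.
rare_letters = {"K": 0.002, "Y": 0.002, "Z": 0.002, "X": 0.004, "J": 0.007}

partial_product = 1.0
for letter, freq in rare_letters.items():
    p = p_at_least_once(freq)
    partial_product *= p
    print(f"{letter}: {p:.2f}")

# Product over these five letters only; the remaining letters of the
# alphabet would lower it further.
print(f"all five rare letters present: {partial_product:.3f}")
```

The five-letter product is already small, which shows that the rare letters do most of the work in the overall figure.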

The English translation completely inverts the argument. It states that NOT all letters are used, and that the odds of NOT seeing all letters at least once are very low. This is incorrect.

“And thus there are in this paragraph after we have done our subtraction, sixty-four vowels and two hundred and twelve consonants. Good! that is the normal proportion. That is about a fifth, as in the alphabet, where there are six vowels among twenty-six letters. It is possible, therefore, that the document is written in the language of our country [Portuguese].”

This is a misunderstanding on Verne's part. Yes, there are 6 vowels in the alphabet, but these letters (except Y) are much more common than most consonants. The letter E alone has a relative frequency of about 20%! If anything, this observation should tell Jarriquez that the document is NOT in Portuguese.

(In fact, the numbers I used applied to French, as do some of Verne's further arguments. The text turns out to be in French too. But the argument also holds for Portuguese.)
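The arithmetic behind this objection is easy to check. The sketch below, in Python, uses the counts from the novel (64 vowels, 212 consonants) and Verne's six vowels; the helper `vowel_share` and the short sample phrase are assumptions added purely for illustration.

```python
VOWELS = set("aeiouy")  # the six vowels Verne counts


def vowel_share(text):
    """Fraction of the unaccented letters a-z in text that are vowels."""
    letters = [ch for ch in text.lower() if "a" <= ch <= "z"]
    return sum(ch in VOWELS for ch in letters) / len(letters)


# Counts reported in the novel for the cryptogram paragraph.
vowels, consonants = 64, 212
total = vowels + consonants

document_share = vowels / total   # share of vowels in the cryptogram
alphabet_share = 6 / 26           # share of vowels among the 26 letters

print(f"cryptogram: {document_share:.3f}")
print(f"alphabet:   {alphabet_share:.3f}")

# Ordinary prose: a short phrase from the quoted translation.
print(f"prose:      {vowel_share('One thing strikes me at once'):.3f}")
```

The cryptogram's vowel share sits almost exactly on the alphabet's, while even a short phrase of ordinary prose comes out well above it, which is why the match points away from plain text in a natural language.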

I published a detailed investigation of Verne's cryptanalysis in La Jangada back in 2015, should you be interested. It's in this volume:

Cheers,
Garmt

James D. Keeline

Oct 15, 2024, 2:32:49 PM
to jules-ve...@googlegroups.com
As with rhyming poetry, I would think that a cryptogram would provide a particular difficulty for a translation.

In a Poe Gold-Bug-style analysis of the frequencies of letters, they surely must vary based on the language and even dialect.  Consider the simple example of British English vs. American English with the frequency of a letter like "u".  More common words in British English use it than American.  This affects the international production of word games like Scrabble.

With Mathias Sandorf, things are even more difficult because of the window style of decoding which requires the same number of letters.

Thinking of Poe again, one wonders how well the story was translated to French in the version Verne would have seen and been inspired by.

James D. Keeline

--
You received this message because you are subscribed to the Google Groups "Jules Verne Forum" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jules-verne-fo...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/jules-verne-forum/CAFuNt7eWbhBWKS5rD%2B3bcWZUtnThGCNwCE%2BVhg%2B_%2B57ZGmK3iQ%40mail.gmail.com.

quentin skrabec

Oct 15, 2024, 4:03:35 PM
to jules-ve...@googlegroups.com
You are correct: the "not" is an error, either Verne's or the translator's. I even thought of dropping it for clarity in making my point, since it does negate the language conclusion. I agree with you, although I never had an interest in such a deep dive into the details. There are many articles dealing with the code in the novel (they are great scholarship in their own right, although exhausting to read).

But my point is not about the actual distributions of letters and languages. It is the use of a statistical test to compare them; this is the application of interest, and his methodology is the key. This type of analysis was only just emerging in the late 1840s. Some efforts in the 1860s followed the methodology with great success.

Poe's analytical comparison was actually used in the Civil War and was front-page news.

This is why William Frederick Friedman, head of the American Signal Intelligence Service and Code Breaking Division during World War II, called Verne a genius:


“When we look into the types of cryptograms other writers of romantic tales and detective stories have employed, we must recognize that he [Verne] stands head and shoulders above them all, not excluding even Poe ... Verne's genius calls for admiration and respect – even on the part of professional cryptographers.”[i]

            In 1940, well before Pearl Harbor, William Friedman published “Jules Verne as Cryptographer.” After the attack, the government classified Friedman’s article and all of Friedman’s documents and papers (they were declassified in 1973). The article is readily available on the internet today.


Friedman also saw the incorrect letter distribution in the actual novel.

I'll certainly read your investigation. Is there an English translation?

Thank you, I love these discussions!

quentin    




quentin skrabec

Oct 15, 2024, 4:05:50 PM
to jules-ve...@googlegroups.com



Alicia Martorell

Nov 19, 2024, 1:30:57 PM
to jules-ve...@googlegroups.com

Regarding the topic of Verne and mathematics, I am sharing a link to an article (in Spanish)  from the magazine La linterna del traductor that discusses the mathematical problems (and errors) presented in Sans dessus dessous. The article is written by Eugenio Manuel Fernández Aguilar, a physicist and science communicator. The translation, published two years ago, was done by Elena Bernardo Gil. Both of them worked closely together on this project.

https://lalinternadeltraductor.org/n22/capitulo-suplementario-del-reves.html

KR

Alicia 




quentin skrabec

Nov 19, 2024, 1:45:49 PM
to jules-ve...@googlegroups.com
Thank you for the article. There are, of course, well-documented errors.



RFOG

Nov 20, 2024, 4:48:29 AM
to jules-ve...@googlegroups.com
Alicia,

I didn't know this edition from Alba. I've just purchased it. Very interesting article as well.

Thanks. 

