Here is my grammar (I allow an arbitrary number of Ms)
numeral -> thousands
thousands -> thous_part hundreds | thous_part | hundreds
thous_part -> thous_part M | M
hundreds -> hun_part tens | hun_part | tens
hun_part -> hun_rep | CD | D | D hun_rep | CM
hun_rep -> C | CC | CCC
tens -> tens_part ones | tens_part | ones
tens_part -> tens_rep | XL | L | L tens_rep | XC
tens_rep -> X | XX | XXX
ones -> ones_rep | IV | V | V ones_rep | IX
ones_rep -> I | II | III
Comments?
Thanks,
Mike Sullivan
This doesn't accept IIII for 4 (as found on many clocks with Roman
Numeral faces, for example), nor does it accept the "shorthand"
forms: IC for 99, IIC for 98, MVM for 1995 and so on.
The rule is that any smaller number placed before a larger
number is subtracted from the larger number.
I know of no examples where the "smaller number"
consists of other than a single numeral, or the two identical numerals
II, XX or CC. However, constructions such as IIIII for "five", IIX for "eight"
or VV for "ten" have been discovered in manuscripts.
A bar placed over a number multiplies it by one thousand,
and a double bar multiplies it by one million.
This could be implemented in your system by using parentheses
to denote the bar: thus (I) would represent 1,000.
(In the Middle Ages, 500, usually D, was sometimes written as
I followed by an apostrophus, resembling a backwards C, while 1,000
was written as CI followed by an apostrophus.)
The more general question raised by this discussion (and more relevant
to comp.compilers) is how "forgiving" should a parser be in the case
where the language being parsed has no formal definition: or where
there are several, conflicting formal definitions?
Do you accept anything that can possibly be interpreted,
or do you place "arbitrary" restrictions in order to simplify
the grammar, at the expense of rejecting existing files?
--
Martin
mar...@gkc.org.uk http://www.cse.dmu.ac.uk/~mward/ Erdos number: 4
G.K.Chesterton web site: http://www.cse.dmu.ac.uk/~mward/gkc/
Now rewrite the grammar as a regular expression :)
--
Ivan Boldyrev
Roman numerals (in their standard form without IIII and other stuff) don't
much differ from decimal positional system. The only problem is that glyphs
of the decimal places depend on the position. The grammar could look like
numeral := [ numeral ] decimal-place(n)
decimal-place(n) := 0(n) | 1(n) | 2(n) | 3(n) | ... | 9(n)
0(n) := <empty>
1(n) := x(n)
2(n) := x(n)x(n)
3(n) := x(n)x(n)x(n)
4(n) := x(n)v(n)
5(n) := v(n)
6(n) := v(n)x(n)
7(n) := v(n)x(n)x(n)
8(n) := v(n)x(n)x(n)x(n)
9(n) := x(n)x(n+1)
Here n is the number of the decimal place and finally:
x(1) = I
x(2) = X
x(3) = C
x(4) = M
v(1) = V
v(2) = L
v(3) = D
--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
Following the "Robustness Principle" attributed to John Postal,
("Be liberal in what you accept, and conservative in what you send."
http://ironick.typepad.com/ironick/2005/05/my_history_of_t.html),
there is a very simple regular expression for Roman Numerals
[MDCLXVI]+
Taking the most liberal interpretation of the rule "a smaller number
placed before a larger numeral should be subtracted from it,
instead of added to it", every string of numerals has a valid interpretation.
For example, "IXM" is interpreted as as IX (9) subtraced from M (1000)
to give 991.
This method has the advantage of providing representations for zero
and negative numbers: for example VVX is one representation of zero,
If only the Romans had followed the Robustness Principle, they might
have discovered zero and negative numbers, and changed the course
of mathematical history!
Not in "regulars", but the following more or less obvious pattern
NOEMPTY
( ['M'['M'['M']]]
['C'['D'|'M'|'C'['C']] | 'D'['C'['C'['C']]]]
['X'['L'|'C'|'X'['X']] | 'L'['X'['X'['X']]]]
['I'['V'|'X'|'I'['I']] | 'V'['I'['I'['I']]]]
)
should do the trick. More complex were to use immediate assignments for
evaluation of the numeral's value it gets matched. But it is also doable.
Though a hand-written scanner would be faster and cleaner.
> Taking the most liberal interpretation of the rule "a smaller number
> placed before a larger numeral should be subtracted from it,
> instead of added to it", every string of numerals has a valid interpretation.
> For example, "IXM" is interpreted as as IX (9) subtraced from M (1000)
> to give 991.
That's my understanding as well. Nonetheless I'm not sure of the full
algorithm, with regards to the total sum.
What about "IXIM"? Intuitively this would be 990 (-(-I+X+I)+M), but it
might be calculated as (-I+X)+(-I+M) = 1008 as well.
Similarly: "IXX" vs. "XIX", where "IX" is a smaller number, and "XI" is
not a smaller number, than the following numeral "X".
> This method has the advantage of providing representations for zero
> and negative numbers: for example VVX is one representation of zero,
>
> If only the Romans had followed the Robustness Principle, they might
> have discovered zero and negative numbers, and changed the course
> of mathematical history!
Great ideas :-)
DoDi
Why (-I+X)+(-I+M) and not -(-I+X)+(-I+M) ? Following the same rule as
before ("a smaller number placed before a larger numeral should be
subtracted from it, instead of added to it") the result will be -(-I+X)
+(-I+M) = -(9)+(999) = 999 - 9 = 990. Since (-I+X) is smaller than (-I
+M), it will be substracted, not added.
fwiw, I've read that IIII was the standard form which the Romans used,
and that IV was a later innovation.
googling "roman iiii iv" gives this for instance:
http://elginwatches.org/help/roman_IIII.html
http://www.bhi.co.uk/hints/roman.htm
--
Thomas E. Dickey
http://invisible-island.net
ftp://invisible-island.net
> Dmitry A. Kazakov <mai...@dmitry-kazakov.de> wrote:
>> Roman numerals (in their standard form without IIII and other stuff) don't
>
> fwiw, I've read that IIII was the standard form which the Romans used,
> and that IV was a later innovation.
I meant modern standard usage of.
But my point was about Roman numerals as a decimal positional system with
glyphs varying with the position. This drastically simplifies understanding
of how they are built up as well as the implementation of input / output.
Compare this with four rules as described in Webster.
Actually whether IIII and VIIII glyphs should be used for 4 and 9 is
irrelevant as long as IIII is not mixed with VI. On the contrary, other
constructs like MIM (for 1999) and IC (for 99), which were never used
otherwise than mistakenly, contradict to very positional representation.
Which makes me think that the system was positional at least in the minds
of Romans.
> constructs like MIM (for 1999) and IC (for 99), which were never used
> otherwise than mistakenly, contradict to very positional representation.
> Which makes me think that the system was positional at least in the minds
> of Romans.
It might be necessary to distinguish between the usage in calculations,
and the usage in documents. In documents a shorthand notation might have
been used, whereas I dont know how Romans did their calculations. Known
is the spelling of numbers, and there of course we find your assumed
positional system.
A grammar for the shorthand notation even may be context sensitive, also
in an cultural/educational sense, where the most elaborated form was
used by people which had to cast numbers into stone...
DoDi