Mumps data typing, related to sorting...

kdt...@gmail.com

Oct 26, 2008, 9:42:58 PM
to Hardhats
Hey all,

I was having a funny result, where it looked like 2.1 came before 2.0
in a $O( ).
But if you look at the dump below, it shows that the 2.1 is a number,
and the 2.0 is a string.

Looking through my code, the line that seems to be creating this is:
set ^TMG("KIDS","PENDING PATCHES",PckInit,Ver)=count

So apparently sometimes Ver is a number, and sometimes it is a string.
But I am confused. I thought that numbers and strings were
immediately interchanged.
I.e. write 2.1="2.1"
1

So how would one go about reproducing the results seen below?

Thanks
Kevin

GTM>zwr ^TMG("KIDS","PENDING PATCHES","IB",*)
^TMG("KIDS","PENDING PATCHES","IB",2.1)=0
^TMG("KIDS","PENDING PATCHES","IB","2.0")=105

Maury Pepper

Oct 26, 2008, 10:23:40 PM
to hardhats at googlegroups
2.0 is not a number because it is not in "canonical form". IF +Ver=Ver then Ver is a number. +2.0 equals 2 -- try it.
Two ways of dealing with this:
1. If Ver is guaranteed to be a number, use +Ver instead of Ver in the SET command. This method cannot distinguish between 2 and 2.0 and 2.00 and 02, etc. (Consequently, it's no good with certain codes like ICD9.)
2. Force the Ver subscript to be a string by concatenating a non-digit character; e.g., "*"_Ver or Ver_"*" will work.
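Maury's test and both remedies can be modeled outside M. The sketch below is Python, not M code. The names numeric_value, canonic, plus and is_canonic_number are hypothetical helpers. The model assumes plain decimal input, with no exponent notation such as 2E3 and no implementation precision limits.

It reads the leading numeric part of a string the way unary + does and renders the result in canonic form. It then applies the +Ver=Ver check as a string comparison.

```python
import re
from decimal import Decimal

# Leading run of + and - signs, then digits with an optional fraction.
# Assumption: exponent notation such as "2E3" is not modeled.
_NUMERIC_PREFIX = re.compile(r"([+-]*)(\d*)(?:\.(\d*))?")


def numeric_value(s: str) -> Decimal:
    """Numeric interpretation of a string: its longest numeric prefix, or 0."""
    signs, whole, frac = _NUMERIC_PREFIX.match(s).groups(default="")
    if not whole and not frac:
        return Decimal(0)  # no digits at all, e.g. "abc", "" or "*2"
    value = Decimal(f"{whole or '0'}.{frac or '0'}")
    return -value if signs.count("-") % 2 else value


def canonic(value: Decimal) -> str:
    """Render a number in canonic form: no extra zeros, points or signs."""
    if value == 0:
        return "0"
    text = format(value, "f")
    if "." in text:
        text = text.rstrip("0").rstrip(".")
    sign, digits = ("-", text[1:]) if text.startswith("-") else ("", text)
    if digits.startswith("0."):
        digits = digits[1:]  # 0.5 is written .5
    return sign + digits


def plus(s: str) -> str:
    """Model of unary plus, +s."""
    return canonic(numeric_value(s))


def is_canonic_number(s: str) -> bool:
    """Maury's test, IF +Ver=Ver, where = compares strings."""
    return plus(s) == s


for ver in ["2.1", "2.0", "2", "02"]:
    # Remedy 1 stores plus(ver); remedy 2 stores "*" + ver (M: "*"_Ver).
    print(ver, is_canonic_number(ver), plus(ver), "*" + ver)
```

In this model plus("2.0") and plus("02") both give "2", so remedy 1 files 2, 2.0 and 02 under one subscript. Remedy 2 keeps "*2" and "*2.0" apart, because a value starting with "*" can never pass the canonic test.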

JohnLeo Zimmer

Oct 26, 2008, 10:42:44 PM
to Hard...@googlegroups.com
Kevin,
I don't know how the two different forms made their way into your example,
but 2.0 will be rounded to 2 in any numeric function.

So:
GTM>w 2.0
2
GTM>w 2.1
2.1
GTM>w "2.0"
2.0
GTM>w +"2.0"
2

"2.1" would sort after "2.0", I think.
And both of those after 2 and 2.1.

Thus:

GTM>s x(2.0)="a",x(3.1)="b",x("2.0")="c",x("2.1")="d",x("2.10")="e"

GTM>zwr x
x(2)="a"
x(2.1)="d"    <--- set as x("2.1"), stored as the number 2.1
x(3.1)="b"
x("2.0")="c"
x("2.10")="e"

But "2.1" gets converted to numeric while "2.10" does not.
So, it turns out I can reproduce the phenomenon but not explain it.
{Well, now I can, because Maury just did.} :-)

jlz

kdt...@gmail.com

Oct 26, 2008, 10:55:50 PM
to Hardhats
Can you tell me more about this canonical issue?

Zimmer's example of write 2.0 --> output of 2 is disturbing.

I am having a difficult time with this because some patches specify
their version to be *2* and others will say *2.0*. And now it seems
that 2.0 behaves differently than 2.1 ! The wonders never end. :-)

Thanks
Kevin

JohnLeo Zimmer

Oct 27, 2008, 8:26:42 AM
to Hard...@googlegroups.com
kdt...@gmail.com wrote:
> Can you tell me more about this canonical issue?
>
> Zimmers example of write 2.0 --> output of 2 is disturbing.

Per Maury's suggestion:
(Adding just a space " " will force string indexes.)
(Adding "+" forces numeric: x(+y).)
Beating a dead horse a little, perhaps:

GTM>s x("2.0 ")=1,x("2.1 ")=2,x("2.0")=3,x(2.1)=4,x("2.1")=5,x(+"2.0")=6,x(+"some text")=7,x("")=8,x(+"")=9

GTM>zwr
x(0)=9 <--- 7,9 equivalent, (0).
x(2)=6
x(2.1)=5 <--- 4,5 same-same.
x("")=8
x("2.0")=3
x("2.0 ")=1
x("2.1 ")=2

regards,
johnleo
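JohnLeo's ZWRITE listing follows from one rule. A subscript whose string value is exactly a canonic number collates with the numbers, in numeric order. Everything else collates afterward as a string.

The Python sketch below models that default collation; it is not M code. CANONIC, subscript_key and zwrite are hypothetical names. The subscripts are given as the string values M has already computed for each expression, so +"2.0" appears as "2", and +"some text" and +"" both appear as "0".

```python
import re
from decimal import Decimal

# A canonic number: "0", or an optional "-" followed by an integer with no
# leading zero and/or a fraction with no trailing zero (".5", not "0.5").
CANONIC = re.compile(r"0|-?(?:[1-9]\d*|(?:[1-9]\d*)?\.\d*[1-9])")


def subscript_key(sub: str):
    """Sort key for default collation: canonic numbers first, in numeric
    order, then every other string in character order."""
    if CANONIC.fullmatch(sub):
        return (0, Decimal(sub), "")
    return (1, Decimal(0), sub)


def zwrite(name: str, array: dict) -> list:
    """Render array nodes roughly the way ZWRITE lists them."""
    lines = []
    for sub in sorted(array, key=subscript_key):
        shown = sub if CANONIC.fullmatch(sub) else f'"{sub}"'
        lines.append(f"{name}({shown})={array[sub]}")
    return lines


# JohnLeo's SET, with each subscript already evaluated to its string value.
# 2.1 and "2.1" are the same subscript, as are +"some text" and +"" (both 0).
x = {}
for sub, val in [("2.0 ", 1), ("2.1 ", 2), ("2.0", 3), ("2.1", 4), ("2.1", 5),
                 ("2", 6), ("0", 7), ("", 8), ("0", 9)]:
    x[sub] = val  # a later SET of the same subscript overwrites the earlier one

print("\n".join(zwrite("x", x)))
```

The same key puts "2.1" ahead of "2.0" in Kevin's original ^TMG index.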

Kevin:

Steven McPhelan

Oct 27, 2008, 8:50:53 AM
to Hard...@googlegroups.com
A canonical number is a mathematical term. A canonical number cannot start with a leading zero in its whole-number part, and its decimal portion cannot end with a trailing zero. In the M ANSI standards, all numbers are of the form x * 10 to the yth power, where x is an element of the half-open interval [0,1). So all numbers would be between 0 and .999999999... x 10 to the yth power. Fileman internal date.time is a numeric.
 
I may be wrong, but I believe the actual ANSI standards state that all variables are strings. It is the M implementors who allow for an M configuration to sort values numerically; that is, canonical numbers will be listed first, followed by strings collated in ASCII order.
 
I often challenge individuals to come up with the most efficient algorithm that will determine whether the input variable is a numeric or a string, under these constraints: you cannot set the error trap, you must use only ANSI standard M commands and functions, and your function itself cannot generate an error.
 
It is all very straightforward and well explained in the M ANSI standards.

K.S. Bhaskar

Oct 27, 2008, 9:50:03 AM
to Hardhats
Kevin --

In a nutshell, MUMPS numbers sort before strings. But then is "2" a
number or a string? The solution is to define "canonical numbers"
such that if a string has a form like "2" which is a canonical number,
it should be treated as a number. "2.0" is a non-canonical form (at
least in mathematics, a canonical form is the standard way to
represent something that can have multiple representations), and is
hence treated as a string, whereas "2.1" is a canonical form which is
treated as a number. As noted in other posts, prefixing a non-
canonical number form with a + forces it to be a number, but the
results may or may not be intuitive (depending on your intuition, of
course).

Regards
-- Bhaskar

Jim Self

Oct 27, 2008, 12:11:33 PM
to Hard...@googlegroups.com
Minor correction. Conversion of strings to canonic numbers in MUMPS is not a matter of rounding, except where the numeric value is outside the range of numeric precision for a given MUMPS implementation. The precision of simple arithmetic on integers may be higher than the precision of floating point operations.

Always convert numeric input to canonic form for use in subscripts or for comparisons, such as "if x=y".

s x=2,y="2.0"
w x=y ;-- returns 0 (false)

w +x=+y ;-- returns 1 (true)

also,
w y=+y ;-- returns 0
-- 

---------------------------------------
Jim Self
Systems Architect, Lead Developer
VMTH Information Technology Services, UC Davis
(http://www.vmth.ucdavis.edu/us/jaself)
---------------------------------------
M2Web Demonstration with VistA
(http://vista.vmth.ucdavis.edu/)
---------------------------------------
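Jim's three WRITEs can be reproduced with a small model. In it, = compares string values, and unary + converts a string to its canonic numeric form.

This is a Python sketch rather than M code. plus and m_equals are hypothetical names, and the numeric conversion assumes plain decimal input with no exponent.

```python
import re
from decimal import Decimal

# Optional signs, then digits with an optional fraction (no exponent handling).
_PREFIX = re.compile(r"([+-]*)(\d*)(?:\.(\d*))?")


def plus(s: str) -> str:
    """Model of unary +: read the numeric prefix, then render it canonically."""
    signs, whole, frac = _PREFIX.match(s).groups(default="")
    value = Decimal(f"{whole or '0'}.{frac or '0'}")  # always has a point
    if signs.count("-") % 2:
        value = -value
    if value == 0:
        return "0"
    text = format(value, "f").rstrip("0").rstrip(".")
    sign, digits = ("-", text[1:]) if text.startswith("-") else ("", text)
    return sign + (digits[1:] if digits.startswith("0.") else digits)


def m_equals(a: str, b: str) -> int:
    """Model of the = operator: a string comparison that yields 1 or 0."""
    return 1 if a == b else 0


x, y = "2", "2.0"  # s x=2,y="2.0"; the literal 2 is held as "2"
print(m_equals(x, y))              # w x=y
print(m_equals(plus(x), plus(y)))  # w +x=+y
print(m_equals(y, plus(y)))        # w y=+y
```

Canonicalizing both sides before comparing, as Jim recommends, is what makes the second comparison succeed.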

kdt...@gmail.com

Oct 27, 2008, 6:38:28 PM
to Hardhats
Thanks everyone.

This is an example of having a "high" level language that tries to do
things in the background for you to make your life easier. Only
sometimes they make your life harder. I love strongly typed languages
like Pascal, etc.

But thanks for the good responses.

Kevin

Frederick D. S. Marshall

Oct 28, 2008, 1:49:57 PM
to Hard...@googlegroups.com
Dear Kevin,

The MUMPS Standard does indeed say that all variables contain strings,
that the string is the only data type, but it is lying. This is just not
true. MUMPS has readily distinguishable data types that behave very
differently in different situations, and it provides rules for how four
of the most important ones behave.

The string is indeed MUMPS's lingua franca. Every kind of data MUMPS can
represent can be represented as a string. You know it's a string if it
is surrounded by quotation marks. Without the quotes, a MUMPS value is
something else, some other type, not a string.

So, for example, "2.0" and 2.0 are not the same value. One is clearly a
string and the other is clearly not. You can use operators or other
MUMPS "verbs" to coerce them into the same data type, but depending on
which one you choose they will or will not be considered equal.

The second most important data type in MUMPS is the number, which may or
may not include decimal positions or a negative sign, and may or may not
include leading or trailing zeroes. For example, the number 2 can also
be represented as 2.0, 02, 0002, 002.000, --2, and so on. As these are
all numbers, they do not obey the rules of strings. In strings, every
character is significant, so that "2" and "2.0" are not the same string,
but with numbers the only question is whether they resolve to the same
canonical value, for example, do they all equal 2.

If you ask MUMPS whether 2=0002.000, the answer is true. If you ask
MUMPS whether 2="0002.000", the answer is false.

There's more at work here than just a difference between whether the
value is a string or a number. MUMPS is a predicate-oriented language,
the very opposite of an object-oriented language. In an OO language, the
noun coerces the verb, so that admitting a patient and admitting a
mistake are two different kinds of admitting. In predicate-oriented
languages, like MUMPS, the verb coerces the noun, so admitting a patient
and admitting a mistake become the same kind of operation on two
different things.

A good example of this is arithmetic. Although 2 does not equal
"0002.000", 2-"0002.000" does indeed equal zero. Try it at the MUMPS prompt:

>WRITE 2="0002.000"
0
>WRITE 2-"0002.000"
0

How is this possible? Is MUMPS insane? If X-Y equals zero, isn't that
the very definition of equality?

Well, yes and no. It depends on what you mean by equality. There are at
least two very different kinds of equality. Numeric equality cares about
the numeric interpretation of the value. String equality cares about the
exact characters that make up the string, irrespective of
interpretation. If you consider these two different kinds of equality
and look back at the two examples I gave, you can see what's at work
here. The minus sign coerces its arguments to behave like numbers, in
which 2 does indeed equal 0002.000, so subtracting one from the other
results in zero. However, the equal sign coerces its arguments to behave
like strings, in which "2" is only one character long but "0002.000" is
eight characters long, so obviously they are not the same string: they
aren't even the same length, let alone made up of the same characters in
the same order.
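Rick's point is that the operator, not the operand, decides how a value is treated. That can be modeled by giving each operator its own coercion: = compares the two values as strings, while - first interprets both as numbers.

The Python sketch below is illustrative only. to_number, m_equals and m_minus are hypothetical names, and only plain decimal strings without exponents are handled. The difference is left as a Decimal rather than rendered in canonic form.

```python
import re
from decimal import Decimal

_PREFIX = re.compile(r"([+-]*)(\d*)(?:\.(\d*))?")


def to_number(s: str) -> Decimal:
    """Numeric coercion: use the string's numeric prefix, or 0 if it has none."""
    signs, whole, frac = _PREFIX.match(s).groups(default="")
    value = Decimal(f"{whole or '0'}.{frac or '0'}")
    return -value if signs.count("-") % 2 else value


def m_equals(a: str, b: str) -> int:
    """= treats both operands as strings and compares the exact characters."""
    return 1 if a == b else 0


def m_minus(a: str, b: str) -> Decimal:
    """- coerces both operands to numbers before subtracting."""
    return to_number(a) - to_number(b)


print(m_equals("2", "0002.000"))      # WRITE 2="0002.000"
print(m_minus("2", "0002.000") == 0)  # WRITE 2-"0002.000" is zero
```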

This predicate-oriented quality of MUMPS is one of the keys to
understanding MUMPS's revolutionary design. New MUMPS students with no
programming background usually grasp this idea quite readily, but
students with past programming experience in other languages often
struggle with this idea because it is so different from how most other
programming languages behave. It represents a deep philosophical divide
between MUMPS and mainstream languages that directly reflects MUMPS's
origins in manipulating medical data, in which the numbers you want are
often buried within text, like dictations. In MUMPS the data is
malleable and you use the commands, functions, and even the operators to
mold the data from type to type as you need until you get it into the
form you want for storage.

This brings us back to your question, finally with the background you
need to understand it.

Numbers and strings are two different types of data, and they have their own
logical collation (sorting) algorithms that you already know but are
unconscious of in your everyday life. You expect A to sort before AB to
sort before B, but you expect 1 to sort before 2 before 12. These are
contradictory sorting algorithms. If strings sorted like numbers, the
short ones would come before the longer ones, and if numbers sorted like
strings, they would be sorted first based on their first character, then
on their second, and so on. To resolve this problem, MUMPS sorts the two
data types separately. If you mix them in a subscript, as we usually do,
the numbers will all collate first in numeric order, and then the
strings will collate after in string order.

Storing a value in a subscript is an operation, a verb, so it coerces
its data. Its coercion is comparatively gentle. It is mainly concerned
with figuring out whether a given value is a string or number so it
knows which part of the collation to put the value into. However, it is
the one place in MUMPS where numbers take precedence over strings. That
is, if a value CAN be interpreted as a canonic number without changing
its form, it WILL be interpreted as a number. However, if the only way
to interpret it as a canonic number would be to change its form, then it
will be interpreted as a string and sorted with the strings.

The issue of canonic form is simple. You can represent the number 2 in a
wide variety of ways, but since the MUMPS equality is string based, it
is very very very important that when MUMPS hands you values back, such
as the results of arithmetic, it always uses the same form to represent
the same value. If adding 2+2 sometimes gave you 4 and sometimes gave
you "004.000", you would end up with values that were not equal in
MUMPS's eyes. Therefore, even though you are allowed to use a wide range
of representations for your numbers, MUMPS will always choose one—the
canonic form—to use when handing you back numbers. The simplest
explanation of the rules for the canonic form of numbers is that you get
rid of all the extraneous zeroes, decimal places and signs until you get
the simplest form of the number. So, 0002.000 is not canonic, 2 is.
Likewise ---3.5800 is not canonic, but -3.58 is.
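The reduction described above can be written as a few string operations. This Python sketch is not M code, and canonic_form is a hypothetical name. The input is assumed to be a plain numeric literal: signs, digits and at most one decimal point, with no exponent and no trailing text.

The function does four things:
- drops leading zeros;
- drops trailing fractional zeros;
- reduces the sign run to a single minus when the count of minus signs is odd;
- writes zero unsigned as 0.

```python
import re


def canonic_form(numeric_text: str) -> str:
    """Reduce a plain numeric literal to canonic form by dropping extra
    zeros, a bare decimal point and redundant signs."""
    m = re.fullmatch(r"([+-]*)(\d*)(?:\.(\d*))?", numeric_text)
    if m is None:
        raise ValueError(f"not a plain numeric literal: {numeric_text!r}")
    signs, whole, frac = m.groups(default="")
    whole = whole.lstrip("0")  # 0002 -> 2
    frac = frac.rstrip("0")    # .000 -> nothing, .5800 -> .58
    if not whole and not frac:
        return "0"             # zero is written without a sign
    negative = signs.count("-") % 2 == 1  # ---3 is negative, --2 is not
    body = whole + ("." + frac if frac else "")
    return ("-" if negative else "") + body


for literal in ["0002.000", "---3.5800", "004.000", "0.50", "-0.0"]:
    print(literal, "->", canonic_form(literal))
```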

When you look at "2.1" and "2.0" with this understanding, you can see
that 2.1 is in its simplest, canonic form, but "2.0" is not: it would be
a 2 in its canonic form. Therefore, MUMPS can tell that "2.1" can safely
be coerced into a number for storage in a subscript, but "2.0" cannot
without losing some characters, so 2.1 ends up with the numbers and
"2.0" with the strings.

This behavior is rigorously defined in the MUMPS standard. It is
precisely correct in MUMPS terms. Like playing chess, our job is not to
say that chess is wrong because it isn't checkers, but instead to master
the rules and then use them correctly to get the behavior we want. You
can always get values to sort together and according to the collation
algorithm you want if you ensure they are of the same data type. "2.0"
and "2.1" are not. The fault here lies not with MUMPS but with the
creator of this index, who was not concerned enough with getting these
values to collate in the order we expect, or else did not understand the
behavior of MUMPS data types well enough to know what storing the values
in an index would do to those values.

You can recreate this very Mumpsy behavior easily at the programmer
prompt. Simply set X("2.0")=1 and set X("2.1")=2, then ZWRITE X.

Yours truly,
Rick

PS: It is issues like this that lead me to believe that when we repeat
the old chestnut that MUMPS only has one data type, the string, we are
doing new Mumpsters a huge disservice. This behavior makes perfect sense
according to the explicit rules of data type transformation given in the
standard, but no sense at all if there is truly only one data type. I am
indebted to Thomas Salander for writing an article in MUG Quarterly many
moons ago that first exposed me to the understanding that MUMPS has many
data types, and that mastering MUMPS requires mastering those data types
and their behavior. MUMPS's design is carefully constructed to present
the veneer of a trivial, simple language, but its true behavior is quite
sophisticated.

PPS: Our Paideia program includes MUMPS training that goes through the
MUMPS data types and rules for transformation in detail. I'm sure the
other top-notch MUMPS trainers like Greg Kreis, Susan Schluederberg, and
others cover these issues at least as well as we do. If you bypass
formal MUMPS training, you have to work pretty hard to make sure you
understand all of the MUMPS Standard. Everything is in there, but it is
not designed for teaching, only for specifying the behavior rigorously.

Frederick D. S. Marshall

Oct 28, 2008, 1:51:45 PM
to Hard...@googlegroups.com
Dear Kevin,

MUMPS is a strongly-typed language. MUMPS always knows exactly which
data type it is using. It is however a dynamically typed language. This
creates a far more flexible environment, but puts the burden on the
programmer to understand the rules for data-type transformation.

Yours truly,
Rick

Frederick D. S. Marshall

Oct 28, 2008, 2:54:49 PM
to Hard...@googlegroups.com
Dear Kevin,

Part of the challenge of being a great programmer is escaping the
paradigmatic blinders of the first programming language we fall in love
with. The same things that make one language great usually define its
limitations as well, and more importantly, after we fall in love with
our first language we lose the ability to see new languages clearly, to
appreciate their strengths and weaknesses in their own terms. It becomes
too easy to read all other languages in terms of how it does or does not
recreate the experience of working in the first one we loved.

For me, it was Modula-2 and Smalltalk I fell in love with. I went
through a period of rewriting Task Manager to try to make it early-bound
and strongly declarative, and the result was to crash live VA hospital
systems with the terrible performance that resulted. Sometimes it takes
a lesson that painful to understand the ways in which our expectations
become a Procrustean bed we impose on the world around us. Indeed, our
very favorite things are the very worst in this regard, holding far too
strong a hold over how we see the world, so that instead of meeting the
world on its own terms we insist on it meeting ours, an obligation the
world is simply not under.

Modula-2, the sequel to Pascal, is a great language, and I loved working
in it, but it has weaknesses that only become apparent when you learn to
look at it from the perspective of programming languages that are very
very different from it. The more languages you can come to terms with in
their own terms, the more limber your programming mind will become, so
that instead of your thinking becoming rigid, mastering each new language's
architecture becomes a game, like learning to be just as good at Go as
at Chess. I have reluctantly come around to the conclusion that the Game
of Kings is not as elegant a game as Go is, but since I started with
Chess it took me decades to give up my way of measuring all games in
terms of Chess.

The same is true here. MUMPS is not Pascal, but neither is Pascal MUMPS.
When you become fluent in MUMPS, not just capable, you will begin to
appreciate the things that are very easy to do in MUMPS and extremely
difficult in Pascal. Pascal's strengths and weaknesses are deeply
related, because both emerge naturally from its way of organizing
information and algorithms, but when you think within that model you can
only see its strengths, not its weaknesses. That is how all cultures
work. The more immersed in them you are, the less clearly you can see
them, because you are using them to see with.

If you grow up with Chess and then try to move the stones in Go and are
told it won't let you, your first reaction may well be to complain that
the game is defective, that you have to be able to move the pieces or
else how are you supposed to play, let alone win. But in Go, it is a
misuse of the stone to move it, and it destroys the powerful paradigm of
consequences that Go has that Chess lacks (relatively). If you make a
bad Chess move, you can often undo it on your next move, but in Go a
piece placed is placed forever. Every small move you make has permanent
repercussions that define the shape of the game far more explicitly than
in Chess, which is why Chess games resemble each other (indeed fall into
easily identifiable and named patterns about which many books are
written) far more readily than Go games do. Go is an entirely different
way of thinking about boards and pieces than Chess.

MUMPS is to Pascal just like this. Pascal embodies a
material-engineering analogy, in which things are what they are and are
not what they are not. Working in Pascal feels a lot like elegant
physical engineering, and it is able to do wonderful things with this
metaphor.

But the truth of programming is far more encompassing than materials
engineering, and there are many ways in which materials engineering is a
terrible analogy for data and algorithms. MUMPS takes the near-opposite
approach, in which both data and algorithms can be fluid and shifting.
What can you do with fire and water and air that you cannot do with
stone and metal and glass? You will never find out so long as you try to
treat flux as though it is stasis.

If you restrict yourself to the uses and features of MUMPS that most
resemble Pascal, then MUMPS is indeed just a poor man's Pascal for you.
But on its own terms, MUMPS is neither a good Pascal nor a bad Pascal—it
is not any kind of Pascal. If you really bend your mind to take MUMPS on
its own terms, to master what will at first seem like idiosyncrasies
(like why 3+4*5 equals 35 instead of 23 like it is "supposed to") a new
metaphor will slowly take root in your mind, reluctantly, against the
pressure of your expectations, and coding and data will begin to feel
like something very different than they do with Pascal. You will begin
to use the parts of MUMPS that currently feel wrong in powerfully fluid
ways Pascal simply cannot do elegantly.
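The 3+4*5 example reflects M's rule that binary operators have no precedence. An expression is evaluated strictly left to right unless parentheses say otherwise.

The Python sketch below illustrates only that rule. eval_left_to_right is a hypothetical name. It assumes unsigned integer literals joined by + - * /, with no parentheses, unary operators or string operands.

```python
import re
from fractions import Fraction

# Binary operators; M gives none of them precedence over the others.
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b,
}


def eval_left_to_right(expr: str) -> Fraction:
    """Apply each operator to the running result, strictly left to right."""
    tokens = re.findall(r"\d+|[-+*/]", expr.replace(" ", ""))
    result = Fraction(tokens[0])
    for op, operand in zip(tokens[1::2], tokens[2::2]):
        result = OPS[op](result, Fraction(operand))
    return result


print(eval_left_to_right("3+4*5"))  # left to right: (3+4)*5
print(3 + 4 * 5)                    # Python's precedence: 3+(4*5)
```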

There are a number of languages like Pascal from which you can learn its
algorithmic and structural metaphors, but there are very few languages
like MUMPS from which you can learn its metaphors. If you're willing to
break your mind on it, its very weirdness gives you a great opportunity
to break free. To become a truly great programmer, you need to escape
the hold that your favorite programming paradigm has over you and master
the others until you can flexibly shift paradigms to adapt your approach
to programming as closely as possible to each individual problem, doing
full justice to the problem by making your solutions fit the problems
rather than insisting that each problem fit the same solution over and over.

If it seems like I'm beating a dead horse or beating up on you, it's
because I'm excited that you have put your finger on three of the most
important things about MUMPS: (1) that its behavior is weird, defies our
expectations; (2) that its behavior makes sense in its own terms, but
first you have to understand what those terms are; and (3) that the best
possible result of this clash between Pascal and MUMPS is not that
either of them wins in your mind but that you use them against each
other to get the leverage you need to begin to think about algorithms
and data not from one perfect perspective or paradigm (which doesn't
exist) but from multiple perspectives and paradigms, freeing yourself
from the tyranny of the paradigm about which Thomas Kuhn wrote in The
Structure of Scientific Revolutions. You want to become that extremely
rare breed of scientist who doesn't have to die out for his field of
study to advance. Almost no programmers, engineers, or scientists ever
achieve this.

Yours truly,
Rick

Steven McPhelan

Oct 28, 2008, 3:04:39 PM
to Hard...@googlegroups.com
Rick, I am a little confused in that you state that there are different data types in M.  Specifically, you stated that in M 0002.000 is not the same thing as "0002.000".  That is understandable.  What I do not understand is how would one actually use 0002.000 without the quotes.  How can one set a local variable to a numeric data type with a value of 0002.000?  Your explanation seems to indicate that one would actually be able to use and see the value of a numeric as 0002.000.  One way I can think of doing this is something like:
   S V=2 I V'=0002.000
But that is hard-coded, and what use is such a coding style?

--
Steve
"People who are brutally honest get more satisfaction out of the brutality than out of the honesty." -- Peter F. Drucker

Greg Woodhouse

Oct 28, 2008, 3:05:28 PM
to Hard...@googlegroups.com
You've hit the nail on the head here. Except... except that we can't really say that MUMPS is strongly typed if the typing rules aren't spelled out in the standard. I agree with you that for practical purposes MUMPS behaves like a dynamically typed language, but the implementation can't be the standard.

Greg Woodhouse

Oct 28, 2008, 3:24:52 PM
to Hard...@googlegroups.com
This reminds me of a debate I got into on another list, and I think I was on the wrong side of the argument there. One possible approach would be to say that 0002.00 is one of infinitely many external representations for the value 2, and is not itself a value. Another option is to say that this is a string that implicitly evaluates to a numeric value in certain contexts. That is troubling because the semantic action of converting the string to a number is driven by syntax in a context-dependent fashion. I was basically arguing in favor of this silent conversion of representations into values, but I certainly agree that it is semantically messy.
 
This is really a tough issue to deal with formally. As Rick has pointed out, MUMPS really does treat strings and numbers (probably characters, too) as different data types, but none of this is made explicit in the language standard. Treating canonic numbers as convertible to values in 1-1 fashion is probably the most promising approach and the truest to the standard, but it makes the semantics of the language difficult to specify. Worse, it undercuts an observable feature of all MUMPS implementations, and I think one that is worth preserving: that it is strongly (by which I mean completely and disjointly) typed, a question that really needs to be separated from the question of whether it is dynamically or statically typed.

Jim Self

Oct 28, 2008, 3:28:12 PM
to Hard...@googlegroups.com
Greg Woodhouse wrote:
You've hit the nail on the head here. Except... except that we can't really say that MUMPS is strongly typed if the typing rules aren't spelled out in the standard.
...except that they are!


I agree with you that for practical purposes MUMPS behaves like dynamically typed language, but the implementation can't be the standard.

Implementation differences only come into play when you exceed the bounds of the standard and extra-standard compatibility. You know that!

Frederick D. S. Marshall

Oct 28, 2008, 3:54:00 PM
to Hard...@googlegroups.com
Dear Greg,

The typing rules are spelled out in the standard, very clearly. But
elsewhere, the standard has an old throw-away line that says MUMPS has
one data type: the string. That sentence will be removed from the next
version of the MUMPS standard.

Section 7.1.4, from its opening through 7.1.4.7, explains very explicitly
the four primary data types and how they are coerced one into another.
That operators coerce their arguments to be the data type they want is
explained everywhere in the standard. The metalanguage element V means
interpret the result as the following data type, and it's used
everywhere to show what MUMPS will do with its data.

As for the false, contradictory statement that there is only one data type
in MUMPS, the standard contains two such statements. Check glossary
entry 4.65: Type:

"4.65 type: M recognizes only one data type, the string of variable
length. Arithmetic operations interpret strings as numbers, and logical
operations further interpret the numbers as true or false. See also
truthvalue."

Also, in Section II: Portability, article 2.5: Data Types:

"2.5 Data Types

"The M Language Specification defines a single data type, namely,
variable length character strings. Contexts which demand a numeric,
integer, or truth value interpretation are satisfied by unambiguous
rules for mapping a string datum into a number, integer, or truth value."

This captures the contradiction in the standard in a nutshell. The words
"interpret" and "mapping" here seem to imply that the data is still a
string, it's "just" being interpreted as other things, but when MUMPS
"interprets" all Hell breaks loose, so there's no "just" about it. It's
far more true to say any MUMPS data type can be interpreted as a string.

Thomas Salander's very good explanation of this may be found in the
column "MUMPS Language Issues," in the article "Datatypes: Strong, Weak,
and Imaginary" from MUG Quarterly Volume XX Number 2, from August 1990,
which opens like this:

"MUMPS is typeless." "We have only one data type." "Everything is a string."

"Depending on your point of view, these are all true statements. MUMPS
is called typeless because there are no type declarations. The explicit
data type for data storage is the variable length character string. Any
variable can be assigned any literal value.

"But viewing MUMPS as "typeless" may lead to problems. In fact, MUMPS
has many data types, although most of them are implicit. . . ."

And so on. It's really a great article, and if Thomas grants me
permission I will bring it back into print for you all.

Reading the standard, it spends a great deal of effort describing many
of the types and how they coerce one into another, and then it includes
these two seemingly authoritative statements that are so often quoted,
but they are like the Wizard of Oz, with no real weight behind them,
just empty, bald declarations, a trap for people who prefer
black-and-white clarity. Thomas's article lists far more than even the
four "interpretations" explicitly cited in the standard, all of which
are borne out by the metalanguage, and I have identified a few more
relating to indirection. We are working on an update to the 1995
Standard M Pocket Guide which will include a table of all the data
types, where they are used in MUMPS, and how they coerce from one to
another.

So yes, it is strongly typed, and the rules are very clear. You just
have to ignore some remarkably coherent noise left over in those two
contradictory statements. The closest I can come to endorsing the
statement that MUMPS has only one data type: the string, is that in
MUMPS the string is the universal data type, the lingua franca, in which
all its other data types can be expressed. We can go a certain way in
learning MUMPS with just the string, but until we learn numbers,
integers, and truth values we won't really be a good MUMPS programmer,
and until we at least intuitively grasp all the others—names,
namevalues, labels, line references, absolute line references, routine
names, entry references, label references, local variable names, local
array references, global variable names, global array references,
subscripts, patterns, MUMPS code, and the rest, we will not have truly
mastered MUMPS and will be unable to make the language really sing at
its full potential.

Yours truly,
Rick

PS: That MUMPS is strongly typed but not statically typed nor
declaratively typed is part of what makes many people's heads hurt. And
that's a good thing! When we are in our comfort zone we can't learn
anything profound, according to Aeschylus.

PPS: It will help this discussion to bring the standard back into print,
which should be in the next month or two. Watch for an announcement on
hardhats about MUMPS Books, our new publishing arm.

Frederick D. S. Marshall

Oct 28, 2008, 4:03:14 PM
to Hard...@googlegroups.com
Dear Steve,

The difference between 0002.000 and "0002.000" shows up in code, where
the presence or absence of the quotes explicitly denotes your intention
for the value to be interpreted as a string or as a number. The main
places where values like "0002.000" come up are when importing them from
foreign systems via HL7 interfaces, reading them directly from
machines that encode their results in fixed numbers of decimal places,
or lifting them from blocks of text. In all such cases, any numeric
coercion of "0002.000" interprets its characters as a number, which
yields the canonic value 2.

That is, usually something like 0002.000 will be an intermediate
calculation or interface product rather than a result. If you care about
zeroes, for example if you are aligning figures in a table or recording
measurement precision, then you have to do the work to keep the value
interpreted as a string to prevent numeric interpretation from rendering
it in its canonic numeric form. Quotation marks, for example, will
preserve those zeroes when you save it as a variable value or as a
subscript value.

Interestingly, although with a value like 0002.000 you have the option
of collating it either as a number (2) or as a string ("0002.000"), with
a canonic number you do not have that option. There is no way to get a
number like 12 to collate with the strings, not even if you enclose it
in quotation marks (like this: "12"). The act of storing it in a
subscript forces numeric interpretation of any value, string or not,
whose form precisely matches a canonic number. Since the quotes are not
actually part of the value, "12" is rendered 12 and sorts as a number.
If you want to mix canonic number forms and noncanonic numbers and
strings together in a subscript and get them all to sort as strings, you
have to append something to them all (like a space) to ensure that none
of them can match the canonic form of a number.
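To make the rule concrete, here is a small Python model (not MUMPS code) of how a subscript value is classified and collated. It assumes the canonic-number rules described in this thread (optional leading minus sign, no leading zeros, no trailing fractional zeros, no trailing decimal point, 0.5 written as .5) and ignores the empty string, exponent notation, and implementation limits on precision; `is_canonic` and `subscript_key` are illustrative names only.

```python
import re
from decimal import Decimal

# Canonic numeric form, simplified: "0", or an optional "-" followed by either
# an integer without leading zeros (optionally with a fraction that does not
# end in 0) or a bare fraction such as ".5".
_CANONIC = re.compile(r"^(0|-?([1-9][0-9]*(\.[0-9]*[1-9])?|\.[0-9]*[1-9]))$")


def is_canonic(value: str) -> bool:
    """True if value is already a canonic number, so it subscripts as a number."""
    return _CANONIC.match(value) is not None


def subscript_key(value: str):
    """Sort key modelling default subscript collation: canonic numbers first,
    in numeric order, then all other strings in character order."""
    if is_canonic(value):
        return (0, Decimal(value), "")
    return (1, Decimal(0), value)


subs = ["2.0", "2.1", "12", "0002.000", "1.0", "abc", "2", "12 "]
print(sorted(subs, key=subscript_key))
# ['2', '2.1', '12', '0002.000', '1.0', '12 ', '2.0', 'abc']
```

Note that "12 " (with a trailing space) lands among the strings, and "2.0" sorts after "2.1", which matches the `zwr` output at the start of this thread.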

This follows clearly from MUMPS's data-type coercion rules, but makes no
sense if all values in MUMPS are actually strings.

Yours truly,
Rick

Greg Woodhouse

Oct 28, 2008, 4:10:49 PM
to Hard...@googlegroups.com
It's not the only such language: Scheme is an example of a dynamically typed language with strong typing. Then again, most people seem to prefer statically typed languages such as ML and Haskell as more modern. PLT Scheme even has an explicitly typed dialect (still considered experimental)!
 
Your examples of how the standard is self-contradictory, associating values of different types with strings, are spot on. I think a new standard (even one that introduced no new features) would serve us well (and would be good for VistA, too).

Frederick D. S. Marshall

Oct 28, 2008, 4:25:42 PM
to Hard...@googlegroups.com
Dear Greg,

I agree. We are rapidly approaching the launch of the MUMPS Standards
Organization, one of the tasks for which we created the VISTA Expertise
Network as a spin-off of WorldVistA. The new standardization process we
have been designing for MUMPS is much faster and surer than the MDC's
approach, at least on paper. Next year we will get the opportunity to
find out for real. A very simple update to the standard to remove errors
and loosen the portability limits to catch up with the progress the
implementors have made over the last decade is very much our goal for
the first new MUMPS standard, and the new process is completely
goal-directed to get away from the feature-creep problems we used to have.

Yours truly,
Rick


Steven McPhelan

Oct 28, 2008, 5:11:29 PM
to Hard...@googlegroups.com
Part of what I was trying to understand comes from my experience with Cache. I just now tried to find this setting in Cache 2008.1 but, wouldn't you know it, I could not find how to set it. At one time (and I believe you still can) you could tell Cache whether arrays should sort as Cache Standard, which is what you have been describing, or strictly as ASCII. In the VA we have always used the standard sorting: numerics first, then strings. I never actually configured a system using strict ASCII collation. So I was trying to reconcile what you were saying with that Cache feature.
 
I guess I was thinking about the data typing being done at the SET level and did not consider the dynamic feature of the data typing based upon context of use. In that case, you do have a multitude of M data types. This "coerced" definition of data types seems to contradict the industry-standard definition of "data type". Does such an industry-standard definition actually exist, and if so, does it allow for data typing to be based upon context versus declaration? In other words, independent of the programming language, what does the industry mean when it uses the term data type?

Skip Ormsby

Oct 28, 2008, 5:24:18 PM
to Hard...@googlegroups.com
I agree with you Rick, because my first language was assembler, first on the IBM 1400 and then on the IBM 360/370 (at least these critters understood multiplication and division). When I encounter a new language, one of the first exercises I do is the first program I ever wrote: write out the squares from 1 to 100 using only addition (you can do 2+2, or 3+3, etc.). Oh, and btw, you can't use a FOR loop.
-skip

Skip Ormsby

Oct 28, 2008, 5:32:18 PM
to Hard...@googlegroups.com
I say this with a tad of trepidation, but here goes anyhow: as I understand it, to force zero (0) to sort before one (1) you can use the ']]' (sorts after) syntax. It's used in several places within FileMan, and one place where I really like it (and I thank the FileMan team, Rick and company, for adding it back in v21) is when you are sorting on SSN: before v21, the SSNs that began with zeros came after those that start with nine.
-skip

Greg Woodhouse

Oct 28, 2008, 5:42:08 PM
to Hard...@googlegroups.com
Can you use math?
 
(n+1)^2 - n^2 = 2n + 1

Ruben Safir

Oct 28, 2008, 6:19:17 PM
to Hard...@googlegroups.com
On Tue, 2008-10-28 at 14:42 -0700, Greg Woodhouse wrote:
> Can you use math?
>
> (n+1)^2 - n^2 = 2n + 1

Yes the answer is 42

Ruben

Greg Woodhouse

Oct 28, 2008, 6:04:40 PM
to Hard...@googlegroups.com
This is a good example of the kind of thing I had in mind. In evaluating expressions like "2A"+0 or "2A"_0, the meaning of the left-most operand is determined in part by the type of the other operand and in part by the operator. One way of formalizing this is using small-step operational semantics with type annotations, as in
 
x1:T1 + x2:T2 => ?
 
We can have a multitude of rules, say where T1 = number and T2 = string, or we can require that the operator string->number be applied to any string operands, so x1 + x2 would reduce to
 
number->string (string->number(x1) + string->number(x2))
 
and so forth. It may well be that both techniques will lead to the same results, but they are different semantic styles, and an equivalence proof is called for. Do we really know that all "reasonable" readings of the prose in the MUMPS standard are operationally equivalent?
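As a rough illustration of the second style, here is a Python model (again, not MUMPS) of the string->number and number->string steps that reduction uses. It is a simplified sketch: `string_to_number` takes the longest prefix made of leading signs, digits, and at most one decimal point, and deliberately ignores exponent notation such as "2E3"; `number_to_string` renders the result in canonic form. All function names are hypothetical.

```python
import re
from decimal import Decimal

# Leading "+"/"-" signs, then digits with at most one decimal point.
# Exponent notation ("2E3") is deliberately not modelled.
_NUMERIC_PREFIX = re.compile(r"([+-]*)([0-9]*(?:\.[0-9]*)?)")


def string_to_number(s: str) -> Decimal:
    """Interpret the longest numeric-looking prefix of s; "2A" -> 2, "ABC" -> 0."""
    m = _NUMERIC_PREFIX.match(s)
    signs, digits = m.group(1), m.group(2)
    if digits.strip(".") == "":          # no digits at all, e.g. "", ".", "ABC"
        return Decimal(0)
    value = Decimal(digits)
    return -value if signs.count("-") % 2 else value


def number_to_string(d: Decimal) -> str:
    """Render d in canonic form: no trailing zeros, .5 rather than 0.5."""
    if d == 0:
        return "0"
    s = format(d.normalize(), "f")
    if s.startswith("0."):
        s = s[1:]
    elif s.startswith("-0."):
        s = "-" + s[2:]
    return s


def m_add(x: str, y: str) -> str:
    # x + y  ==>  number->string(string->number(x) + string->number(y))
    return number_to_string(string_to_number(x) + string_to_number(y))


def m_concat(x: str, y: str) -> str:
    # x _ y: concatenation uses both operands as strings, unchanged.
    return x + y


print(m_add("2A", "0"))        # 2
print(m_concat("2A", "0"))     # 2A0
print(m_add("0002.000", "0"))  # 2
```

In this model the operator alone decides which conversion applies, which is one of the "reasonable readings" whose equivalence to other formalizations would still need to be shown.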

Frederick D. S. Marshall

Oct 28, 2008, 6:07:25 PM
to Hard...@googlegroups.com
Dear Steve,

Before the 1984 standard MUMPS only accepted positive numeric
subscripts, so the numeric portion of the modern MUMPS subscript
collation actually predates the string, which is why (for backward
compatibility) the numeric interpretation takes precedence over the
string when deciding how to interpret each piece of data going into a
subscript.

It's easiest to resolve these conflicts usually just by changing our
algorithm.

Task Manager had a problem for years because $HOROLOG's value is a
string, but when we look at it we see two numbers separated by a comma
and expect it to collate numerically. Since most of the day $H is five
numeric digits, a comma, and five more digits, the differences between
numeric and string collation didn't come up, but early in the morning,
when the seconds since midnight had fewer than five digits,
collation "errors" began cropping up, with tasks running "out of
order". Wally's solution, which was the right one, was to abandon using
$H as a subscript because we do not want string collation for time.
Instead, he converts it to seconds, a big number, which collates
strictly numerically and hence keeps tasks in chronological order all
the time.
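The $H problem is easy to reproduce outside MUMPS. The sketch below (Python, with made-up $H-style values; `horolog_to_seconds` is a hypothetical helper, not Task Manager's actual code) compares plain string ordering of "days,seconds" values with ordering by a single count of seconds, the kind of conversion described above.

```python
SECONDS_PER_DAY = 86400


def horolog_to_seconds(h: str) -> int:
    """Convert a "days,seconds" value, as $HOROLOG returns it, to one integer."""
    days, seconds = h.split(",")
    return int(days) * SECONDS_PER_DAY + int(seconds)


# Made-up timestamps: late on one day, then early and mid-morning the next.
stamps = ["61297,36000", "61297,9000", "61296,86000"]

print(sorted(stamps))                          # string collation
print(sorted(stamps, key=horolog_to_seconds))  # chronological order
```

String collation puts "61297,36000" ahead of "61297,9000" because "3" sorts before "9", so the 2:30 AM task appears to come after the 10 AM one; the single-number form avoids that.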

If we want KIDS version numbers, which are a mix of numbers (2.1) and
strings (2.0), to collate together and sequentially, we need to get rid
of the mix by imposing a uniform type, and either will do. If we want
strings, we need to concatenate something onto them all to make none of
them a canonic number form, and then we will have to do a transform when
comparing lookup values with subscript values to make them match again.
If we switch to strictly numeric, by plussing each value to force
numeric interpretation, we also escape the collation problem, but now we
still have to do transforms to compare a subscript value like 2 with the
original value like "2.0". That's probably the right solution, since
then only some of the values have to be transformed, whereas in the
string approach they all have to be transformed to make the effect of
the transforms on the collation zero out.
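Here is a Python sketch of the two strategies, with a dict standing in for the global. The names `numeric_subscript` and `string_subscript` are hypothetical, `Decimal` stands in for MUMPS numeric interpretation, and the input is assumed to be plain numerals like "2.0" or "2.1".

```python
from decimal import Decimal


def numeric_subscript(ver: str) -> Decimal:
    # Numeric strategy (like +VER): "2.0" and "2" both become 2.
    return Decimal(ver).normalize()


def string_subscript(ver: str) -> str:
    # String strategy: append a space so no subscript is a canonic number.
    return ver + " "


versions = ["2.0", "12.0", "2.1", "1.0", "2"]

by_number = {numeric_subscript(v): v for v in versions}   # "2" overwrites "2.0"
by_string = {string_subscript(v): v for v in versions}

print([format(k, "f") for k in sorted(by_number)])  # ['1', '2', '2.1', '12']
print(sorted(by_string))  # ['1.0 ', '12.0 ', '2 ', '2.0 ', '2.1 ']

# Lookups must apply the same transform that was used when setting.
print(numeric_subscript("2.0") in by_number)   # True
print(string_subscript("2.0") in by_string)    # True
```

The numeric form gives the expected version order but cannot tell 2 from 2.0; the string form keeps every spelling but puts 12.0 between 1.0 and 2.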

The industry believes it has a definition of data types because it is a
big herd in which everyone has the same family scent, doing what seems
like the same thing, which therefore becomes right in the way crowds
tend to find their own crowd behavior self-validating. They have also
constructed a lot of wonderful rationalizations that pass for theories
to prove they are doing the right thing. However, the truth is none of
us is doing the right thing, and the way we know is that we have not
solved the software crisis. Whoever actually gets to do so will get the
bragging rights of being proven by reality to be right, but until then
the rest of us are constructing plausible theories like toolkits we use
to help us solve limited problems. Since the majority prefers static
languages, their data type models are also static and tend to work best
when used in static ways, but the more dynamism is involved in the
algorithm or the data, the more clunky these static theories of data
types tend to behave, and the more they fall apart in practice.

I am in no way saying that static data typing is wrong, only that it
works better for some things than others, just as dynamic data typing
works better for some things than others. The best toolkit includes both
kinds of data typing, which is what we wanted to do with the
object-oriented extensions to MUMPS, to give Mumpsters the option of
whether to use dynamic, operation-driven typing or to use more static,
object-oriented typing case by case, problem by problem (the Omega guys
will remember our principle of Fire and Ice, which was about this).

As a side note, different parts of the industry also use the term data
type to mean different things, even though they tend to agree about the
static part. Sometimes it has a lot to do with the kinds of interfaces
and contracts the data will honor, sometimes more with the internal
structure of the data, and sometimes about what real-world entity is
being modeled regardless of the implementation. The term is most useful
when we understand the variety of ways it can be used and don't try to
apply too much rigor to it, because if we do we find it falls apart into
several related but quite distinct ideas for which we currently have but
a muddle of language to try to grasp.

Yours truly,
Rick

Skip Ormsby

Oct 28, 2008, 6:18:56 PM
to Hard...@googlegroups.com
The answer is not 42, and no, Greg, you are using exponentiation (a form of multiplication). You can only use addition and subtraction; remember, the IBM 1400s did not have the internal code to do multiplication and division.
-skip

Skip Ormsby

Oct 28, 2008, 6:19:48 PM
to Hard...@googlegroups.com
Hint
1 = 1
2 = 4
3 = 9
4 = 16
5 = 25

Play with those numbers.
-skip

Jim Self

Oct 28, 2008, 6:28:31 PM
to Hard...@googlegroups.com
Skip Ormsby wrote:
One of the first exercises I do when I encounter a new language was the first program I ever wrote and that is write out the squares from 1 to 100 using only addition and you can do 2+2, or 3+3, etc.  Oh and btw you can't use a For loop.
And no GOTO's either! ;)

Frederick D. S. Marshall

Oct 28, 2008, 6:28:43 PM
to Hard...@googlegroups.com
Dear Greg,

Interestingly, as Thomas points out in his article, you can go pretty
far with MUMPS believing that the string is the only type and that the
others are all mappings or interpretations, and as long as you can
compartmentalize in your brain effectively, you can properly implement a
MUMPS system or code in MUMPS without ever fully acknowledging the
multiplicity of data types in MUMPS. So, technically, both readings are
correct.

The problem is that (healthy) human beings do not compartmentalize as
well as machines, so having it be both true and not true that the string
is the only data type causes no end of confusion among MUMPS students,
which is why so many of them end up getting to be capable but not fully
competent with MUMPS. MUMPS's data type behavior makes far more
intuitive sense, i.e., is more humanly comprehensible, learnable, and
predictable, only when interpreted as multiple data types, even though a
MUMPS implementor can get the correct behavior out of his implementation
without that interpretation just by strictly following the rules of the
standard.

Certainly as far as possible readings go the 1995 MUMPS standard is not
perfect. It has far more rigor than most language standard documents, a
state of affairs that is both surprising to people who assume MUMPS is
backward (the MDC has after all been standardizing MUMPS for longer than
most other language-standards bodies, and has made and learned from a
lot more mistakes accordingly) and a state of affairs that is appalling
to those of us who know that fully half the proposals for the millennium
MUMPS standard were error corrections or the removal of ambiguity. Error
handling in particular is very easy to misunderstand as currently
written in the standard, which is why it took me a year to come up with
a clear, straightforward explanation of how it works for my Paideia
class (who might not entirely agree with my characterization of my
explanation as clear and straightforward), and why one of the more
important error-correction proposals for the millennium standard was
written by David Marcus as a result of trying to correctly implement
error processing as written in the standard.

Still, the errors in the standard are often very subtle ($REFERENCE
excepted) and rarely if ever touch on the data-type issue, which was
laid down long ago and has had a lot of time for vetting and refinement.
It tends to be the newer stuff that is more raw.

If you're interested, when we get into the process of cutting the next
MUMPS standard, but after we have a chance to apply all the existing
proposals that repair defects in the standard, I would love to have your
mathematically precise eyes go over the results and help us clear out
any additional remaining ambiguities. To do so now would just be a waste
of your time, since we already have a great pile of fixes to apply (and
some of them might even introduce new mistakes for you to find).

Yours truly,
Rick

Greg Woodhouse

Oct 28, 2008, 6:33:38 PM
to Hard...@googlegroups.com
How about
 
#lang scheme
(define (count-by-squares n)
  (letrec
      ((cbs
        (lambda (m n)
          (unless (> m n)
            (printf "~a~n" (* m m))
            (cbs (+ 1 m) n)))))
    (cbs 1 n)))
(count-by-squares 10)

kdt...@gmail.com

Oct 28, 2008, 6:38:06 PM
to Hardhats
Rick,

What I mean is that in pascal, the conversion is explicit, not
dynamic as mumps does. For example, I appreciate your prior comments
that 2.1 is different from 2.0. But if I have a function like below

DoSet(i)
 set x(i)=""
 quit

I can't be sure what I am getting. What is "i"? Is it a number? or a
string? Sure mumps knows, but I don't.

Again, we need to learn the chess game for what it is and play it.  I
was just complaining a bit. Trust me, when I am working with pascal,
I wish for mumps globals!

But my immediate problem is that when I have the statement:
S ^TMG("KIDS",VER)="" etc, I need a way to force the different
versions of Ver to behave the same. I tried adding quotes around the
variable, and that doesn't work:

GTM>set ver="1.0" set x(ver)=""
GTM>set ver=2.0 set x(""""_ver_"""")=""
GTM>zwr x
x("""2""")=""
x("1.0")=""


And in the example below, how could I have forced 1.1 to remain "1.1"
and so sort correctly?

GTM>set a="abc*1.0*123",b="abc*1.1*123"
GTM>set ver1=$p(a,"*",2),ver2=$p(b,"*",2)
GTM>w ver1,!,ver2
1.0
1.1
GTM>set x(ver1)="",x(ver2)=""
GTM>zwr x
x(1.1)=""
x("1.0")=""

Kevin
p.s. I haven't read all your posts yet, perhaps you already answered
this...



Skip Ormsby

Oct 28, 2008, 6:58:13 PM
to Hard...@googlegroups.com
Here is the M answer. Sorry Jim about the GOTO; again, the 1400s did not have an instruction like FOR or BXLE (branch on index low or equal), or in other words you only had GOTO for looping:
 N NO,SQ,ODD
 S NO=1,SQ=1,ODD=1
A I NO>100 Q
 W !,NO_" = "_SQ
 S NO=NO+1
 S ODD=ODD+2
 S SQ=SQ+ODD
 G A

And yes I could have compressed the code.
-skip

kdt...@gmail.com

Oct 28, 2008, 7:00:19 PM
to Hardhats
On Oct 28, 4:03 pm, "Frederick D. S. Marshall"
<rick.marsh...@vistaexpertise.net> wrote:
> Dear Steve,
>
...
> If you want to mix canonic number forms and noncanonic numbers and
> strings together in a subscript and get them all to sort as strings, you
> have to append something to them all (like a space) to ensure that none
> of them can match the canonic form of a number.
>

Rick,
This answers my prior question. But I still contend
that having to store all numbers with a trailing space is an awkward
hack that "shouldn't" be needed. I am not married to pascal or any of
the other languages that I use. I like mumps. I can't program in it
the amount of time I do every week without appreciating parts of it.
But let's face it, it's a simplistic language that doesn't have many
of the bells and whistles that more modern languages have. Bash
scripting language is also a language that follows internally
consistent rules, but it gives me heartburn every time I use it. :-)
But we have to take the tools that we have, and this is simply the nature
of the beast.

Just to clarify, you said that the conversion from string to numeric
occurs at the point of setting the subscript value, correct? I just
want to make sure I have this correct. As below:

GTM>set a="abc*1.0*123",b="abc*1.1*123"
GTM>set ver1=$p(a,"*",2),ver2=$p(b,"*",2)
GTM>w ver1,!,ver2
1.0 <------ ver1 and ver2 are still strings, right?
1.1
GTM>set x(ver1)="" <---- here ver1 is kept as a string
GTM>set x(ver2)="" <---- here ver2 is USED as a number (but is
still a string)
GTM>zwr x
x(1.1)=""
x("1.0")=""

GTM>write ver1=1.0 <--- ver1 is still a string
0
GTM>

Thanks for the great responses.
Kevin

Greg Woodhouse

Oct 28, 2008, 7:08:34 PM
to Hard...@googlegroups.com
Yes, he makes a compelling argument.
 
I think you and I are approaching the same point from different directions. What I was trying to illustrate is that what we've been calling coercion (a term I've been trying to avoid for reasons I may address in another post) can be formalized in different ways, and that it is non-trivial to show these different approaches to formalizing the semantics of MUMPS are equivalent. Now, if I may be so bold, when you speak of mental "compartments" it sounds to me like you have in mind a particular semantic style, and that brings us back to the basic problem of demonstrating that our models of the language are, indeed, equivalent. Now, it might be argued that my proposed semantics based on type annotations and pattern matching is not the most natural reading of the MUMPS standard, but I believe it is at least a plausible one, and to someone accustomed to functional programming it might even seem a very natural one! And that's just it: lacking a formal semantics (even if there is an excellent natural language account of the language semantics) it is difficult to approach such questions as: Is the language description consistent? Is it complete? Is it categorical (i.e., are there distinct languages that meet the specification)? Do we care?
 
These are mathematical questions, but they have psychological analogues. If you and I, coming from different language backgrounds, independently sit down with the MUMPS standard and develop our own compilers, assuming that we make no mistakes, will we implement languages that are recognizably "the same"? In my opinion, MUMPS fares rather well by this test, but it has a long history that makes this a difficult experiment to carry out, even in principle. Tradition has also played a real role in the evolution of MUMPS (and this is what I had in mind when I said implementations are not a substitute for specifications). Still, there will be people who come along from different traditions, and if they encounter an ambiguity, they may well resolve it in a way that "makes sense" to a Pascal programmer (assuming that is their background).

Ruben Safir

Oct 28, 2008, 7:52:43 PM
to Hard...@googlegroups.com
On Tue, 2008-10-28 at 18:19 -0400, Skip Ormsby wrote:
> Hint
> 1 = 1
> 2 = 4
> 3 = 9
> 4 = 16
> 5 = 25
>

my $tmp = 0;
for my $root (1..100) {
    for my $item (1..$root) {
        $tmp += $root;
    }
    print "$root => $tmp\n";
    $tmp = 0;
}


untested pseudocode of course

BTW - the answer is 42...

Ruben

Greg Woodhouse

Oct 28, 2008, 7:25:54 PM
to Hard...@googlegroups.com
You could translate the Scheme solution I gave into MUMPS, too. The basic idea is to replace the FOR loop with a recursive call, quitting when the starting index is greater than the upper index (10 in this case). I don't have a MUMPS system handy, but it would look something like this:

CBS(N,M) ;
 Q:N>M
 W !,N*N
 D CBS(N+1,M)
 Q
 
You would then invoke it with D CBS(1,10).
 
An unusual feature of Scheme is that tail-recursive calls are required to run in constant space. To me, it is an open question whether such a requirement belongs in a language specification, but I'd expect this kind of optimization from a compiler.

Greg Woodhouse

Oct 28, 2008, 7:32:44 PM
to Hard...@googlegroups.com
Perl?
 
I wonder what the worst possible solution might be (maybe I can cobble something together involving the Ackerman function). But I suppose something like this might be a start.
 
 F I=1:1:100 D
 . F J=1:1:I D
 . . I J*J=I W !,I
I'll leave it as an exercise to eliminate the FOR.

Jim Self

Oct 28, 2008, 7:37:20 PM
to Hard...@googlegroups.com
Skip,
You missed the hint in Greg's math, even though you seem to know the answer in a different form.

sq(n+1) == sq(n)+n+n+1

or equivalently

sq(n) == sq(n-1)+n-1+n-1+1
    == sq(n-1)+n+n-1

In MUMPS (prohibiting FOR and GOTO) a general solution (not restricted to 100) could look like this:

sq(n) ;write squares up to n^2
    n sq
    if n'?1.8n!(n<1) s sq="Error: not a positive integer"
    e  if n=1 s sq=1
    e  s sq=$$sq(n-1)+n+n-1
    w !,n,?10,sq
    q sq


kdt...@gmail.com

Oct 28, 2008, 8:19:24 PM
to Hardhats
This is off the topic of the thread... but....
OK, I'll bite. I could do it with just addition and a loop, but I'm
puzzled how to do this without a goto or a jnz etc.

My introduction to assembly was on the 6502 processor. The hot thing
was that only most (i.e. not all) of the 256 possible instruction bytes
were defined. But there was pcode associated with all of them.  So
the video game writers would use these to confuse decompilers. And we
had fun decoding the games anyway.

Kevin

Frederick D. S. Marshall

Oct 28, 2008, 9:47:41 PM
to Hard...@googlegroups.com
Dear Kevin,

Although making them all strings is an option, I think you actually want
the collation of numbers here, so it's better to make them all numbers.
Just plus them when you store them and you will force numeric coercion
even on values like "2.0". Now that means the index will store 2 instead
of "2.0", but it will collate before 12 instead of after "12.0", which I
suspect is what you want here.

You will still need to do conversions to use this index, to plus values
(+X) before you check for them in the index subscript, but the resulting
code should look just about as elegant as Pascal would be, with just the
addition of the +. As a side benefit, by plussing the values as you
stick them in the index, you will be back in the know about what their
data type is.

Yours truly,
Rick

Frederick D. S. Marshall

Oct 28, 2008, 10:17:54 PM
to Hard...@googlegroups.com
Dear Greg,

I agree that we're approaching the same point, but our directions may
not be as different as at first seems to be the case. Getting a firmer
mathematical foundation for the standard is a very, very interesting
proposition to me. The earliest MUMPS standard included state-transition
diagrams to help make MUMPS's behavior even more rigorously specified,
and I would love to see something introduced to replace them. When we
were working on Omega, I experimented with an object-oriented
specification of MUMPS from the perspective of methods and properties of
the MUMPS virtual machine, and the initial experiments produced much
clearer results than the current specification.

Clearly then there is room for improvement. What do you have in mind?

Yours truly,
Rick


kdt...@gmail.com

Oct 28, 2008, 11:34:42 PM
to Hardhats
Thanks Rick,

Let me digest all this and decide which way I am going to go.

One quick question: When it comes to patch version numbers, some are
version 2, and some are 2.0. Is there a standard versioning number
pattern? I.e. should they all have a digit after the decimal? How
many digits? If I find a "2" should I make it into 2.0? Or should I
convert 2.0 --> 2?

These numbers that I am storing are not an index for a fileman file,
but a small custom storage array. I want to be able to match the
version numbers to versions that are already in the PACKAGE FILE.
This is the array that tells me how many patches are available for a
given package/version etc.

Thanks!

Kevin



Frederick D. S. Marshall

Oct 29, 2008, 12:04:29 AM
to Hard...@googlegroups.com
Dear Kevin,

There are two standards, because the Patch module predates the KIDS
module and each was developed by a different programmer. The patch
module allows version numbers like 2, but KIDS requires at least one
decimal position like 2.0. So there are two version numbering patterns,
which means you get to choose.

The reason you should go with numeric is so that 1 sorts before 2 which
sorts before 12, instead of "1.0" sorting before "12.0" which sorts
before "2.0". You want numeric collation so that versions sort in the
order you expect. Small custom storage arrays collate exactly the same
as indexes on Fileman files, in this regard and all others.

When you're ready to match the version numbers to versions that are
already in the Package file, just append the ".0" back onto the end of
any version number that lacks a decimal point, like this:

I VERSION'["." S VERSION=VERSION_".0"

before you check the Package file.

chuck5566

Oct 29, 2008, 12:12:34 AM
to Hard...@googlegroups.com
A little game some of us liked to play was to try to ascertain the computer language the person used before MUMPS by their complaints of MUMPS.  :-)

Jim Self

Oct 29, 2008, 12:32:11 AM
to Hard...@googlegroups.com
kdt...@gmail.com wrote:
Rick,

What I mean is that in pascal, the conversions is explicit, not
dynamic as mumps does.  For example, I appreciate your prior comments
that 2.1 is different from 2.0.  But if I have a function like below

DoSet(i)
    set x(i)=""
    quit

I can't be sure what I am getting.  What is "i"?  Is it a number? or a
string?  Sure mumps knows, but I don't.
  

It is easy enough to find out, but if your application requires a different collation order than the standard one would give you on the unfiltered data, you must apply some sort of transform to map data values to subscript values. The same idea applies whether the data values are numbers or dates or names or spatial coordinates etc.


Again, we need to learn the chess game for what it is an play it.  I
was just complaining a bit.  Trust me, when I am working with pascal,
I wish for mumps globals!

But my immediate problem is that when I have the statement:
S ^TMG("KIDS",VER)="" etc, I need a way to force the different
versions of Ver to behave the same.

What is the correct ordering of version numbers? Is the decimal part a fraction or a secondary version number?

Does 1.11 follow 1.9?


  I tried adding quotes around the variable, and that doesn't work:
  

If applying a transform to achieve string based collation of numbers, you must pay attention to the decimal place.

If the correct ordering of version numbers treats them as simple numbers, then set x(+ver) would do nicely.

If not, then your transform must pad them with leading zeros (or something similar) to ensure that no subscripts are canonic numbers and to establish the correct character-by-character comparison needed for string-based collation.
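Both of Jim's options can be sketched in Python, used here only as a model of the M behavior described in the comments. `m_plus` imitates `+ver` for plain decimal strings. `pad_version` is a hypothetical padding transform that assumes non-negative versions of the form MAJOR or MAJOR.MINOR whose major part has fewer than `width` digits. Neither name comes from VistA or any library.

```python
from decimal import Decimal


def m_plus(version: str) -> str:
    """Model of M's unary plus on a plain decimal string: the result is
    the canonic form, so "2.0" -> "2", "2.10" -> "2.1", "02" -> "2".
    M's handling of non-numeric text is not modeled."""
    d = Decimal(version)
    if d == 0:
        return "0"
    text = format(d.normalize(), "f")
    if text.startswith("0."):
        text = text[1:]              # M writes one-half as .5
    elif text.startswith("-0."):
        text = "-" + text[2:]
    return text


def pad_version(version: str, width: int = 4, minor_is_counter: bool = False) -> str:
    """Hypothetical transform: zero-pad so the subscript starts with "0"
    (never canonic, so it collates as a string) and plain character
    order matches the intended version order."""
    major, _, minor = version.partition(".")
    key = major.zfill(width)
    if minor:
        key += "." + (minor.zfill(width) if minor_is_counter else minor)
    return key


versions = ["2", "2.0", "1.9", "1.11", "10.1"]

# Option 1, x(+ver): numeric order, but 2 and 2.0 collapse into one subscript.
numeric = sorted({m_plus(v) for v in versions}, key=Decimal)

# Option 2, padded strings: distinct subscripts in plain character order,
# treating the part after the point either as a decimal or as a counter.
as_decimal = sorted(versions, key=pad_version)
as_counter = sorted(versions, key=lambda v: pad_version(v, minor_is_counter=True))

print(numeric)     # ['1.11', '1.9', '2', '10.1']
print(as_decimal)  # ['1.11', '1.9', '2', '2.0', '10.1']
print(as_counter)  # ['1.9', '1.11', '2', '2.0', '10.1']
```

Which order is right depends on Jim's question about whether the part after the point is a fraction or a counter. Only the padded forms keep 2 and 2.0 apart, which is the limitation Maury pointed out for `+Ver`.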


GTM>set ver="1.0" set x(ver)=""
GTM>set ver=2.0 set x(""""_ver_"""")=""
GTM>zwr x
x("""2""")=""
x("1.0")=""


And in the example below, how could I have forced 1.1 to remain "1.1"
and so sort correctly?

GTM>set a="abc*1.0*123",b="abc*1.1*123"
GTM>set ver1=$p(a,"*",2),ver2=$p(b,"*",2)
GTM>w ver1,!,ver2
1.0
1.1
GTM>set x(ver1)="",x(ver2)=""
GTM>zwr x
x(1.1)=""
x("1.0")=""
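Maury's second suggestion from the start of the thread, concatenating a non-digit such as `*`, answers this: `1.1*` is not a canonic number, so both subscripts collate as strings. The Python sketch below models the effect with a hypothetical model of M collation (not GT.M code), using `str.split` as a stand-in for `$P`:

```python
import re
from decimal import Decimal

# Illustrative model of default M collation: canonic numbers first in
# numeric order, then everything else in character order.
_CANONIC = re.compile(r"-?(?:[1-9][0-9]*(?:\.[0-9]*[1-9])?|\.[0-9]*[1-9])|0")


def subscript_key(value: str):
    if _CANONIC.fullmatch(value):
        return (0, Decimal(value))
    return (1, value)


a, b = "abc*1.0*123", "abc*1.1*123"
ver1, ver2 = a.split("*")[1], b.split("*")[1]   # like $P(a,"*",2)

print(sorted([ver1, ver2], key=subscript_key))              # ['1.1', '1.0']
print(sorted([ver1 + "*", ver2 + "*"], key=subscript_key))  # ['1.0*', '1.1*']
```

Because the suffix yields plain string order, `10.0*` would still sort before `2.0*` (`1` precedes `2`), so multi-digit major versions would also need padding of the kind Jim describes.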
  


Frederick D. S. Marshall

Oct 29, 2008, 1:00:41 AM
to Hard...@googlegroups.com
Dear Jim,

The decimal in VISTA version numbers is just a pure decimal, not a
counter, so 1.9 should follow 1.11, not the other way around (as section
numbers usually do).

Which raises an interesting point about numeric collation that
occurred to me earlier today: the decimal portion of numbers collates in
string order, not numeric order. MUMPS's collation algorithms for
numbers honor this oddity of mathematics. I understand the reasons, but
it is still a curious result that collation on the left side of the
decimal point is different from collation on the right side.
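Rick's two points can be checked with a short Python sketch (plain Python, not M). Treating versions as decimals puts 1.11 before 1.9. When the integer parts are equal, comparing the fractional digits as strings gives the same order. Treating the part after the point as a separate counter, as section numbers do, gives a different order. The sketch assumes versions in canonic form with no trailing zeros.

```python
from decimal import Decimal

versions = ["1.9", "1.11", "1.2"]

# Pure decimal fractions, as Rick says VistA version numbers are.
as_decimals = sorted(versions, key=Decimal)

# Same integer part: compare just the fraction digits as strings.
by_fraction_text = sorted(versions, key=lambda v: v.split(".")[1])

# Section-number style: MAJOR.MINOR compared as a pair of integers.
as_counters = sorted(versions, key=lambda v: tuple(int(p) for p in v.split(".")))

print(as_decimals)       # ['1.11', '1.2', '1.9']
print(by_fraction_text)  # ['1.11', '1.2', '1.9']
print(as_counters)       # ['1.2', '1.9', '1.11']
```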

Yours truly,
Rick

Holloway, Thomas (EDS)

Oct 29, 2008, 10:54:09 AM
to Hard...@googlegroups.com
>"The answer is not 42 ..."
 
But Skip, 42 is the answer to everything, or to be more precise, the answer to life, the universe, and everything.
 

Thank you,
Thom H.  another HHGTTG fan

 


From: Hard...@googlegroups.com [mailto:Hard...@googlegroups.com] On Behalf Of Skip Ormsby
Sent: Tuesday, October 28, 2008 6:19 PM
To: Hard...@googlegroups.com
Subject: [Hardhats] Re: Mumps data typing, related to sorting...

Greg Woodhouse

Oct 29, 2008, 11:16:29 AM
to Hard...@googlegroups.com
There's a complementary game, too: Based on the languages (other than MUMPS) that people like and choose to use in real situations, what appeals to them about MUMPS? This is one that often surprises me, because it at least looks like different MUMPSters find very different language features useful or appealing.
 
But I see your point.

Steven McPhelan

Oct 29, 2008, 11:27:14 AM
to Hard...@googlegroups.com
Whether we want to fight this fight is one question. My concern with the definitions used in M, relative to the rest of the industry (i.e., this supposed industry standard today), is the level of misunderstanding. Since moving into the commercial marketplace in an effort to take VistA there, I have become extremely sensitized to the definitions of terms as they are understood by most of the people knowledgeable in that area of interest. For example, in the non-VA EMR world there are significant differences in the definitions of such terms as option, visit, appointment, and admission. I find myself having to translate VistA's definitions of terms into a vocabulary with which the non-VA listener is familiar. In fact, I try to avoid using such terms in these discussions until I understand the listener's definitions of them. This is why I asked the questions about use of the term data type. Like it or not, M is still the little guy on the block.

Brian Lord

Oct 30, 2008, 10:45:16 AM
to Hard...@googlegroups.com
Tom Ackerman was my friend, and in my 7 or so years of knowing him, I
realized that Tom was missing some things that all of us take for
granted. Tom didn't seem to possess a frown; if he did, it didn't come
out for long, and even an argument with him ended with everyone smiling.
He was missing the piece that makes us forget or ignore what is
important to others. He seemed to have an almost mystical sense of
what was important to people, and he focused on it.

He didn't have time to waste: when my 7-year-old son showed him his pet
parakeet (Little), Tom insisted on showing him how to train him to do
tricks and how to play with him properly. Countless times, when I
said I was stumped on a problem, he said, "Well, let's look at it
together right now." When we discussed talking to a third person about
work, he would instantly get that other person on the phone with us
(sometimes regardless of how late it was).

He was missing the part that said work was supposed to be boring; for
him it was always exciting and fun. He was missing the part of people
that keeps them from telling jokes they know no one will laugh at, but
he told them anyway, and we always laughed.

He wasn't missing a couple of things. He wasn't missing friends, he wasn't
missing a wonderful family that it was clear he loved very much, and he
wasn't missing a keen mind. He certainly wasn't missing respect, which he
had in abundance.

Now however I find that I miss him and his friendship.

Tom's service was very moving. I know all on this list who knew him
would have been there if they could.

Rest in peace, Tom.


Greg Woodhouse

Oct 30, 2008, 11:04:37 AM
to Hard...@googlegroups.com
That's very nice.
 
I'm sure that if it were not for Hardhats, there are many of us who would have missed out on knowing Tom at all. We are all very fortunate, as is everyone who benefits from the contributions, including the intangibles you so eloquently describe, he has made.
 
I will miss him.

glilly

Oct 30, 2008, 11:21:27 AM
to Hardhats
GTM>F I=42:0:42 Q:$L(I)>10 I ((((I+1)*(I+1))-(I*I))=((I*2)+1)) W I,!
qed

On Oct 28, 6:19 pm, Ruben Safir <ru...@mrbrklyn.com> wrote:
> On Tue, 2008-10-28 at 14:42 -0700, Greg Woodhouse wrote:
> > Can you use math?
>
> > (n+1)^2 - n^2 = 2n + 1
>
> Yes the answer is 42
>
> Ruben
>
>
>
> > On Tue, Oct 28, 2008 at 2:24 PM, Skip Ormsby <skip.orm...@gmail.com>
> ...
>
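The identity Greg quotes, which glilly's loop tests for I=42, holds for every n; expanding the square shows why:

```latex
(n+1)^2 - n^2 = (n^2 + 2n + 1) - n^2 = 2n + 1
\qquad \text{e.g. } n = 42:\; 43^2 - 42^2 = 1849 - 1764 = 85 = 2 \cdot 42 + 1
```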

Frederick D. S. Marshall

Oct 30, 2008, 12:31:26 PM
to Hard...@googlegroups.com
Dear Brian,

Thank you. That's the Tom I knew and loved. For days I have been looking
for the words to express my love for Tom, and here they are. Thank you.

Yours truly,
Rick

Branden Tanga

Oct 30, 2008, 5:59:34 PM
to Hardhats
I'll bite =)

While I have gotten over most of my complaints about mumps, I did have
a few "they did what!?!?" moments when learning mumps and VistA
simultaneously.

My initial complaints of mumps:
1. There are no reserved words
2. Commands are allowed to be shortened to one letter
3. There is no character that means End of Sentence
4. GOTOs still exist
5. Indirection is cool in theory, but horrible to debug in practice

So what language do you think I started on? =)


Branden Tanga

kdt...@gmail.com

Nov 1, 2008, 9:13:28 AM
to Hardhats
Thanks Rick,
Kevin


On Oct 29, 12:04 am, "Frederick D. S. Marshall"

David Whitten

Nov 3, 2008, 12:17:53 PM
to Hard...@googlegroups.com
Expecting reserved words -> implies a compiled language, probably one
designed after the early 1970s, because that is when compiler technology
started requiring reserved words, which made creating language parsers
easier. That implies not Fortran and not COBOL.

Surprised that commands can be shortened to one letter -> a language
developed after computer memory became cheap and disks became large
(late 1980s), or a language with Algol/Pascal-based syntax, because its
designers thought spelling commands out would make the language easier
to teach.

No character meaning End of Sentence -> implies that a "sentence" is
something your earlier language used. This is harder, because there is
no definitive answer to what a sentence is: most languages don't talk
about sentences, but do talk about statements like if-then-else-endif,
case-esac, or try-finally. Probably this is an artifact of a language
wanting to use an LR or LALR parser, again meaning that a tool like
PCCTS/ANTLR, LEX/FLEX, or YACC/BISON was originally used to define
the language syntax, all of which became more common in the 1980s and
1990s.

Gotos still exist -> rules out Pascal, since it does have a goto. I
don't know if Delphi supports goto, but the Pascal language standard
does allow it. The attitude that gotos are harmful (Dijkstra's 1968
letter "Go To Statement Considered Harmful") took hold in the late
1970s and early 1980s, so your pre-MUMPS language was again probably
developed after then.

Indirection is cool in theory but hard to debug in practice -> implies
that your early language was probably a scripting language like Perl,
Python, or Ruby (or maybe Scheme or Lisp) rather than a heavily
compiled language like C or Ada, since the scripting languages allow
for dynamism, but not for using dynamic values as input to many language
elements (i.e., they generally like the idea of evaluating code from a
string or data structure, but not a partially dynamic value, as in
MUMPS, where indirection is dynamic but restricted to limited uses
such as a SET argument or a WRITE argument).

Hmm. I haven't been able to restrict it to one language yet.

Some differentiation questions:

What do you think about the ELSE command?

What is your reaction to MUMPS dotted lines as indention?

What about white-space usage in MUMPS ?

Do you code MUMPS horizontally or vertically?

What do you think about $SELECT ?

What do you think about post-conditionals?

If MUMPS added a syntax NAME.NAME would you expect
it to be allowed to be used in DO commands or SET commands or both?

What do you think of $ORDER and $DATA ?

On to the next round,

David

kdt...@gmail.com

Nov 3, 2008, 1:59:33 PM
to Hardhats
Since discussing Mumps is always a fun thing, I'll jump in:

On Nov 3, 12:17 pm, "David Whitten" <whit...@worldvista.org> wrote:
...
> surprised commands are allowed to be shortened to one letter ->
I think I am the only one that uses the full word for commands instead
of the first letter. I think it is better for readability, but it
throws everyone else off.
..
...
> Gotos still exist ->
Many languages still have GOTOs, even though their use is discouraged.
C and C++ both do.

> Indirection is cool in theory but hard to debug in practice ->
I think of indirection as a pointer, and if used carefully, I think
it can be just as powerful. Why do you find it hard to debug?
Can't you set a breakpoint at the point of use and find out the actual
value?

...
...
> Some differentiation questions:
>
> What do you think about the ELSE command?

I wish that the ELSE command could appear on the same line as the IF.
And it's a bit zany to require 2 spaces after it (and yes, I understand
why).

> What is your reaction to MUMPS dotted lines as indention?
I like it fine.

> What about white-space usage in MUMPS ?
I really miss white space.

> Do you code MUMPS horizontally or vertically?
Vertically. I REALLY dislike finding a paragraph of code all on one
line. First, it is hard to read because the line is longer than the
screen. Second, I can't step through the code with my debugger in
small enough intervals. If 8 things are done on 1 line, then I can't
tell which is causing the bug.

> What do you think about $SELECT ?
I don't use it much. It's kind of like a CASE statement from other
languages, but it is a horizontal function rather than a vertical code
structure.

> What do you think about post-conditionals?
I think they are only needed because an IF causes the rest of the line
to be skipped. I have started using them a little, but they are a
necessary evil :-)

> If MUMPS added a syntax NAME.NAME  would you expect
> it to be allowed to be used in DO commands or SET commands or both?
Both. I would like to see this. But I would also want it to be an
allowable variable for an XECUTE etc.

> What do you think of $ORDER and $DATA ?
I really like these. They are strong tools that other languages
lack.
>


Kevin