The missing value is represented by an empty string '' (two quotes
without a space).
I am using gfortran, which has isnan(x) to check for NaN.
I hope there is a way to use a character representation of NaN so
that

read(ch(1:2),*) myint,myreal

will give me integer and real NaNs from the same string in ch(1:2).
Thanks, everybody.
> I am reading data with lots of missing values and need a good way to
> be able to assign NaN to integer, real and character vars.
There is no such thing as a NaN for integer or character. In theory, one
could define such a concept, but current hardware doesn't support it.
NaN, as currently defined, is purely a concept for reals. I might add
that this is an issue that transcends Fortran.
For things other than reals, you have to do something like keep track of
missing or invalid data in a separate flag. One could (and some people
do) reserve some "unlikely" value as one that you will consider to be
your personal convention for a NaN. But that is fraught with peril that
the "unlikely" value might turn out to be not quite as unlikely as you
anticipated; programs with such assumptions have had to be fixed in the
past, and probably will again in the future as people continue to make
the same kinds of mistakes.
For reals, see the IEEE_VALUE module procedure in f2003. There are
potential "issues" with signalling NaNs, but quiet ones should be ok.
> I am using gfortran which has the isnan(x) to check for nan.
Note that isnan is a nonstandard feature, although a fairly common one.
See the f2003 IEEE_IS_NAN, which I suspect gFortran also supports in
recent versions, although I'm not entirely sure.
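For readers who also work in C, the rough counterparts of IEEE_VALUE(x, IEEE_QUIET_NAN) and IEEE_IS_NAN are the C99 NAN macro and isnan() from <math.h>. This is a minimal sketch that assumes an IEEE 754 platform. The helper names missing_real and is_missing_real are hypothetical.

```c
#include <math.h>     /* NAN, isnan (C99) */
#include <stdbool.h>

/* Rough C counterpart of IEEE_VALUE(x, IEEE_QUIET_NAN): NAN expands to
   a quiet NaN of type float where the implementation supports quiet
   NaNs; converting it to double keeps it a NaN. */
double missing_real(void)
{
    return NAN;
}

/* Rough C counterpart of IEEE_IS_NAN. A NaN also compares unequal to
   itself, but isnan() states the intent more clearly than x != x. */
bool is_missing_real(double x)
{
    return isnan(x);
}
```

Options such as GCC's -ffast-math (which implies -ffinite-math-only) let the compiler assume that NaNs never occur. That can defeat either test.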
> I hope there is a way to use a character representation of the NaN so
> then
>
> read(ch(1:2),*) myint,myreal
>
> will give me int and real nans from the same string in ch(1:2)
As I noted, there is no such thing as an integer NaN, so you are out of
luck for that. For reals, f2003 specifies that the characters NAN should
read as a NaN. (There are additional forms, but just NAN is the
simplest). That does not mean that NAN is a character NaN. It isn't (as
no such thing exists). It is a perfectly normal set of 3 characters. It
just happens to convert to a real NaN in formatted input. I'm not at all
sure whether gFortran yet supports that particular f2003 feature. And
note that the feature does require a minimum of 3 characters; 2 won't do
it.
--
Richard Maine | Good judgment comes from experience;
email: last name at domain . net | experience comes from bad judgment.
domain: summertriangle | -- Mark Twain
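The same point can be sketched in C. C99 strtod() also accepts the spelling NAN, case-insensitively, and returns a NaN. The two characters NA are not a recognized NaN spelling. The helper parse_real_field is hypothetical. It assumes the field has already been isolated, and it treats either an empty field or the OP's two-quote field '' as a missing real. There is still no integer counterpart, so an integer field would need a separate flag.

```c
#include <math.h>     /* NAN, isnan */
#include <stdbool.h>
#include <stdlib.h>   /* strtod */
#include <string.h>   /* strcmp */

/* Hypothetical helper: parse one text field as a double.
   Returns true on success and stores the value (NaN if missing). */
bool parse_real_field(const char *field, double *out)
{
    char *end;

    /* Empty field, or the literal two-quote field '', means missing. */
    if (field[0] == '\0' || strcmp(field, "''") == 0) {
        *out = NAN;
        return true;
    }
    /* strtod accepts "NAN"/"nan"; "NA" converts nothing. */
    *out = strtod(field, &end);
    /* Succeed only if something was converted and the whole field used. */
    return end != field && *end == '\0';
}
```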
A good language would provide them (NaN-like values for integer and
character types), and handle them properly (unlike IEEE 754 NaNs).
That is also an issue that transcends Fortran.
However, if wishes were horses, beggars would ride. You have
described what is traditionally done, and the OP has little option.
Regards,
Nick Maclaren.
gfortran does not implement Sec 14 (exceptions and IEEE arithmetic)
yet. It is one of the major features missing from its F2003 support.
Work on this has been on and off my TODO list for the last few years.
> > I hope there is a way to use a character representation of the NaN so
> > then
>
> > read(ch(1:2),*) myint,myreal
>
> > will give me int and real nans from the same string in ch(1:2)
>
> As I noted, there is no such thing as an integer NaN, so you are out of
> luck for that. For reals, f2003 specifies that the characters NAN should
> read as a NaN. (There are additional forms, but just NAN is the
> simplest). That does not mean that NAN is a character NaN. It isn't (as
> no such thing exists). It is a perfectly normal set of 3 characters. It
> just happens to convert to a real NaN in formatted input. I'm not at all
> sure whether gFortran yet supports that particular f2003 feature. And
> note that the feature does require a minimum of 3 characters; 2 won't do
> it.
gfortran 4.3.something and newer supports reading and writing NaN
and Inf.
--
steve
If (big heroic assumption!) you know something about the data you might then
be able to pick a sentinel (a.k.a. a magic value) to represent a missing value.
NaN is nice because somebody else did the thinking but one is not always lucky.
NaN is not always there or supported well even when it is there. As
noted above!
I have applications where negative values are not present, so I can use
a negative value. Of course, having to keep watching for missing values
clutters up everything else. But even NaN does not solve that problem.
When negatives are possible, you are left with carrying an external flag.
For some of my applications it is easier to have negative be missing and
carry an external sign on the absolute value.
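Here is a minimal C sketch of the negative-means-missing convention. It assumes the data are known to be non-negative. MISSING and mean_present are hypothetical names. Every routine that consumes the array has to repeat the test, which is the clutter mentioned above.

```c
#include <stddef.h>

/* Assumption: valid data are never negative, so -1.0 can mark "missing". */
#define MISSING (-1.0)

/* Mean of the non-missing entries; returns MISSING if none are present. */
double mean_present(const double *v, size_t n)
{
    double sum = 0.0;
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        if (v[i] < 0.0)          /* negative means missing: skip it */
            continue;
        sum += v[i];
        count++;
    }
    return count > 0 ? sum / (double)count : MISSING;
}
```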
You will have to learn to live with some version of a suitable poison. Welcome
to the reality of dealing with missingness!
> I have a feeling this isn't what you want, but what you asked for is
> in
>
> https://svn.r-project.org/R/trunk/src/main/arithmetic.c
>
> in particular around lines 110 and 160.
> And for completeness, lines 1106-8 of names.c (there are 4 types of NA in R, and NA_character_ is in a different file).
> --
> Brian D. Ripley, rip...@stats.ox.ac.uk
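#include <math.h>   /* isnan() */

/* Index of the high- and low-order 32-bit words of a double. R sets
   these from its build configuration; this test is an assumed
   stand-in using GCC/Clang predefined macros. */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
static const int hw = 0, lw = 1;
#else
static const int hw = 1, lw = 0;
#endif

/* Overlay a double with two words (assumes 32-bit unsigned int). */
typedef union
{
    double value;
    unsigned int word[2];
} ieee_double;

/* NA is a NaN (exponent all ones, nonzero fraction) whose
   low-order word holds the payload 1954. */
static double R_ValueOfNA(void)
{
    /* The gcc shipping with RedHat 9 gets this wrong without
     * the volatile declaration. Thanks to Marc Schwartz. */
    volatile ieee_double x;
    x.word[hw] = 0x7ff00000;
    x.word[lw] = 1954;
    return x.value;
}

/* True only for R's NA: a NaN carrying the 1954 payload. */
int R_IsNA(double x)
{
    if (isnan(x)) {
        ieee_double y;
        y.value = x;
        return (y.word[lw] == 1954);
    }
    return 0;
}

/* True for any NaN other than NA. */
int R_IsNaN(double x)
{
    if (isnan(x)) {
        ieee_double y;
        y.value = x;
        return (y.word[lw] != 1954);
    }
    return 0;
}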
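For comparison, here is a self-contained sketch of the same payload trick. It copies the bits with memcpy into a uint64_t instead of using the union, so no hw/lw word-order bookkeeping is needed. It assumes IEEE 754 binary64 doubles. The names make_na, is_na and is_plain_nan are hypothetical and are not R's API. Like R's value, this bit pattern has the quiet bit clear. Whether a payload survives later arithmetic depends on the hardware, so the test is only dependable on values that were stored and copied, not on values that were computed.

```c
#include <math.h>     /* isnan, NAN */
#include <stdbool.h>
#include <stdint.h>
#include <string.h>   /* memcpy */

/* Exponent all ones plus a nonzero fraction is a NaN; the low 32 bits
   carry the payload 1954 (0x7A2), mirroring R_ValueOfNA. */
static const uint64_t NA_BITS = UINT64_C(0x7FF00000000007A2);

double make_na(void)
{
    double x;
    memcpy(&x, &NA_BITS, sizeof x);
    return x;
}

/* True only for a NaN whose low 32 bits are the 1954 payload. */
bool is_na(double x)
{
    uint64_t bits;
    if (!isnan(x))
        return false;
    memcpy(&bits, &x, sizeof bits);
    return (uint32_t)bits == 1954u;
}

/* True for any other NaN, such as the default NaN from NAN or 0.0/0.0. */
bool is_plain_nan(double x)
{
    return isnan(x) && !is_na(x);
}
```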
I suppose this will slow things down if numbers are not just numbers
any more.
If the integers are default integers then one could use double
precision real instead of integer variables, because the number of
significant digits will be enough. IEEE-standard double precision does
offer NaN but even on a non-IEEE system any value greater than huge(1)
could be deemed to be a NaN. Of course one could then fall into the
trap I fell into many years ago with a Burroughs Algol program: wrong
answers because I had inadvertently declared a function with one of
integer and real, and called it with the other. (On that system
integers were treated as real numbers with zeros after the decimal
point.) Another possible disadvantage: f95 allows elemental intrinsic
functions of integer variables but not of real variables in
initialization expressions.
-- John Harper
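Here is a minimal C sketch of this suggestion. It assumes a 32-bit int and an IEEE 754 double, so every int is exactly representable in the double's 53-bit significand. The function names are hypothetical. Values outside int's range play the role of "greater than huge(1)".

```c
#include <limits.h>   /* INT_MAX, INT_MIN */
#include <math.h>     /* NAN, isnan */
#include <stdbool.h>

/* Store an int in a double slot (exact for 32-bit int). */
double int_slot(int value)
{
    return (double)value;
}

/* A NaN in the slot means "missing". */
double missing_int_slot(void)
{
    return NAN;
}

/* Convert back, refusing NaN and anything outside int's range.
   A non-integral slot would be truncated toward zero. */
bool slot_to_int(double slot, int *out)
{
    if (isnan(slot) || slot > (double)INT_MAX || slot < (double)INT_MIN)
        return false;
    *out = (int)slot;
    return true;
}
```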
The best way to deal with this kind of problem is to use a
logical scalar (or a "parallel" logical array)
which contains TRUE when the variable contains a good
value and FALSE when the value is missing.
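Here is a minimal C sketch of the parallel-array approach. The name sum_valid is hypothetical. The validity array carries all the information, so whatever value sits in a missing slot is irrelevant and no bit pattern has to be reserved.

```c
#include <stdbool.h>
#include <stddef.h>

/* valid[i] is true when value[i] holds good data.
   Returns the sum of good values and stores how many there were. */
double sum_valid(const double *value, const bool *valid, size_t n,
                 size_t *count)
{
    double sum = 0.0;
    *count = 0;
    for (size_t i = 0; i < n; i++) {
        if (valid[i]) {
            sum += value[i];
            (*count)++;
        }
    }
    return sum;
}
```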
This is what DBMS such as Postgres seem to do internally - they appear
to have NULL as a valid data value, but in fact keep a bit array with
one bit for each field which when set means that the value is null. The
lack of a bit data type in Fortran makes this a bit harder, but given
the cheapness of memory nowadays it's hardly a problem to use LOGICAL.
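Here is a C sketch of a one-bit-per-field bitmap. It uses a set bit for "null", as described above. The names set_null and is_null are hypothetical. Real systems differ in polarity; PostgreSQL's heap tuple format, for instance, uses a 1 bit for not-null. Treat this as an illustration of the idea, not as a description of any particular DBMS.

```c
#include <limits.h>   /* CHAR_BIT */
#include <stdbool.h>
#include <stddef.h>

/* The caller supplies a zero-initialised byte array of at least
   (n + CHAR_BIT - 1) / CHAR_BIT bytes for n fields. */
void set_null(unsigned char *bitmap, size_t i)
{
    bitmap[i / CHAR_BIT] |= (unsigned char)(1u << (i % CHAR_BIT));
}

/* True if field i has been marked null. */
bool is_null(const unsigned char *bitmap, size_t i)
{
    return (bitmap[i / CHAR_BIT] >> (i % CHAR_BIT)) & 1u;
}
```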
I think that is the best way of doing it in Fortran, but I admit I have
sometimes cut corners. Instead of NaN I have sometimes used HUGE(0) for
integers and HUGE(1.0) for reals - these are valid values which, though
in band, are unlikely to arise in real data inadvertently, and are easy
to test for.
--
Clive Page
> On 8/17/10 8:52 AM, Richard Maine wrote:
> > For things other than reals...One could (and some people
> > do) reserve some "unlikely" value as one that you will consider to be
> > your personal convention for a NaN. But that is fraught with peril that
> > the "unlikely" value might turn out to be not quite as unlikely as you
> > anticipated; programs with such assumptions have had to be fixed in the
> > past, and probably will again in the future as people continue to make
> > the same kinds of mistakes.
> While you are absolutely correct in the strictest sense, in reality most
> FORTRAN applications are built to solve a specific type of problem. The
> programmer should have a good idea of the range of possible values they
> are dealing with and picking a personal NaN string is not very difficult
> and not really "fraught with peril". I've been using this approach in
> engineering applications for a very long time and have never had a problem.
I had over 40 years of experience in real engineering applications and I
have in my time debugged quite a lot of them with exactly that problem.
I have on occasion related some of the tales here.
For integers and characters, it is reasonably common for applications to
make use of every available bit pattern. That's particularly so for
8-bit and 16-bit integers, but happens for 32-bit ones as well. Integers
are reasonably often used as "bit containers". I discourage people from
using characters as such containers, but lots of people do it anyway
(and I've even seen it recommended here in preference to depending on
8-bit integers being supported). Since I was directing the above-quoted
paragraph specifically at "things other than reals" (note the
introductory words to the para), such uses seem very relevant. If your
experience is that picking a personal NaN integer or character is
"safe", then I'll just have to say that your experience is a lot different from
mine. That's particularly so if one writes general code that gets reused
instead of rewriting every application from scratch. Yes, that happens
in Fortran... a lot.
But back to reals. Most of the problems I've seen in this area have been
with reals, whether because reals tend to be central to lots of Fortran
data or because most people don't even try to define a personal "NaN"
integer or character.
People put the "NaN" flags in and then forget to check for them. After a
few arithmetic operations, they are no longer recognizable as NaN flags,
but are just junk results. Maybe you are careful enough to check every
time. If so, I'd say you were in a fortunate minority. I've found lots
of bugs in other people's code with that kind of practice. (And, well,
let's pretend it was always other people's. :-()
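Here is a small C illustration of the point. It uses a hypothetical -999. flag and a linear interpolation that forgets to check for it. Halfway between 12.0 and the flag gives -493.5, which no longer matches the flag. A NaN propagates through the same arithmetic and is still detectable at the end.

```c
#include <math.h>     /* isnan, NAN */
#include <stdbool.h>

#define FLAG (-999.0)   /* hypothetical missing-data flag */

/* Linear interpolation between two table entries, with no flag check:
   exactly the omission described above. */
double interp_linear(double a, double b, double t)
{
    return a + (b - a) * t;
}

/* A flag test only works on raw, uncomputed table entries. */
bool is_flagged(double x)
{
    return x == FLAG;
}
```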
One I have related here several times before was for Shuttle re-entry
data processing. Tables of balloon data for atmospheric properties along
the reentry corridor used such missing data flags. That was only
supposed to happen for data that wasn't going to end up getting used
anyway, as the Shuttle wasn't going to be at 200,000+ feet over
Bakersfield. Unfortunately, a poor choice of interpolation algorithm led
to trying to use that data... without a check for the missing data flag.
The results were quite a few orders of magnitude off. When I complained
about getting data that was so implausible, the contractor folk who did
that work explained that checking for engineering plausibility was not
their job.
I never again used any data from that particular group. Figured we were
ahead to just duplicate the work because that was the only way we were
going to have any confidence in it. Maybe some of their other errors
might have given results that were plausible, but still wrong.
I have had many other similar experiences, some dating back as far as
my earliest programming days in the late '60s. I recall using -999. as
an end-of-data flag before Fortran had a standard method of signalling
an end of file. And I recall getting bitten by it.
Maybe your experience differs, but based on mine, I'll stick with
"fraught with peril."
On 2010-08-22 12:53:56 -0400, breyfogle <brey...@aol.com> said:
> While you are absolutely correct in the strictest sense, in reality
> most FORTRAN applications are built to solve a specific type of
> problem. The programmer should have a good idea of the range of
> possible values they are dealing with and picking a personal NaN string
> is not very difficult and not really "fraught with peril". I've been
> using this approach in engineering applications for a very long time
> and have never had a problem.
One issue is the semantic overload: Not-a-Number
is convolved with Not-a-Value. Both are useful concepts,
but they are not interchangeable.
While it is certainly possible to choose a "personal NaN",
when software is reused the domain may be enlarged
in ways unanticipated by the original design. Therein lies the danger.
--
Cheers!
Dan Nagle
The problem with making up one's own integer (or logical, character, etc.)
NaN is that such a NaN, being just another integer (or ...) from the
compiler's point of view, may be generated by, say, multiplying two
integers. Once that happens, we run into something akin to Rumsfeld's
problem of distinguishing known unknowns from unknown unknowns.
Or, consider an array of integer that represents RGB values. A program
processing such data, doing something such as, say, resizing an image, has
to be able to deal with pixels that have an RGB value that is exactly the
same as the one chosen as NaN.
-- mecej4
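Here is a tiny C illustration of that problem, using a hypothetical integer "NaN" of -999. The ordinary product 27 * (-37) produces the flag value. No later test can tell that result apart from a genuinely missing datum. The names MISSING_INT, is_missing_int and scale are hypothetical.

```c
#include <stdbool.h>

#define MISSING_INT (-999)   /* hypothetical integer "NaN" */

bool is_missing_int(int x)
{
    return x == MISSING_INT;
}

/* Ordinary arithmetic on valid data. */
int scale(int reading, int factor)
{
    return reading * factor;
}
```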
There are some integer values that are already special, and one
should watch out for.
For sign-magnitude and ones-complement (rare these days) there is
an integer negative zero. Some systems are careful not to generate
negative zero unless it is one of the operands for an operation,
others can generate one.
For twos-complement, the most negative value is special.
It is the only negative value whose negation or absolute
value is still negative. That can
already cause problems even if it isn't used as a "special"
value, such that you might need to test for it anyway.
But yes, no matter what value is chosen, there is always the
possibility that someone will forget a test and the value will
be used as actual data. But that can still happen if a separate
flag is used, as there normally must still be some value in
the variable.
-- glen
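Here is a minimal C sketch of guarding against the most negative value. It assumes a two's complement int, which is required from C23 and near-universal before that. In C, negating INT_MIN or calling abs(INT_MIN) is undefined behaviour, so the check has to come before the arithmetic. The name checked_abs is hypothetical.

```c
#include <limits.h>   /* INT_MIN */
#include <stdbool.h>

/* Absolute value that reports failure for INT_MIN instead of
   overflowing; every other int has a representable magnitude. */
bool checked_abs(int x, int *out)
{
    if (x == INT_MIN)
        return false;
    *out = x < 0 ? -x : x;
    return true;
}
```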
> Richard Maine <nos...@see.signature> wrote:
> > For integers and characters, it is reasonably common for applications to
> > make use of every available bit pattern. That's particularly so for
> > 8-bit and 16-bit integers, but happens for 32-bit ones as well. Integers
> > are reasonably often used as "bit containers".
> There are some integer values that are already special,...
Not if you are using the variables as "bit containers." The special
things you are referring to all have to do with doing arithmetic
operations, which one should not do for bit containers. If you do
arithmetic on bit containers, then that's the mistake. Bits don't have
arithmetic - just logical operations. It is a common enough mistake, but
one that is not particular to questions of special values.
I know of no current systems where you can't reliably transmit all
possible bit patterns in integers. Sure, I can imagine such systems. One
could argue that systems that store, say, 16-bit integers in 32-bit
storage units are like that. Such systems have existed, as you have
noted before. I have seen systems where some character bit patterns get
trashed in transfer (for example, the top bit can get dropped); it has
been a while, admittedly.
Well, Fortran doesn't supply the unsigned integers that some
other languages use for bit containers. There have been machines
that reserved negative zero (sign magnitude or ones complement)
such that unusual things would happen when it was used.
> I know of no current systems where you can't reliably transmit all
> possible bit patterns in integers. Sure, I can imagine such systems.
One reason is that people do like to use them as bit containers,
and would complain or not use such systems. I don't know any
twos complement system that reserves the most negative value,
but it could be done. The Fortran numeric model carefully excludes
that value, though as you say only in numeric contexts.
Presumably Fortran systems for machines reserving such a value would
allow for N-1 bits in the bit model.
> One
> could argue that systems that store, say, 16-bit integers in 32-bit
> storage units are like that. Such systems have existed, as you have
> noted before. I have seen systems where some character bit patterns get
> trashed in transfer (for example, the top bit can get dropped); it has
> been a while, admittedly.
Well, you do have to be sure to use binary mode in ftp, but otherwise
most seem to get this right. I do remember in the MSDOS days that a
few characters wouldn't survive the print spooler printing to a serial
printer.
-- glen
> Richard Maine <nos...@see.signature> wrote:
I've seen them dropped in ordinary I/O, without printing being involved.
Write a character datum to a disk file, read it back in, and you might
find the high order bit dropped. Details past that forgotten.
> glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote:
>
>> Richard Maine <nos...@see.signature> wrote:
>
>>> I have seen systems where some character bit patterns get
>>> trashed in transfer (for example, the top bit can get dropped); it has
>>> been a while, admittedly.
>>
>> Well, you do have to be sure to use binary mode in ftp, but otherwise
>> most seem to get this right. I do remember in the MSDOS days that a
>> few characters wouldn't survive the print spooler printing to a serial
>> printer.
>
> I've seen them dropped in ordinary I/O, without printing being involved.
> Write a character datum to a disk file, read it back in, and you might
> find the high order bit dropped. Details past that forgotten.
ASCII is a seven-bit code, so the eighth bit is a matter of luck. As
well, there is a large collection of non-printing device control codes.
At one time the eighth bit was enough of a bother that stripping it out
was a useful service.
Back when fancy formatting meant double-striking for bold, the eighth bit
would be set by things like WordPerfect and then cause "unknown character"
diagnostics from early compilers. That is why so many programmers' text editors
have various text-cleanup options like Zap Gremlins. I seem to recall one
compiler that would not even tolerate a form feed, which I had used to tidy
up the listing of a file with several subroutines.
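Here is a small C model of a channel that strips the eighth bit. It assumes 8-bit bytes and Latin-1 source text. The character 'é' (0xE9) comes back as the ordinary ASCII letter 'i' (0x69). Any personal character "NaN" that relied on the high bit would therefore turn silently into real data. The name strip_high_bit is hypothetical.

```c
#include <stddef.h>

/* Clear the eighth bit of every byte, as a 7-bit channel might. */
void strip_high_bit(unsigned char *buf, size_t n)
{
    for (size_t i = 0; i < n; i++)
        buf[i] &= 0x7F;
}
```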
There were some systems that used parity-ASCII, so that didn't work
very well! Sending their text to a system with some other convention
tended to be a bit confusing, though English text remained more or less
intelligible :-)
Regards,
Nick Maclaren.
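Here is a C sketch of parity-ASCII. It assumes even parity in the top bit; odd parity was also used. 'A' (0x41, two one-bits) is sent unchanged. 'C' (0x43, three one-bits) goes out as 0xC3. A receiver that does not expect parity sees a high-bit character wherever the low seven bits hold an odd number of ones. Masking with 0x7F recovers the text. The name add_even_parity is hypothetical.

```c
/* Set bit 7 when the low seven bits have an odd number of ones,
   so every transmitted byte carries an even number of one-bits. */
unsigned char add_even_parity(unsigned char c)
{
    unsigned char low = c & 0x7F;
    int ones = 0;
    for (unsigned char b = low; b != 0; b >>= 1)
        ones += b & 1;
    return (unsigned char)(low | (ones % 2 ? 0x80 : 0x00));
}
```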