deepening into fortran 90, 95, 2003

octaedro

May 2, 2008, 11:47:01 AM

Hello folks,

I have been programming in fortran for about a year and a half, since I
wrote my "Hello World" program. Since then my projects have gotten bigger and
bigger, and although I own two fortran manuals... Metcalf, Reid and
Cohen, and Ellis, Phillips and Lahey, I still do not know very well
how to set options on my compiler to make programs run fast or how to
profile my programs... or something even more basic, how to
tell when my coding style is making things run slower than they
should. Soooo my question is whether anybody knows of any good source for
training yourself.

Thank you as always

Mark Westwood

May 2, 2008, 12:16:21 PM

Hi

To learn how to use your compiler read the manual until your eyes
bleed with the effort, it will be worthwhile in the end. Look for
optimisation, fast, that sort of word, in the index.

To profile programs, well that is at least partly platform dependent,
where platform = operatingSystem+compiler+toolset. On Linux you'll
probably find a copy of gprof already installed -- use the usual
sources for information (Google, man, info). You may also discover
that your compiler comes with a profiler thrown in -- for example the
Portland Group compiler provides a tool called PGPROF. On Windows,
I'm not sure, does Visual Studio include a profiling tool perhaps ?

To write programs which run quickly: careful use of compiler options
is usually the first and best route to faster programs. With O(1 day)
effort you can achieve O(2) speedup. Then it gets harder. You might
want to start modifying your code to take best advantage of the memory
hierarchy on your computer(s) -- if you don't know what this means,
then you should find out. But this can be very time consuming to do
effectively, offering O(2) speedup at a cost of O(10 days) effort.

The best thing to do, in almost all cases, however, is to choose a
better algorithm; this can offer O(10) times speedup at a cost of O(5)
days effort. Of course, there may not be a better algorithm, but the
effort of trying to find one is good for your education :-)

The last thing you should do is code-tinkering: unrolling loops,
fusing (or unfusing) loops, re-ordering loops, that sort of thing. If
carefully used (you did read the manual, didn't you ?) your compiler
will do a better job of this level of tinkering than you will. You
should leave this until hell freezes over, that's how late you should
resort to it.

NOTE carefully -- all the estimates of effort against speedup are wild
guesses based on some experience and you will not be able to reproduce
these data. Nor is there much point in arguing with my wild guesses,
I won't argue back. The only data which matters is data derived from
reproducible experimentation. Which I should have mentioned -- if you
do want to start optimising the performance of your programs, you must
start measuring it for a range of inputs and problems.

And all of the rest are my assertions, feel free to pay more attention
to someone else's different assertions, about how you can best use
your time.

Good luck

Mark Westwood

James Van Buskirk

May 2, 2008, 12:47:46 PM

"octaedro" <jorge.al...@gmail.com> wrote in message
news:560c2727-01b8-46a3...@f24g2000prh.googlegroups.com...

If you want to discover whether your coding style is making things
run unnecessarily slowly, there simply is no substitute for examining
the compiler output, i.e. a disassembly of the machine language code
the compiler generated. You can no more guess what you might have
done that creates a situation where the compiler has to do extra
work to achieve your purpose than you can find the invisible bugs
in your code by reading it for the fourth time without testing.

Also compilers can do amazingly lame things when translating your
Fortran source code to machine code but often an examination of
compiler output will not only let you know when the compiler crapped
out on your program but even give you insight as to what triggered
the undesirable behavior so that you have a chance to remedy the
situation.

--
write(*,*) transfer((/17.392111325966148d0,6.5794487871554595D-85, &
6.0134700243160014d-154/),(/'x'/)); end

Dan Nagle

May 2, 2008, 3:59:04 PM

Hello,

On 2008-05-02 12:47:46 -0400, "James Van Buskirk" <not_...@comcast.net> said:

>
> If you want to discover whether your coding style is making things
> run unnecessarily slowly, there simply is no substitute for examining
> the compiler output, i.e. a disassembly of the machine language code
> the compiler generated.

I beg to differ.

If you want to know how _fast_ some code is,
time it. It's hard to predict what today's hardware
will do with an emitted code sequence.
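A minimal sketch of "time it" in standard Fortran, assuming a compiler that accepts 64-bit integer arguments to the intrinsic SYSTEM_CLOCK (Fortran 2003 and later). The module and function names (simple_timer, tic, toc) are hypothetical. This measures wall-clock time; the intrinsic CPU_TIME is the one to use if you want processor time instead.

```fortran
module simple_timer
  ! Hypothetical helper module: wall-clock timing with SYSTEM_CLOCK.
  implicit none
  integer, parameter :: i8 = selected_int_kind(18)   ! 64-bit counter kind
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Return the current clock count; pass it to toc() later.
  function tic() result(t0)
    integer(i8) :: t0
    call system_clock(count=t0)
  end function tic

  ! Seconds elapsed since the count t0 returned by tic().
  ! Counter wrap-around (COUNT_MAX) is ignored in this sketch.
  function toc(t0) result(seconds)
    integer(i8), intent(in) :: t0
    real(dp) :: seconds
    integer(i8) :: t1, rate
    call system_clock(count=t1, count_rate=rate)
    if (rate > 0) then
      seconds = real(t1 - t0, dp) / real(rate, dp)
    else
      seconds = 0.0_dp        ! no clock available on this system
    end if
  end function toc
end module simple_timer
```

Wrap the section you care about between tic() and toc(), repeat the run a few times, and try several problem sizes: one timing of one input on one machine says little about the others.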

--
Cheers!

Dan Nagle

glen herrmannsfeldt

May 2, 2008, 4:29:41 PM

Dan Nagle wrote:

> On 2008-05-02 12:47:46 -0400, "James Van Buskirk" said:

>> If you want to discover whether your coding style is making things
>> run unnecessarily slowly, there simply is no substitute for examining
>> the compiler output, i.e. a disassembly of the machine language code
>> the compiler generated.

> I beg to differ.

> If you want to know how _fast_ some code is,
> time it. It's hard to predict what today's hardware
> will do with an emitted code sequence.

I would say somewhere in between.

First, some coding styles will generate bad assembly code
which will run slow on all processors.

Second, if you find some code that runs faster on one specific
processor you can't generalize that unless you know the generated
code. The timing could be very different for a slight change in
the source, or even no change there but a change somewhere else,
due to large changes in the generated code. (You still can't
generalize, but you have a better chance if you know
the generated code.)

Third, regarding Dan's point, there is a variety of
hardware out there and the code might have to run on different
processors.

Timing of code only tells you how long that specific code
runs on one specific processor. You can't generalize to
similar code or similar processors.

Sometimes the best you can do is to use the one with
the fewest instructions, with some weighting for more
complex instructions. Also, be sure that the data is
appropriately aligned. Otherwise, knowing both the
generated code and timing you can make better guesses.

-- glen

GaryScott

May 2, 2008, 4:45:23 PM

I long for the pre-cache days, when things were more predictable...

Michael Metcalf

May 2, 2008, 4:52:38 PM

"Mark Westwood" <markc.w...@gmail.com> wrote in message
news:2c70fd38-d6de-4f7b...@m45g2000hsb.googlegroups.com...

>
> And all of the rest are my assertions, feel free to pay more attention
> to someone else's different assertions, about how you can best use
> your time.
>

I have to say, given that the OP is relatively new to programming, that your
advice was an excellent guide on how to get started. Getting inexperienced
programmers to pore over assembly code is without question the least useful
approach.

Regards,

Mike Metcalf

Richard Maine

May 2, 2008, 4:52:46 PM

glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote:

[discussion of code optimization]

I disagree that it is useful at all for most people to look at generated
code. I haven't done so myself for ages. I particularly think it not
useful for relative novices such as the OP.

Take instead, the advice of Hoare and Knuth, which was briefly echoed by
Mark in this thread, but seemed sort of buried to me.

First and foremost, pay attention to algorithms. That is really, really
important. And you need to actually do it rather than just give it lip
service. You also need to keep doing it, regularly revisiting the
question as appropriate to something that is the most important factor
in program speed. Whenever I hear something like "I already did that",
it tends to make me think that it wasn't really taken seriously enough.

Sure, there are times when you have done everything practical with
algorithm selection and still need some more. But it is *SO* important
and so often inadequately done that I think it needs a bit of harping on.

After that, I think it wise to consider a few broad principles.
Understanding memory access issues is certainly one of those. Also in
that class is having at least a general idea of what kind of operations
are implied by the syntax one uses. One doesn't need to look at the
exact instructions to appreciate issues such as the potential for
needing temporary array allocations and copy-in/copy-out.
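To make the copy-in/copy-out point concrete, here is a minimal sketch (the module and routine names are hypothetical). Because Fortran stores arrays column by column, the row a(1,:) is not contiguous in memory. Passing it to an explicit-shape dummy argument makes the compiler build a contiguous temporary, copy the row in, and copy it back out after the call; an assumed-shape dummy can usually receive the strided section directly. Current gfortran releases can report such temporaries at compile time with -Warray-temporaries.

```fortran
module copy_in_demo
  ! Hypothetical module contrasting two dummy-argument styles.
  implicit none
contains
  ! Explicit-shape dummy: the callee assumes v occupies n consecutive
  ! memory locations, so a strided actual argument must be copied.
  subroutine scale_explicit(n, v, factor)
    integer, intent(in)    :: n
    real,    intent(inout) :: v(n)
    real,    intent(in)    :: factor
    v = factor * v
  end subroutine scale_explicit

  ! Assumed-shape dummy: the caller passes a descriptor that carries the
  ! stride, so the same strided section can be used in place.
  subroutine scale_assumed(v, factor)
    real, intent(inout) :: v(:)
    real, intent(in)    :: factor
    v = factor * v
  end subroutine scale_assumed
end module copy_in_demo
```

Both calls, call scale_explicit(size(a,2), a(1,:), 2.0) and call scale_assumed(a(1,:), 2.0), give the same answer; the difference is only the hidden temporary and the extra copying.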

When one does finally get down to line-by-line code tweaking, I think it
critical to actually benchmark proposed alternatives. History is replete
with even experts failing to correctly predict the performance
implications of various coding styles... and such predictions are harder
today than they have been in the past.

Way, way down on the list is looking at assembly code. As I said, I
haven't seriously done it in an age... probably 2 decades or so. There
is just no way that I would recommend that someone at the OP's level get
distracted by doing that. It probably won't do him any good and will
instead distract him from those things that he otherwise might have
looked at that could do some good.

Maybe after the OP has a decade and a half of experience instead of a
year and a half, it might make sense. But I doubt it still then. I
suppose that maybe some of us are just going to disagree, but my
personal recommendation to the OP is to ignore any suggestion to look at
assembly-level code.

--
Richard Maine | Good judgement comes from experience;
email: last name at domain . net | experience comes from bad judgement.
domain: summertriangle | -- Mark Twain

James Van Buskirk

May 2, 2008, 4:57:36 PM

"Dan Nagle" <dann...@verizon.net> wrote in message
news:2008050215590437709-dannagle@verizonnet...

> I beg to differ.

You can tell when the compiler emits garbage. Today's hardware
performs poorly when presented with garbage. There is no way to
intelligently discuss performance without some kind of hands-on
notion of what the hardware can do. If you don't know the
capabilities of the hardware you have no idea whether your code is
fast, only whether it is faster or slower than other codes that
you have timed. Only starting with performance analysis from the
perspective of the hardware at hand can you get a ballpark
estimate of what kind of speed you should be aiming for.

I suppose we shall have to agree to disagree on this issue.

rusi_pathan

May 2, 2008, 5:00:28 PM

The best approach is to write straightforward code, let the
compiler do the tricks (though you might have to experiment with
compiler flags), and use vendor-supplied numerical libraries. For
more info check out http://www.osc.edu/supercomputing/training/perftunmic/perftune.ls.pdf

octaedro

May 2, 2008, 6:46:55 PM

I don't know what to say... you folks enlightened me and scared the
shit out of me at the same time :-) It seems I have a lot of work to
do
...but what I may say is that YOU ARE LIKE GODS TO ME

glen herrmannsfeldt

May 2, 2008, 7:04:58 PM

Richard Maine wrote:
> glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote:

> [discussion of code optimization]

> I disagree that it is useful at all for most people to look at generated
> code. I haven't done so myself for ages. I particularly think it not
> useful for relative novices such as the OP.

I agree for 'most' people. It is, though, fairly easy to
write Fortran that is much slower than it could be. Factors
of two easily, and even much larger factors.

> Take instead, the advice of Hoare and Knuth, which was briefly echoed by
> Mark in this thread, but seemed sort of buried to me.

Hopefully only for those who are past that. Ones that know they
need even the last 10% improvement in speed.

> First and foremost, pay attention to algorithms. That is really, really
> important. And you need to actually do it rather than just give it lip
> service. You also need to keep doing it, regularly revisiting the
> question as appropriate to something that is the most important factor
> in program speed. Whenever I hear something like "I already did that",
> it tends to make me think that it wasn't really taken seriously enough.

Yes, this probably can't be said enough times.

> Sure, there are times when you have done everything practical with
> algorithm selection and still need some more. But it is *SO* important
> and so often inadequately done that I think it needs a bit of harping on.

> After that, I think it wise to consider a few broad principles.
> Understanding memory access issues is certainly one of those. Also in
> that class is having at least a general idea of what kind of operations
> are implied by the syntax one uses. One doesn't need to look at the
> exact instructions to appreciate issues such as the potential for
> needing temporary array allocations and copy-in/copy-out.

and the effects of cache on memory access patterns.

While I completely agree that most of the time one shouldn't
worry about generated code and speed, still, there is no excuse
for going through arrays with loops nested in the wrong order.
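A minimal illustration of loop nesting order (the module and routine names are hypothetical). Fortran arrays are stored in column-major order, so the leftmost subscript should vary fastest, i.e. belong to the innermost loop. Both routines compute the same result; the second strides through memory by a whole column on every inner iteration, which usually makes it noticeably slower on large arrays, though by how much depends on array size, cache, and compiler (some optimizers will interchange the loops for you).

```fortran
module loop_order
  ! Hypothetical module: the same operation with two loop nestings.
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Good order for Fortran: the inner loop over the FIRST index walks
  ! down a column, touching consecutive memory locations (stride 1).
  subroutine scale_column_order(a, factor)
    real(dp), intent(inout) :: a(:, :)
    real(dp), intent(in)    :: factor
    integer :: i, j
    do j = 1, size(a, 2)
      do i = 1, size(a, 1)
        a(i, j) = factor * a(i, j)
      end do
    end do
  end subroutine scale_column_order

  ! Wrong order for Fortran: the inner loop over the SECOND index jumps
  ! size(a,1) elements between consecutive accesses.
  subroutine scale_row_order(a, factor)
    real(dp), intent(inout) :: a(:, :)
    real(dp), intent(in)    :: factor
    integer :: i, j
    do i = 1, size(a, 1)
      do j = 1, size(a, 2)
        a(i, j) = factor * a(i, j)
      end do
    end do
  end subroutine scale_row_order
end module loop_order
```

For a purely elementwise operation like this one, the array expression a = factor * a is simpler still and leaves the traversal order to the compiler.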

It might be nice to have a compiler warning when a temporary
array was used. It isn't always easy to guess what the compiler
is doing.

> When one does finally get down to line-by-line code tweaking, I think it
> critical to actually benchmark proposed alternatives. History is replete
> with even experts failing to correctly predict the performance
> implications of various coding styles... and such predictions are harder
> today than they have been in the past.

And remember Amdahl's law.

http://en.wikipedia.org/wiki/Amdahl's_law

There is little use in speeding up parts where the program
only spends a small amount of time. Once you know the bottleneck,
where the program spends a large amount of time on only a
few statements, those are the ones to look at.
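Amdahl's law puts numbers on this. If a fraction p of the run time is spent in the part you speed up, and that part becomes s times faster, the overall speedup S is given below. The two cases are worked arithmetic, not measurements: a tenfold speedup of code that takes 10% of the run time barely helps, while the same speedup of code that takes 90% of the run time helps a lot.

$$
\begin{aligned}
S &= \frac{1}{(1 - p) + p/s} \\
p = 0.1,\ s = 10:\quad S &= \frac{1}{0.9 + 0.01} = \frac{1}{0.91} \approx 1.10 \\
p = 0.9,\ s = 10:\quad S &= \frac{1}{0.1 + 0.09} = \frac{1}{0.19} \approx 5.26 \\
s \to \infty:\quad S &\to \frac{1}{1 - p}
\end{aligned}
$$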

> Way, way down on the list is looking at assembly code. As I said, I
> haven't seriously done it in an age... probably 2 decades or so. There
> is just no way that I would recommend that someone at the OP's level get
> distracted by doing that. It probably won't do him any good and will
> instead distract him from those things that he otherwise might have
> looked at that could do some good.

Well, I don't know much about the OP and the OP's level.

The simplest form, looking at the number of generated instructions,
isn't so hard to do. I was surprised recently (responding to some
of James' posts) how much different the code g95 generated with
-O2 was from the default optimization. Maybe half as many
instructions. (Load/store for temporary variables.)

If you have it down to a few statements, or for programs that
are already small but still slow, and you need more speed,
then you can look at the generated code.

> Maybe after the OP has a decade and a half of experience instead of a
> year and a half, it might make sense. But I doubt it still then. I
> suppose that maybe some of us are just going to disagree, but my
> personal recommendation to the OP is to ignore any suggestion to look at
> assembly-level code.

Maybe about right. I believe I was doing it within the first year
and a half of learning Fortran. Partly that is how I learned to
write assembler. That, and sample code some of which was the
Fortran library. But things were different 30 years ago.

My first assembly programs were Fortran callable subroutines.
(That avoided having to think about doing I/O.) That was about
two years after I started Fortran programming.

-----------------------------------------------------

The other case where one should look at the generated code is
when one suspects a compiler bug generating the wrong code.

-- glen

glen herrmannsfeldt

May 2, 2008, 7:08:46 PM

GaryScott wrote:
(snip)

> I long for the pre-cache days, when things were more predictable...

Unfortunately, predictably slower. Many of the changes over
the years, caching and instruction overlap being two,
make timing less predictable but usually faster.

-- glen

Terence

May 2, 2008, 8:13:18 PM

Algorithm, always the algorithm!

Here's an example from the early sixties.
IBM brought out the 1440 and its disc (post-1401, prior to the IBM
360).
The UK (..) branch quoted on a banking project that was to use these
magnificent random-access drives to do a specific reporting job
quickly, and demonstrated how fast with a mock-up file, which went
thunk-thunk-thunk while the representatives beamed.

The competition used tape drives, read them end-to-end once (suiiish!)
then read them again (zhooo!) while spewing out the needed reports.
Very, very fast! They got the contract.
Random access was actually the slower algorithm.

glen herrmannsfeldt

May 2, 2008, 9:05:07 PM

Terence wrote:
(snip)

> The competition used tape drives, read them end-to-end once (suiiish!)
> then read them again (zhooo!) while spewing out the needed reports.

Forward or backward? One fun thing about tape drives in those
days was the ability to read backwards. Some sorting algorithms
have been optimized for that ability.

> Very, very fast! They got the contract.
> Random access was actually the slower algorithm.

Though if you sort them a little bit before doing the disk
accesses, you can speed up random access disk significantly.

-- glen

Gary Scott

May 2, 2008, 9:39:28 PM

glen herrmannsfeldt wrote:
> Richard Maine wrote:
>
>> glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote:
>
>
>> [discussion of code optimization]
>
>
>> I disagree that it is useful at all for most people to look at generated
>> code. I haven't done so myself for ages. I particularly think it not
>> useful for relative novices such as the OP.
>
>
> I agree for 'most' people. It is, though, fairly easy to
> write Fortran that is much slower than it could be. Factors
> of two easily, and even much larger factors.

I've been bitten many times by the slowness of concatenation of strings.
In order to populate RTF fields, you sometimes need to build up an
extremely long string from smaller chunks. Concatenation seemed the
most obvious (naive) solution to me. But it was excruciatingly,
noticeably slow. Changing it to direct indexing was a huge speedup.
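A minimal sketch of the two approaches described above. The module and function names are hypothetical, and it uses Fortran 2003 deferred-length allocatable character variables, which may differ from how the original RTF code was written. Repeated s = s // chunk typically builds a new string and copies everything accumulated so far on each step, so the total copying grows roughly with the square of the final length; allocating the result once and writing each chunk into its slot by substring indexing copies every character once.

```fortran
module build_string
  ! Hypothetical module: building a long string from many small chunks.
  implicit none
contains
  ! Naive: each concatenation copies the whole string built so far.
  function join_concat(chunk, n) result(s)
    character(len=*), intent(in) :: chunk
    integer, intent(in) :: n
    character(len=:), allocatable :: s
    integer :: k
    s = ''
    do k = 1, n
      s = s // chunk
    end do
  end function join_concat

  ! Direct indexing: allocate the final length once, then fill slots.
  function join_indexed(chunk, n) result(s)
    character(len=*), intent(in) :: chunk
    integer, intent(in) :: n
    character(len=:), allocatable :: s
    integer :: k, pos, m
    m = len(chunk)
    allocate(character(len=m*n) :: s)
    pos = 1
    do k = 1, n
      s(pos:pos+m-1) = chunk      ! write chunk k into its own slot
      pos = pos + m
    end do
  end function join_indexed
end module build_string
```

Both functions return the same string; only the amount of copying differs, and the gap widens as the number of chunks grows.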

<snip> (see I snipped this time!!)

--

Gary Scott
mailto:garylscott@sbcglobal dot net

Fortran Library: http://www.fortranlib.com

Support the Original G95 Project: http://www.g95.org
-OR-
Support the GNU GFortran Project: http://gcc.gnu.org/fortran/index.html

If you want to do the impossible, don't hire an expert because he knows
it can't be done.

-- Henry Ford

Gary Scott

May 2, 2008, 9:40:52 PM

glen herrmannsfeldt wrote:

Sometimes, that's ok, especially for real-time (predictability is
sometimes more critical if the overall performance is acceptable).


e p chandler

May 2, 2008, 11:40:41 PM

What about DECTAPE? IIRC it was able to read backwards, at least on a
block by block basis.

-- e

Damian

May 3, 2008, 12:16:14 AM

I would first echo the sentiments that profiling and timing code are
key. Approach performance empirically and profile your code on the
platform of interest. Otherwise, you're likely to obsess over code
segments that only contribute a small fraction of the overall
execution time. I recommend starting with gprof for serial code
(assuming you're on a Unix-like system) and progressing to something
like the Tuning and Analysis Utilities (TAU) if you're interested in
parallel code: see http://www.cs.uoregon.edu/research/tau/home.php.

That was my more standard, mainstream answer. Now for the contrarian
answer with which very few people will agree. First, think about
whether you're writing code primarily for yourself or for a community
of users. If it's for yourself, then think about the fact that

total solution time = development time + execution time

You can spend all the time in the world optimizing your execution
time, but if in doing so, you increase your development time more than
you decrease your execution time, your net solution time goes up.
Therefore, focus first on programmability, i.e. focus on writing
flexible, clear code that makes your job easier rather than making the
computer's job easier. In the end, the development-time savings will
likely swamp the execution time savings. (At least in the graduate-
school setting, people spend years writing and optimizing code that
will ultimately run for only a few months when they could graduate a
lot faster by writing simpler code that is somewhat slower.)

Even if your goal is to write code for a user community, a similar
argument might hold if your focus on programmability empowers you to
add new features faster than the competition even if the competition's
code runs somewhat faster.

And if you do decide to focus on programmability, I highly recommend
checking out some of the literature on object-oriented programming in
Fortran, but I'm sure others will disagree and I look forward to the
enlightenment.

Damian

glen herrmannsfeldt

May 3, 2008, 2:22:50 AM

e p chandler wrote:
(snip)

> What about DECTAPE? IIRC it was able to read backwards,
> at least on a block by block basis.

DECtape is random access. I believe it seeks to the
beginning of the block and reads forward, though I am
not so sure on that.

There are stories of using DECtape as a swap device, at
least long enough to see if it would work.

-- glen

Janne Blomqvist

May 3, 2008, 3:21:38 AM

On 2008-05-02, glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote:
> The simplest form, looking at the number of generated instructions,
> isn't so hard to do. I was surprised recently (responding to some
> of James' posts) how much different the code g95 generated with
> -O2 was from the default optimization. Maybe half as many
> instructions. (Load/store for temporary variables.)

With gcc, and presumably g95 as well since it uses a gcc backend,
default optimization is no optimization. Adding -O2 often gives a huge
performance boost, but after that it gets much more difficult to
increase performance by tweaking compiler options.

--
Janne Blomqvist

james...@att.net

May 3, 2008, 5:11:48 PM

I disagree that either caching or pipelining are responsible for
instruction timings being unpredictable. Yes, multiprocessors make
the cost of a cache miss less predictable, but the optimization
problem with caching has always been to try to arrange data so cache
misses occur less often, not to predict how long they take. Then,
given good instruction timing (assuming cache misses don't happen) and
details about how many functional units do which instructions, you can
do reasonable instruction scheduling - and benefit from the
instruction overlap, not get confused by it.

I think the real problem is instruction reordering. Evidently the
coding community (us) was doing such a bad job at instruction
scheduling that the hardware people decided to take it out of our
hands. To be sure, the more functional units there are, and the more
complex the relationships between them get, the harder the job is to
do (and the more dependent it is on which specific CPU model you're
using). So I'm not saying they had no call to do it. But it has had
the effect that good instruction timing information is no longer
available. An important tool we coders need to do a better job is no
longer there.

Even so, we still have control over instruction selection and register
allocation decisions. Those can make a big difference in code speed.
So, in fact, can some instruction scheduling. There's a limit to what
the hardware instruction scheduling can do. I'm told that some
compiler people spend time reordering code in their most important
benchmarks and measuring the speed on different CPU models just to
tweak that last bit of speed out of the machine. They are sort of
reverse engineering what the hardware instruction scheduling is doing
(or its consequences at least). And, I suspect that Intel gives more
precise timing information to their own compiler people than it
releases to the general public.

--
J. Giles

"I conclude that there are two ways of constructing a software
design: One way is to make it so simple that there are obviously
no deficiencies and the other way is to make it so complicated
that there are no obvious deficiencies." -- C. A. R. Hoare

Terence

May 3, 2008, 7:17:30 PM

The answers to these points are:

a) if you read backward but are processing a linear file for
reporting, you would FIRST have to go to the end of the tape, read
backward for the first process and then read forward for the reporting
run.
Rewinding is faster than end of file seeking, and anyway there are
always three operations, two reads and one rewind equivalent. So you
don't read backward in this case. The final re-wind is off-line.

b) Tape, as noted, can be read backwards in certain modes. That is why
the direct access formatted mode has prefixes and postfixes, so the
tape drive channel code can locate the file records in both
directions. We still retain this coding in Fortran, even if we have
disk drives now, for compatibility and the still-used tape drives in
some places.

Greg Lindahl

May 3, 2008, 7:53:33 PM

In article <slrng1o4k...@vipunen.hut.fi>,
Janne Blomqvist <f...@bar.invalid> wrote:

>With gcc, and presumably g95 as well since it uses a gcc backend,
>default optimization is no optimization.

... the default is picked by the frontend.

PathScale picked -O2 as the default, but the special case of -g
without any -On falls back to -O0 with a warning, to ease
debugging. Not that many users were surprised.

-- greg

Greg Lindahl

May 3, 2008, 7:51:37 PM

In article <87608c18-c5a7-44d2...@u12g2000prd.googlegroups.com>,
Terence <tbwr...@cantv.net> wrote:

>b) Tape, as noted, can be read backwards in certain modes. That is why
>the direct access formatted mode has prefixes and postfixes,

Indirectly through the BACKSPACE command, you mean? You can't backspace
efficiently on a disk without the postfix.
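A small sketch of BACKSPACE on a sequential unformatted file (the module and function names are hypothetical; NEWUNIT= is Fortran 2008). The standard does not prescribe the on-disk layout, but common implementations such as gfortran store a record-length marker both before and after each unformatted record, which is what lets BACKSPACE step back over a record without rereading the file from the start.

```fortran
module backspace_demo
  ! Hypothetical module illustrating BACKSPACE on unformatted records.
  implicit none
contains
  ! Write n >= 1 records, step back over the last one, and re-read it.
  function reread_last(n) result(last)
    integer, intent(in) :: n
    integer :: last
    integer :: u, k
    open(newunit=u, status='scratch', form='unformatted', &
         access='sequential', action='readwrite')
    do k = 1, n
      write(u) k, real(k)**2   ! each WRITE produces one record
    end do
    backspace(u)               ! now positioned just before record n
    read(u) last               ! reading fewer items than the record holds is allowed
    close(u)
  end function reread_last
end module backspace_demo
```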

-- greg

James Van Buskirk

May 3, 2008, 8:05:15 PM

<james...@att.net> wrote in message
news:e39a3bb6-34f8-4031...@m45g2000hsb.googlegroups.com...

> I disagree that either caching or pipelining are responsible for
> instruction timings being unpredictable. Yes, multiprocessors make
> the cost of a cache miss less predictable, but the optimization
> problem with caching has always been to try to arrange data so cache
> misses occur less often, not to predict how long they take. Then,
> given good instruction timing (assuming cache misses don't happen) and
> details about how many functional units do which instructions, you can
> do reasonable instruction scheduling - and benefit from the
> instruction overlap, not get confused by it.

I share the above view.

> I think the real problem is instruction reordering. Evidently the
> coding community (us) was doing such a bad job at instruction
> scheduling that the hardware people decided to take it out of our
> hands. To be sure, the more functional units there are, and the more
> complex the relationships between them get, the harder the job is to
> do (and the more dependent it is on which specific CPU model you're
> using). So I'm not saying they had no call to do it. But it has had
> the effect that good instruction timing information is no longer
> available. An important tool we coders need to do a better job is no
> longer there.

Now, I disagree with the above paragraph. From programming on an
Alpha 21164 with two floating point pipelines with 4 clocks latency
and 31 floating point registers, my experience was that you could
really feel the register pressure. It may not sound so bad because
you could only have 12 registers in flight at the same time (two
loads per clock cycle with latency 2 clocks, IIRC) but you wanted
to keep some constants in registers and given a funky instruction
stream you could run out of registers quickly. I never tried
scheduling out of L2 cache. In that case the register pressure
would have been direct.

Out of order processing permits you as a programmer to use more
registers without switching to an ISA that has more architected
registers. That way you can to a certain extent cover up latencies
by renaming registers so that you can have more registers than
can be encoded in the machine's instructions in flight at any
time.

Itanium or IA-64 dealt with the register pressure situation by
having lots more registers (128 in both register files, IIRC)
but because you needed 21 bits to encode all 3 registers per
instruction (given a 3-register ISA) and a few bits for the opcode
and for floating point exception masks per instruction, 32 bits
wasn't going to be big enough to hold an instruction, so they
instead went to packing 3 instructions into 128 bits. This gave
Intel the additional advantage that they could encode a 64-bit
immediate load into a single instruction packet if they wanted
to, as well as the stuff with predication bits. Of course they
succeeded with this about as spectacularly as they did with the
iAPX 432, so we are stuck with out of order if we want to be
able to exploit the trend of increasing number of functional
units and clock speed without as much of an increase in wall-
clock speed of functional units.

> Even so, we still have control over instruction selection and register
> allocation decisions. Those can make a big difference in code speed.
> So, in fact, can some instruction scheduling. There's a limit to what
> the hardware instruction scheduling can do. I'm told that some
> compiler people spend time reordering code in their most important
> benchmarks and measuring the speed on different CPU models just to
> tweak that last bit of speed out of the machine. They are sort of
> reverse engineering what the hardware instruction scheduling is doing
> (or its consequences at least). And, I suspect that Intel gives more
> precise timing information to their own compiler people than it
> releases to the general public.

I am not so sure about that last point. I have definitely seen
Intel's compiler do some dumb stuff that makes it look as though
they weren't on the same planet as their hardware guys. But there
is some information that they could at least clarify but instead
choose to leave as obscure pitfalls for the low-level programmer.

Gordon Sande

May 3, 2008, 9:00:22 PM

There are also the intermediate devices known as formatted tapes that
can be read easily in either direction. The trick is that there is a timing
track that is initially formatted then not further modified. The most
common example is the DECtape although I believe that several early
British computers, Atlas is the one I would guess, had formatted tapes
as well. The great advantage of formatted tapes is that you can write
into the middle of them as the tape provides the timing. The DECtape
had other features to make it usable with low priced minis.

Conventional tapes modify all tracks at once so have external timing.
That ultimately translates into not knowing precisely where the records
are physically so writing into the middle does not work. There were some
hacks that allowed for known record sizes to be overwritten a few times.
That was a "do not try this at home" level stunt.

Charles Coldwell

May 4, 2008, 7:57:58 AM

nos...@see.signature (Richard Maine) writes:

> Take instead, the advice of Hoare and Knuth, which was briefly echoed by
> Mark in this thread, but seemed sort of buried to me.

"Premature optimization is the root of all evil." Or something
similar.

I definitely agree with that sentiment.

In my experience, if you find that your program is not performing as
well as you would like, you should start by profiling it. There's a
90-10 rule that says 90% of the execution time is spent in 10% of the
code. You must identify that 10% accurately if you want to improve
performance, and profiling is about the only way to do it.

Profiling will also often reveal where your choice of algorithms is
poor. I was recently helping a client with a program that we sped up
by a factor of two after changing about 20 lines of code out of
~100,000. There was one place where a linear search was being used on
an array of 4000 items, requiring on average 2000 comparisons.
Switching to a binary search reduced the number of required
comparisons to at most 12, a speedup of roughly a factor of 200 for
that particular part of the code, and we found it very quickly in the
profile.
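For readers who have only seen the linear version, here is a minimal Fortran sketch of the binary search described above. It assumes the 4000 items are integer keys kept in ascending order (the post does not say what they actually were), and the module name and the function bsearch are hypothetical. Each probe halves the remaining range, so 4000 items need at most 12 probes, because 2**12 = 4096.

```fortran
module search_sketch
  implicit none
contains
  ! Binary search of an array sorted in ascending order.
  ! Returns the index of key in a, or 0 if key is absent.
  ! The optional ncmp reports how many elements were probed.
  function bsearch(a, key, ncmp) result(idx)
    integer, intent(in) :: a(:)
    integer, intent(in) :: key
    integer, intent(out), optional :: ncmp
    integer :: idx, lo, hi, mid, n
    idx = 0
    n = 0
    lo = 1
    hi = size(a)
    do while (lo <= hi)
      mid = lo + (hi - lo) / 2      ! avoids overflow of lo + hi
      n = n + 1
      if (a(mid) == key) then
        idx = mid
        exit
      else if (a(mid) < key) then
        lo = mid + 1                ! key can only be in the upper half
      else
        hi = mid - 1                ! key can only be in the lower half
      end if
    end do
    if (present(ncmp)) ncmp = n
  end function bsearch
end module search_sketch
```

A call such as `idx = bsearch(a, key, nprobe)` returns 0 for an absent key; the optional third argument is there only to make the probe count visible.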

Chip

--
Charles M. "Chip" Coldwell
"Turn on, log in, tune out"
GPG Key ID: 852E052F
GPG Key Fingerprint: 77E5 2B51 4907 F08A 7E92 DE80 AFA9 9A8F 852E 052F

John Harper

May 4, 2008, 6:23:25 PM

In article <560c2727-01b8-46a3...@f24g2000prh.googlegroups.com>,

octaedro <jorge.al...@gmail.com> wrote:
>Hello folks,
>
>I have been programming in fortran for about a year and a half since I
>wrote my "Hello World" program. Since then my projects get bigger and
>bigger and although I own two fortran manuals... Metcalfe, Reid and
>Cohen, and Ellis, Phillips and Lahey, I still do not know very well
>how to set options on my compiler to make programs run fast or how to
>do profiling of the programs..

MR&C is a textbook about Fortran in general, not about any particular
compiler's options. EP&L appears from the amazon.com ad to be another,
but I have never seen the book itself.

What you need now is the manuals (paper, or more likely these days,
on-line) for your own compiler and operating system. All the compilers I
know have options to do what you're asking for, but you request those
options differently with different compilers.

Although error-checking options may slow your program down, failing to
use them is one way Fortran got the reputation of being a language that
gave wrong answers fast. Other ways often mentioned here are failing to
do numerical analysis properly and failing to appreciate the properties
of floating point.

-- John Harper, School of Mathematics, Statistics and Computer Science,
Victoria University, PO Box 600, Wellington 6140, New Zealand
e-mail john....@vuw.ac.nz phone (+64)(4)463 6780 fax (+64)(4)463 5045

William Clodius

May 4, 2008, 10:40:12 PM

Greg Lindahl <lin...@pbm.com> wrote:

> In article <87608c18-c5a7-44d2-bf4b-2192f65cf6d2@u12g2000prd.googlegroups.com>,
> Terence <tbwr...@cantv.net> wrote:
>
> >b) Tape, as noted, can be read backwards in certain modes. That is why
> >the direct access formatted mode has prefixes and postfixes,
>
> Indirectly through the BACKSPACE command, you mean? You can't backspace
> efficiently on a disk without the postfix.
>
> -- greg

You can, but it requires the run time library to retain a record of the
prefixes, e.g., in a linked list or equivalent, or have an OS similar to
the old Mac OS that retained file descriptor information in a separate
file. Removing the postfix information would impact portability of
Fortran files. Which method is most efficient would depend on the
relative impact of reading additional information off the disc versus
retaining information in RAM, or a different location on the disc, and
the relative frequency of backspacing.

Greg Lindahl

May 4, 2008, 11:41:54 PM

In article <1igeuzv.sq8yt61wrjlyxN%wclo...@los-alamos.net>,
William Clodius <wclo...@los-alamos.net> wrote:

>> Indirectly through the BACKSPACE command, you mean? You can't backspace
>> efficiently on a disk without the postfix.

>You can, but it requires the run time library to retain a record of the
>prefixes,

Uh, no. That would work if you read through the file before you start
backspacing. But users can easily start at the end of the file, and
users who just want to read the last record would be up in arms if
your implementation had to read the entire file to backspace one
record.
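The role of the postfix can be made concrete with a small sketch. It is not any compiler's actual file format: it writes each record to a Fortran 2003 stream file framed as [length][data][length], and backs up over a record by reading only that record's trailing length. Starting from the end of the file, each step back costs one small read, with no scan from the beginning. The names put_rec and back_one are hypothetical, and the sketch assumes that INQUIRE(IOLENGTH=) and POS= count the same file storage units.

```fortran
module rec_sketch
  implicit none
contains
  ! Write one record of integers framed as [length][data][length].
  ! Unit u must be connected with ACCESS='STREAM', FORM='UNFORMATTED'.
  subroutine put_rec(u, x)
    integer, intent(in) :: u
    integer, intent(in) :: x(:)
    integer :: nlen
    inquire(iolength=nlen) x        ! size of the data in file storage units
    write(u) nlen, x, nlen
  end subroutine put_rec

  ! Back up over the record that ends just before file position endpos.
  ! Only the trailing length marker is read, so earlier records are never
  ! touched. Returns the position where that record's leading marker starts.
  function back_one(u, endpos) result(startpos)
    integer, intent(in) :: u, endpos
    integer :: startpos, nlen, marker
    inquire(iolength=marker) nlen   ! size of one length marker
    read(u, pos=endpos - marker) nlen
    startpos = endpos - nlen - 2*marker
  end function back_one
end module rec_sketch
```

After the last write, `inquire(u, pos=p)` gives the position just past the final record; repeatedly calling `p = back_one(u, p)` then walks backwards one record at a time, and `read(u, pos=p)` re-reads the leading marker and the data.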

-- greg

glen herrmannsfeldt

May 6, 2008, 7:22:31 PM

Greg Lindahl wrote:

In Fortran 66, I don't believe that there is a way to get to the
end of a file without reading all the way through. As far as I
can tell, now you can open in APPEND mode and start BACKSPACE from
the end.

The system used for Fortran UNFORMATTED files for OS/360, and still
used with newer ESA/390 and z/OS systems, has record headers and
block headers. Following the headers through a block isn't so
hard, and the OS keeps track of blocks. They don't have
record trailers.

Doing C style fseek() can require reading from the beginning
under some conditions, though.
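Here is a minimal sketch of the approach described above, for a hypothetical sequential unformatted file in which every record holds one default integer. OPEN with POSITION='APPEND' (a Fortran 90 addition) leaves the file positioned at its end, one BACKSPACE steps back over the last record, and READ fetches it. Because, as Richard notes below, some processors may not allow BACKSPACE on a file opened this way, the sketch reports failure through IOSTAT= rather than stopping. The module, the subroutine read_last, and unit 20 are all hypothetical.

```fortran
module lastrec_sketch
  implicit none
contains
  ! Read the last record of a sequential unformatted file whose records
  ! each hold one default integer, without reading the earlier records.
  ! ios /= 0 means the processor refused one of the steps.
  subroutine read_last(fname, val, ios)
    character(len=*), intent(in) :: fname
    integer, intent(out) :: val, ios
    integer, parameter :: u = 20      ! hypothetical unit number
    val = 0
    open(u, file=fname, status='old', form='unformatted', &
         access='sequential', position='append', iostat=ios)
    if (ios /= 0) return
    backspace(u, iostat=ios)          ! step back over the last record
    if (ios == 0) read(u, iostat=ios) val
    if (ios /= 0) val = 0             ! value is not meaningful on failure
    close(u)
  end subroutine read_last
end module lastrec_sketch
```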

-- glen

Richard Maine

May 6, 2008, 7:46:38 PM

glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote:

> In Fortran 66, I don't believe that there is a way to get to the
> end of a file without reading all the way through. As far as I
> can tell, now you can open in APPEND mode and start BACKSPACE from
> the end.

I believe I've seen implementations where that wouldn't work; if you
opened with append, the only thing you could do was append.

glen herrmannsfeldt

May 6, 2008, 9:56:19 PM

Richard Maine wrote:

> glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote:

>>In Fortran 66, I don't believe that there is a way to get to the
>>end of a file without reading all the way through. As far as I
>>can tell, now you can open in APPEND mode and start BACKSPACE from
>>the end.

> I believe I've seen implementations where that wouldn't work; if you
> opened with append, the only thing you could do was append.

Neither the POSITION='APPEND' for OPEN, nor BACKSPACE mention
that it might not work. I just notice, though, in the POS=
option to READ and WRITE for Fortran 2003, 9.5.1.10:

"A processor may prohibit the use of POS= with particular files
that do not have the properties necessary to support random
positioning. A processor may also prohibit positioning a
particular file to any position prior to its current file
position if the file does not have the properties necessary
to support such positioning."

So it might be that BACKSPACE also doesn't work on such files.

-- glen

John Harper

May 6, 2008, 11:15:58 PM

In article <2MCdncC4epJ6l7zV...@comcast.com>,

glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote:
>
>Neither the POSITION='APPEND' for OPEN, nor BACKSPACE mention
>that it might not work. I just notice, though, in the POS=
>option to READ and WRITE for Fortran 2003, 9.5.1.10:
>
> "A processor may prohibit the use of POS= with particular files
> that do not have the properties necessary to support random
> positioning. A processor may also prohibit positioning a
> particular file to any position prior to its current file
> position if the file does not have the properties necessary
> to support such positioning."
>
>So it might be that BACKSPACE also doesn't work on such files.

Glen's quote is not in a Constraint, so you might hit trouble at
run time instead of being warned at compile time. After all, the unit
number or file name may not be available at compile time.
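The run-time side of this can be sketched briefly. Since the unit, and the file connected to it, are generally known only during execution, a processor that refuses a POS= (or BACKSPACE) request can only signal it then. Checking IOSTAT= and IOMSG= (the latter new in Fortran 2003) lets the program report the problem instead of aborting. The module and the subroutine read_at are hypothetical, and the unit is assumed to be connected for unformatted stream access.

```fortran
module safe_pos_sketch
  implicit none
contains
  ! Read one default integer at file position p of unit u.
  ! Whether the processor permits positioning this particular file is
  ! known only at run time, so report failure instead of stopping.
  subroutine read_at(u, p, val, ok, msg)
    integer, intent(in) :: u, p
    integer, intent(out) :: val
    logical, intent(out) :: ok
    character(len=*), intent(out) :: msg
    integer :: ios
    msg = ''                          ! IOMSG= is left unchanged on success
    read(u, pos=p, iostat=ios, iomsg=msg) val
    ok = (ios == 0)
    if (.not. ok) val = 0             ! value is not meaningful on failure
  end subroutine read_at
end module safe_pos_sketch
```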

Richard Maine

May 6, 2008, 11:38:12 PM

John Harper <har...@mcs.vuw.ac.nz> wrote:

> Glen's quote is not in a Constraint, so you might hit trouble at
> run time instead of being warned at compile time. After all, the unit
> number or file name may not be available at compile time.

Change "might" to some variant of "almost certainly." The odds of the
compiler figuring out that a backspace is on a unit that was connected
to a file with that property are negligible. If there is a problem, it
will almost certainly show up at run time rather than compile time.

In the standard, there is also the huge caveat that applies to all I/O.
I'll not bother looking up the exact words, but the rough translation is
"the processor can refuse to do darn near anything that it doesn't feel
like doing for any reason".

I was more referring to actual processors than to provisions of the
standard. Per the above, the standard allows just about anything in I/O
to be restricted, so it isn't particularly interesting to try to
elaborate everything that it allows. Glen's quote does show that some
limitations on POS= were specifically anticipated as likely, which
matches well with my recollection. But that's not directly related to my
comment about open with append anyway. Open with append preceded POS= in
the language.

The disallowance of backspace after open with append was more of a
surprise to me... if I'm not confusing it with something else, which is
quite possible.