Message from discussion
RTX2000 optimization
Received: by 10.66.84.34 with SMTP id v2mr3801308pay.38.1352116877719;
Mon, 05 Nov 2012 04:01:17 -0800 (PST)
MIME-Version: 1.0
Path: 6ni63358pbd.1!nntp.google.com!border1.nntp.dca.giganews.com!nntp.giganews.com!goblin1!goblin.stu.neva.ru!eternal-september.org!feeder.eternal-september.org!mx04.eternal-september.org!.POSTED!not-for-mail
From: an...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.lang.forth
Subject: Re: RTX2000 optimization
Date: Mon, 05 Nov 2012 11:55:35 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 77
Message-ID: <2012Nov5.125535@mips.complang.tuwien.ac.at>
References: <e1479cfa-a969-40ab-b20c-096d82601657@googlegroups.com> <k6hcat$tc1$3@dont-email.me> <1762397.1u9z86ECTl@sunwukong.fritz.box> <k6hq5t$s0f$1@dont-email.me> <k6ipnf$28h$1@speranza.aioe.org> <k6ju0o$5ib$1@dont-email.me> <k6n1ub$i0b$1@speranza.aioe.org> <k6pljm$df7$1@dont-email.me> <3289bb1a-2b62-4b5e-b32f-55887176d7b9@googlegroups.com> <716d61f6-ca70-4dc3-b9d8-d0f0c8eab91a@googlegroups.com> <7xpq3vtmzg.fsf@ruckus.brouhaha.com> <1413599.g8YLPlRKUt@sunwukong.fritz.box> <k73me7$gf2$1@dont-email.me>
Injection-Info: mx04.eternal-september.org; posting-host="e3602f98984ce6eff143cbffbeeac014";
logging-data="11071"; mail-complaints-to="ab...@eternal-september.org"; posting-account="U2FsdGVkX19NQ7hJ2aChSQplyL5RtJ9f"
X-newsreader: xrn 10.00-beta-3
Cancel-Lock: sha1:4DN0R4DdK9aqy8j8siO2xDbtVTQ=
Bytes: 5735
rickman <gnu...@gmail.com> writes:
>On 11/3/2012 12:47 PM, Bernd Paysan wrote:
>> We had a student trying to create a C backend for the b16 CPU, using one
>> of those with a stack based intermediate language - the C frontend wrote
>> the intermediate language out and then the student wrote his own
>> backend, which, given that the b16 is already a stack processor, was not
>> very complicated.
>>
>> Code densitry however fell considerably, because in C, you absolutely
>> *must* handle locals. The student wrote some simple code to use a
>> normal locals stack, and that was really a lot of overhead compared to
>> Forth code doing the same. Even if the b16 had something like the
>> Transputer workspace, the C code still would be considerably larger than
>> Forth code.
>
>But a C compiler can optimize to use registers in place of locals on the
>stack or anywhere else in memory. I guess it is a harder task to
>optimize to use a stack efficiently?
Not particularly. Martin Maierhofer looked at this in his Diploma
thesis, and found that a simple approach (stack scheduling) led to
results that are close to optimal. However, he only looked at
optimizing straight-line code. Things get more interesting and
complex if you want to keep C local variables on the stack across
control structures.
@MastersThesis{maierhofer97,
author = {Martin Maierhofer},
title = {Erzeugung optimierten Codes f\"{u}r Stackmaschinen},
school = {{Technische Universit\"{a}t Wien}},
type = {Diplomarbeit},
year = {1997},
address = {Austria},
url = {http://www.complang.tuwien.ac.at/Diplomarbeiten/maierhofer97.ps.gz},
abstract = {The success of the Java programming language and the
underlying stack architecture (the JavaVM) recently
have caused renewed interest in stack architectures.
New and special techniques are required to provide
support for the efficient execution of Algol-like
high level languages on stack machines. An optimizing
compiler should be able to eliminate dispensable
accesses to main memory when loading and storing local
variables by temporarily keeping a copy of their value
on the stack. This technique, however, is only
meaningful for machines that can handle
stackmanipulations faster than accesses to main
memory. \par This thesis presents two solutions for
the problem and compares the results that can be
achieved for two stack architectures (one of them is
the JavaVM). Both techniques work on a ``local'' level,
meaning that each basic block is considered separately.
Phil Koopman's ``stack scheduling'' performs well in
cooperation with a simple instruction scheduling
strategy (a depth first postorder traversal of the
dependency graph is used). The second technique
(``{\scshape Dag} scheduling'') reorders the
instructions (i.e. it schedules the instructions) in
order to minimize accesses to local variables and
leads to optimal code. \par The efficiency of Koopman's
algorithm can be assessed with respect to the results
achieved by {\scshape Dag} scheduling: stack scheduling
leads to results that come quite near the optimum.
Accesses to variables can be reduced from around 40\%
(in code produced by the Java compiler \texttt{javac})
to some 30\% (after optimization) of the total
instructions in JavaVM code. Instructions for
stackmanipulation, however, increase from about 5\% to
up to 15\% of the total.},
note = {In German}
}
- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: http://www.forth200x.org/forth200x.html
EuroForth 2012: http://www.euroforth.org/ef12/