..... some stuff deleted
>2. Experiences with helios
>==========================
I fully agree with your positive remarks ...
I have different experiences with some of your negative points:
>* Helios 1.1 cannot boot more than 19 transputers! (Some of my transputers ...
Perihelion sells two versions. The s (single-user) version IS restricted
to 19 T's; the n (network) version supports an arbitrary number (I have
heard of more than 64 T's). You can upgrade from the s- to the n-version
for ~250 pounds.
>* If I connect my transputers to a 2-D-grid ...
The problem with the 2D-grid doesn't exist for me. My configuration is an
in-house-made B004-type board with up to 9 T800's/12MB each or up to
19 T800's/4MB each or a mix of both configurations. I'm using the 9 T-version
(25MHz) in a cube-structure: one T connects over the diagonal, 8 sit in the
corners and are connected with each other. We do run Helios 1.1A but even with
Helios 1.1 I had no problems at all! If your hardware is ok (links are very
sensitive to skew in clock timing) I guess your resource map doesn't correspond
to your actual hardware configuration.
>* Helios crashes a lot of times ...
We had similar crash problems. The PC-server probably doesn't have enough
memory and doesn't bother to tell you about it. Try to give as much
DOS-memory as possible to the PC-server. With version 1.1A we don't have
that problem: there the server refuses to start with too little memory!
>* The editor is totally undocumented. I had docs of other emacs versions ...
Try to get version 3.9 of emacs (source) either from Parsytec or Perihelion.
Then you have the documentation and can tailor the editor.
>* The memory management is full of errors! After repeated malloc() ...
I don't know about this one. We malloc() a lot, in both small and large chunks.
The memory used by Helios depends on what kind of services the programs
requested. E.g. after a compile session Helios keeps some resources (libraries)
in memory.
>* A lot of the good features of helios are unusable (bad documentation ...
Documentation is sometimes incomplete but the people at Perihelion are
extremely helpful. Good documentation can be found in the little booklets
'The Helios Parallel Programming Tutorial' and 'The CDL-Guide'.
>* It is impossible to redirect the stderr, so we could never save ...
We haven't had such problems at all. In our makefile we always redirect the
error output into files.
>* The error return codes (8 hex digits) are ugly and not very useful ...
I think these codes are very helpful but hard to decipher. Try to use the
fault library mechanism or the Helios fault command (Chapter 3, commands).
>* Sometimes user programs started by a shell script don't start ...
I experienced this behavior in very early Helios versions (<1.0). Under 1.1A
our programs and CDL-scripts work well.
>* After a while one or more transputers die without a recognizable ...
No idea. Couldn't this be the result of a wild program damaging Helios
resources on that transputer?
>* Helios boots very slowly, and we have to boot all the time ...
My nine T's need 5 to 6 seconds to boot (again under 1.1A). Try to start
the server in the verbose mode and see what Helios is trying to do.
>Outside of Helios I can boot 25 transputers in less than one second ...
I guess Helios loses a lot of time in vain retries to boot or load something.
Best wishes, Tommy
Tommy Leemann, BITNET: IBTALL01@CZHETH1I
Institute for Biomedical Engineering
Swiss Federal Institute of Technology, Zurich, Switzerland
>>* If I connect my transputers to a 2-D-grid ...
>.... I guess your resource map doesn't correspond
>to your actual hardware configuration.
I am quite sure that my resource map is ok. Maybe the reason is my small
memory (250 KB per node).
>>* Helios crashes a lot of times ...
>We had similar crash problems. The PC-server probably doesn't have enough
>memory and doesn't care to tell you about. Try to give as much as possible
>DOS-memory to the PC-server. With version 1.1A we don't have that problem.
>Here the server doesn't start with too little memory!
My server has 640 KB PC memory. And server crashes are not the only crashes.
>>* It is impossible to redirect the stderr, so we could never save ...
>We haven't had such problems at all. In our makefile we always redirect the
>error output into files.
Could you please post an example?
>>* After a while one or more transputers die without a recognizable ...
>No idea. Couldn't this be the result of a wild program damaging Helios
>resources on that transputer?
This happens even if *no* user programs are running.
Thank you for your reply.
--------------
Dieter Homeister, Universitaet Stuttgart,
Institut fuer parallele und verteilte Hoechstleistungsrechner (IPVR)
7000 Stuttgart 1, Azenbergstr. 12, Tel 0711-121-1342, W-Germany
e-mail home...@informatik.uni-stuttgart.dbp.de
>1. Reset Problems
>=================
The original driver im_ra_b4.d was designed when we did not have any boards
with more than one processor and does not work with more than two processors.
A new driver tram_ra.d (shipped with Helios 1.1a), which is designed for use
with the INMOS reset scheme on more than one processor, solves the problems
encountered with the previous driver.
>* Helios 1.1 cannot boot more than 19 processors
As Tommy Leemann has correctly pointed out, the only reason that Helios
does not boot more than 19 transputers is a limit built into the
standard version of Helios to enforce a licensing restriction.
Helios is currently running on a 128 processor Parsytec Supercluster.
Please contact DSL for more details if you need to run on more than
20 processors.
>>* If I connect my transputer to a 2-D grid
>... small memory (250 KB per node)
We recommend that Helios runs on transputers with not less than 1 MegaByte
of memory. If you want to use transputers with less memory we recommend
that you mark them as NATIVE in the resource map and run code in them
using our stand-alone compiler.
If anyone has any difficulties booting a certain hardware configuration,
could they please contact us directly.
>* Helios crashes a lot ...
We have tested Helios 1.1a on a 128 processor box for several days
without crashing. One copy of Helios even survived the San Francisco
Earthquake!!! We also have a report of Helios running for 13 days,
at which point the Sun being used as host crashed while Helios kept going.
>* The editor is totally undocumented ...
Current versions of Helios are sent out with a booklet about emacs.
Please contact DSL if you need a copy.
>* The memory management is full of errors ...
As the transputer does not implement memory protection there is no way
that we can protect the system from `unfriendly / buggy' programs that
corrupt memory. If a program messes up free memory then malloc is
very likely to crash. The lack of virtual memory also means that it is
impossible to stop programs fragmenting memory (although giving a
suitable heap size to programs with objed can reduce the problem). All
the problems sent to technical support so far purporting to illustrate
a problem with malloc / free have been tracked down to memory
corruption in the example program!
You can use the command map to monitor memory fragmentation.
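To illustrate what fragmentation means when blocks cannot be moved, here
is a minimal C sketch. It is NOT the Helios allocator: ToyHeap, the
toy_* functions and HEAP_UNITS are hypothetical names for a toy first-fit
heap made up for this example. It only shows how plenty of free memory
can still fail to satisfy a request for one contiguous block.

```c
#include <assert.h>
#include <stddef.h>

#define HEAP_UNITS 16  /* hypothetical heap size, in allocation units */

/* used[i] == 1 means unit i is allocated, 0 means it is free.
 * Allocated units are never moved, as on a machine without virtual memory. */
typedef struct {
    unsigned char used[HEAP_UNITS];
} ToyHeap;

/* First fit: claim n contiguous free units; return start index or -1. */
static int toy_alloc(ToyHeap *h, size_t n)
{
    size_t run = 0;
    if (n == 0 || n > HEAP_UNITS)
        return -1;
    for (size_t i = 0; i < HEAP_UNITS; i++) {
        run = h->used[i] ? 0 : run + 1;
        if (run == n) {
            size_t start = i + 1 - n;
            for (size_t j = start; j <= i; j++)
                h->used[j] = 1;
            return (int)start;
        }
    }
    return -1;
}

/* Release n units starting at index start. */
static void toy_free(ToyHeap *h, int start, size_t n)
{
    assert(start >= 0 && (size_t)start + n <= HEAP_UNITS);
    for (size_t j = 0; j < n; j++)
        h->used[(size_t)start + j] = 0;
}

/* Total number of free units, contiguous or not. */
static size_t toy_total_free(const ToyHeap *h)
{
    size_t total = 0;
    for (size_t i = 0; i < HEAP_UNITS; i++)
        total += h->used[i] ? 0 : 1;
    return total;
}

/* Length of the longest run of contiguous free units. */
static size_t toy_largest_free(const ToyHeap *h)
{
    size_t best = 0, run = 0;
    for (size_t i = 0; i < HEAP_UNITS; i++) {
        run = h->used[i] ? 0 : run + 1;
        if (run > best)
            best = run;
    }
    return best;
}
```

If you fill the heap with one-unit blocks and then free every second
block, toy_total_free() reports half the heap free while
toy_largest_free() reports a single unit, so a two-unit toy_alloc()
fails. Keeping each program's heap to a suitable size limits how far
this kind of pattern can spread.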
>* The documentation of the resource maps ...
Yes, the original documentation could have been better. Since it was
released we have produced a number of technical notes and guides, such
as `The CDL Guide', to try and improve things. We are currently putting
a lot of effort into getting the documentation right for Helios
version 1.2 (provisionally available NOV/DEC 90).
>* It is impossible to redirect stderr ...
Yes, there is a problem here. Due to the distributed nature of Helios
all messages must be retryable. This means that reads and writes
contain not only the data and the amount of data to write but also the
position at which to write it. In turn, when you use >& from the shell
to redirect stdout and stderr to a file, both streams send writes to the
same file, each including its own position, so the data is written on
top of itself. In Helios 1.2 we are thinking of solving this problem by
not allowing the redirection of stderr and stdout together but allowing
separate redirection to different files.
If you really want to use >& at present, try redirecting the output
to the /ram server, e.g. make >& /00/ram/fred, as this works at present.
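The overwrite effect described above can be reproduced with a small C
simulation. This is only a sketch and does not use any Helios calls:
SimFile, SimStream, sim_write_at and sim_stream_write are hypothetical
names invented here, and the "file" is just a byte array. The assumption
is that each stream keeps its own private offset and sends that offset
along with every write.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a file held by a server: bytes plus length. */
typedef struct {
    char   data[64];
    size_t size;
} SimFile;

/* Hypothetical stand-in for an open stream with its own private offset. */
typedef struct {
    SimFile *file;
    size_t   pos;
} SimStream;

/* Positional write: the request names the data, its length and the
 * target offset, so repeating the same request leaves the file
 * unchanged (which is what makes it safe to retry). */
static void sim_write_at(SimFile *f, size_t pos, const char *buf, size_t len)
{
    assert(pos + len <= sizeof f->data);
    memcpy(f->data + pos, buf, len);
    if (pos + len > f->size)
        f->size = pos + len;
}

/* Stream write: issue a positional write at this stream's own offset,
 * then advance only this stream's offset. */
static void sim_stream_write(SimStream *s, const char *text)
{
    size_t len = strlen(text);
    sim_write_at(s->file, s->pos, text, len);
    s->pos += len;
}
```

If two SimStream values are opened on the same SimFile, both starting
at offset 0, and one writes "compiling\n" while the other writes
"error!\n", the second write lands on top of the first instead of after
it: the file ends up holding "error!\nng\n". Giving each stream its own
file avoids the clash, since the offsets then never refer to the same
bytes.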
>* The stack check of the c compiler doesn't work
We have had no problems with the stack checking. Could you
send us more details?
>* the error return codes are ugly ...
Use the fault command to get a more user-friendly message. If you get
the message "exec format error", try typing fault on its own with no
arguments to get further information.
>* the helios message passing mechanism should be documented with more
examples ...
The average user program should not need to use message passing
directly but should use the CDL and pipes for the communication
between programs. The use of pipes is NOT inefficient under Helios;
see technical note 22 for more details. The only time you need to
know about the message passing in detail is if you are writing
new Helios servers.
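As a rough sketch of what such a program can look like, here is a plain
C filter that only talks to two stdio streams and knows nothing about
message passing. The assumption, not confirmed in this thread, is that a
piped component sees its neighbours as ordinary input and output streams,
as in a Unix pipeline. filter_upper is a hypothetical name and the
upper-casing is just a placeholder for real work.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Copy everything from `in` to `out`, upper-casing each byte.
 * Returns the number of bytes copied. */
static long filter_upper(FILE *in, FILE *out)
{
    long n = 0;
    int c;

    assert(in != NULL && out != NULL);
    while ((c = getc(in)) != EOF) {
        /* getc returns an unsigned char value, which is safe for toupper */
        putc(toupper(c), out);
        n++;
    }
    return n;
}
```

A complete program would simply call filter_upper(stdin, stdout) from
main(); how the components are then wired together is left to the CDL
script.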
>* sometimes user programs started by the shell script don't start ...
Without more details I cannot comment on the above. Version 1.1a
has fixed a few bugs to do with the CDL which meant that sometimes
programs did not run.
>* After a while one or more transputers die
We have not had any problems in Version 1.1a of this nature
due to software. We have had similar problems that have
turned out to be due to faulty hardware.
>* Helios is non-deterministic. So it is hard to reproduce errors.
You're telling us!!! Try debugging an operating system without memory
protection.
>* Helios boots very slowly
We are currently working on this area and anyone trying to boot very
large transputer networks should contact us.
I hope the above has answered some of your questions.
----------------
Alan Cosslett
Technical Support
Perihelion Software Limited
The Maltings, Charlton Road, Shepton Mallet
Somerset, BA4 5QE, ENGLAND
Tel [+44] (0)749 344203
Fax [+44] (0)749 344977
email al...@perisl.uucp
-----------------
Distributed Software Limited (DSL)
670 Aztec West, Bristol BS12 4SD, ENGLAND
Tel [+44] (0)454 612777
Fax [+44] (0)454 618188