On 1/15/2018 3:52 AM,
jsunth...@gmail.com wrote:
> Jim, would it be possible to write this kind of command?
>
> proc noop {args} {}
> proc threadit {num args} {LITERAL BLACK MAGIC NUM TIMES}
Absolutely! Yes, great question.
There is one such implementation that I am aware of, but it is
implemented as a command embedded in a commercial tool.
The community here (myself included) can develop our own
'parallel_jobs <cmds>' command and post it freely for everyone to use,
and I think it would be immediately useful.
In a nutshell, here is what we need to do:
1. Write a procedure called 'parallel_jobs <cmds>' that takes a list of
commands and executes them in threads. Provide some configurability for
generally useful things.
Configuration options might be:
parallel_conf -use_tpool <0|1> 0=default
parallel_conf -tpool_min_workers <N> N=0 default
parallel_conf -tpool_max_workers <N> N=1 default
parallel_conf -initcmd {}
parallel_conf -exitcmd {}
parallel_conf -keep_alive <0|1> 0=default
parallel_conf -initcmd_env_import <0|1> 1=default (enabled); 0=disabled
This -initcmd_env_import feature automates the tedious setup of each
thread. It copies what I have found to be the most "standard/typical"
things we need initialized in each thread -- stuff like auto_path and
existing proc definitions. Users typically want their threads to be
'just like their main thread', and we can do that for the most part.
(For Gerald, we provide the 0=disable option.)
parallel_conf -other options?
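One possible shape for parallel_conf (a sketch only; the option names
above are just proposals, and so is the ::parallel namespace):

```tcl
# Hypothetical parallel_conf: store the proposed options, with their
# defaults, in a namespace array and validate option names on the way in.
namespace eval ::parallel {
    variable conf
    array set conf {
        -use_tpool          0
        -tpool_min_workers  0
        -tpool_max_workers  1
        -initcmd            {}
        -exitcmd            {}
        -keep_alive         0
        -initcmd_env_import 1
    }
}

proc parallel_conf {args} {
    foreach {opt val} $args {
        if {![info exists ::parallel::conf($opt)]} {
            error "unknown option \"$opt\""
        }
        set ::parallel::conf($opt) $val
    }
    return [array get ::parallel::conf]
}
```

Unknown options raise an error immediately, which is cheap insurance
against silent typos in user scripts.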
The parallel_jobs <cmds> interface might be:
parallel_jobs {
{blackmagic -input <file1> -output <file1.out>}
{blackmagic -input <file2> -output <file2.out>}
{blackmagic -input <file3> -output <file3.out>}
...
}
parallel_jobs {
{analyze_data -set <A> -output <setA.out>}
{analyze_data -set <B> -output <setB.out>}
{analyze_data -set <C> -output <setC.out>}
...
}
parallel_jobs {
{file copy <input_fileX> <output_location>; cmd ...}
{file copy <input_fileY> <output_location>; cmd ...}
...
}
Command Description:
The parallel_jobs procedure takes a list of commands and evaluates
them individually in threads. It is a blocking procedure: it returns
after all commands have been executed and the threads have been closed
(handled automatically by the parallel_jobs command).
Each line is an element of a Tcl list. The code is passed to a
thread where it is 'eval'uated by the Tcl interpreter -- so any
standard Tcl can be passed as a list element.
Depending on how the user configures the settings (see parallel_conf
for details), the commands are either dispatched to a thread pool or
each element creates its own thread.
Note: Because new threads start uninitialized, users can a) pass an
'initcmd' through the configuration interface, which is executed at
the startup of each thread; b) pass the initialization sequence in the
<cmd> itself; or c) eliminate any startup/initialization dependency
(whether this is practical depends on the particulars of each
problem).
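A minimal sketch of what parallel_jobs could look like on top of the
Thread package's tpool (simplified: the worker count is a plain
argument rather than coming from parallel_conf, and errors raised by a
job will propagate out of tpool::get uncaught):

```tcl
package require Thread

# Sketch of parallel_jobs built on tpool: post every command to a pool,
# block until all jobs finish, return the results in posting order.
proc parallel_jobs {cmds {maxworkers 4}} {
    set pool [tpool::create -minworkers 0 -maxworkers $maxworkers]
    # Post every command; remember the job ids in posting order.
    set jobs {}
    foreach cmd $cmds {
        lappend jobs [tpool::post $pool $cmd]
    }
    # Block until every job has completed, collecting each result.
    set pending $jobs
    while {[llength $pending]} {
        foreach job [tpool::wait $pool $pending pending] {
            set result($job) [tpool::get $pool $job]
        }
    }
    tpool::release $pool
    # Return the results in the same order the commands were given.
    set out {}
    foreach job $jobs {
        lappend out $result($job)
    }
    return $out
}
```

For example, `parallel_jobs {{expr {6*7}} {string toupper ok}}`
returns `42 OK`.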
Example demonstrating (b): passing the initialization sequence in the
<cmd> itself:
parallel_jobs {
{source init.tcl; blackmagic -input <file1> -output <file1.out>}
{source init.tcl; blackmagic -input <file2> -output <file2.out>}
{source init.tcl; blackmagic -input <file3> -output <file3.out>}
}
In this example, ./init.tcl is a script responsible for initializing
the thread with everything it needs to evaluate the remaining Tcl code
of the <cmd> -- for example, updating 'auto_path' so that the
'blackmagic' procedure is found automatically on the search path.
Remember, threads start uninitialized (dumb), so we have to tell each
one everything it needs to crawl, walk, and run -- just as you do when
you start a Tcl shell for the first time (and this can be tedious).
See the parallel_conf -initcmd_env_import option for a possible helper
routine we can bake to relieve some of this tedious work...
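A sketch of what that -initcmd_env_import helper could generate (the
name parallel_env_import_script is my own; it copies auto_path and the
global proc definitions, and deliberately ignores namespaces and
global variables, which is the "for the most part" caveat above):

```tcl
# Hypothetical helper: build a script that recreates the main
# interpreter's environment (auto_path plus global proc definitions)
# so a fresh thread starts out "just like the main thread".
proc parallel_env_import_script {} {
    set script [list set auto_path $::auto_path]
    append script \n
    foreach p [info procs] {
        # Rebuild the argument spec, preserving default values.
        set argspec {}
        foreach a [info args $p] {
            if {[info default $p $a d]} {
                lappend argspec [list $a $d]
            } else {
                lappend argspec $a
            }
        }
        append script [list proc $p $argspec [info body $p]] \n
    }
    return $script
}
```

The returned script is exactly what you would hand to the thread's
initcmd (or, for testing, evaluate in a fresh slave interp).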
STDOUT Considerations:
My 'blackmagic' likes to print messages, so how do you deal with that?
Threads will scramble everything on stdout, because execution is
asynchronous and unordered (as desired). Also, if Tcl throws errors or
exceptions for ANY REASON, those get written to stdout in the same
interleaved manner.
See #1, #2, #3, and #7 below for more on dealing with stdout issues in
threads.
Here are a few things to keep in mind when coding 'blackmagic':
1. Print useful messages indicating what the <input file> name was at
the time an exception or error was thrown. The last thing you need is
a cryptic 'cannot open file' message with no idea which file it was
referring to -- or, depending on what blackmagic does, no way to trace
which file caused the problem.
2. Time stamps -- stdout is going to look terrible. Give yourself a
fighting chance and add timestamps to messages so you can relate them
or stitch a series of events together. Remember: if you're doing
parallel processing, chances are you're doing something that takes a
long time, so this can be useful despite any initial skepticism.
3. Whatever can go wrong, will go wrong. So harden your code -- write
every check you can; check, check, and re-check. For example: check
file read permissions before opening, and print a message and return
if they are missing; do the same for write permissions; wrap 'open'
and 'close' calls in catch.
4. Don't wait for Tcl to throw an exception -- it will leave you with
cryptic messages and the impression that threads are unstable.
5. If you are absolutely confident that 'open' and 'close' worked, you
will narrow things down and get to the real issue very quickly. For
example: I thought the network was stable -- I actually assumed it was
-- until, to my surprise, it wasn't. In other words: I thought there
was no problem with file write permissions, until there
was...hmm...I'll be damned, users do the darndest things.
6. Parallel processing drives utilization of the compute resource up
dramatically compared to non-parallel processing, so you might run
into things you have never seen before. ulimit issues, for instance
(the user's max-open-files limit exceeded): you're chugging along, you
can open 1k files no problem, but at 1k+1 files you suddenly core dump
and a 'cannot open file' message appears (re-read #3 above). Disk I/O
will go up -- will it stall the network or the machine, hang the
system, or run with no problem? I've had all three cases. Filesystem
and network usage will climb, and you're going to find out what the
network can actually provide: some disks have network limits, some are
high-performance and mirrored, others low-performance; you might
compete with other I/O on the network or share a disk with other jobs
-- and as a result you might discover unstable mounts in your network
that you never even noticed before (re-read #3 above).
7. Log messages: There is a solution to out-of-order log/stdout
messages, but I have not written the code or really thought through
how to solve it -- so I'm hoping somebody who has will join in and
contribute that code to this effort. I just know it's possible and
would love to collaborate with somebody on solving it.
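Tips #2 through #4 can be sketched with a couple of small helpers (the
names 'tlog' and 'safe_open_read' are my own):

```tcl
# Tip #2: timestamp every message, and tag it with the thread id when
# the Thread package is loaded, so interleaved stdout can be stitched
# back together later.
proc tlog {msg} {
    set ts [clock format [clock seconds] -format "%Y-%m-%d %H:%M:%S"]
    if {[catch {thread::id} tid]} { set tid main }
    puts "\[$ts\]\[$tid\] $msg"
}

# Tips #3 and #4: check permissions up front and wrap 'open' in catch,
# so a failure always names the offending file instead of dying with a
# cryptic exception.
proc safe_open_read {fname} {
    if {![file readable $fname]} {
        tlog "ERROR: file not readable: $fname"
        return ""
    }
    if {[catch {open $fname r} chan]} {
        tlog "ERROR: open failed for $fname: $chan"
        return ""
    }
    return $chan
}
```

Callers check for an empty return instead of catching exceptions, and
every failure line names the file and carries a timestamp.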
We haven't talked about:
a. tsv (thread shared variables) -- I also refer to them as safe
variables because access to them is thread-safe.
b. how to use tsv to your benefit (above) and to solve a wider range
of problems.
c. the assumptions and limitations of what this style of API can do
(what's feasible and what's not).
TSV is a very useful tool that expands this topic even further.
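As a small taste of what tsv buys us: every tsv operation is atomic,
so pooled worker threads can update a shared counter with no explicit
locking (a sketch; variable and element names are arbitrary):

```tcl
package require Thread

# Each pooled job atomically increments a shared counter; no mutex is
# needed because tsv operations are internally serialized.
tsv::set stats done 0

set pool [tpool::create -maxworkers 4 -initcmd {package require Thread}]
set jobs {}
for {set i 0} {$i < 10} {incr i} {
    lappend jobs [tpool::post $pool {tsv::incr stats done}]
}
# Drain the pool: wait until no jobs remain pending.
while {[llength $jobs]} {
    tpool::wait $pool $jobs jobs
}
tpool::release $pool
puts "jobs finished: [tsv::get stats done]"
```

With ten jobs posted, the final counter reads 10 regardless of which
worker ran which job or in what order.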
But I'll pause here.
Jim