I have a DECnet object which can serve various known queries on a
database. These queries are resource-consuming and I don't want more
than (say) n=5 of them of the same type to run at the same time. I
want the others to be queued for later processing.
I first thought of using the lock manager. This would have worked with
an exclusive lock for n=1, but not for n>1...
I then thought of using counting semaphores, but the problem with
semaphores is that if the server process unexpectedly dies before
signaling that it's done with the resource, the slot it was using is
never freed.
As unfortunately the DLM does not have "counting locks", I had the
idea of creating a null lock for every possible query type and as many
exclusive sub-locks as needed, one for every possible instance of that
particular query. For each new request of that type, the newly created
object instance enqueues an exclusive lock request on every sub-lock.
As soon as one is granted, it first dequeues the unneeded ones and
starts its job.
Do you see any flaw in this algorithm? Do you see any other way of
doing it?
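For concreteness, here is roughly what I have in mind in C (an
untested sketch; the resource-name scheme, MAXSLOTS, and the
event-flag numbers are only illustrative):

#include <descrip.h>
#include <lckdef.h>
#include <ssdef.h>
#include <starlet.h>
#include <stdio.h>
#include <string.h>

#define MAXSLOTS 5   /* n: max concurrent instances of one query type */

typedef struct {                 /* minimal lock status block */
    unsigned short cond;         /* completion status         */
    unsigned short resvd;
    unsigned int   lkid;         /* lock id                   */
} LKSB;

/* Queue an EX request on every sub-lock of <qtype>, keep the first one
 * granted and give the others back.  Returns the index of the slot we
 * now own (-1 on error); the caller does a $DEQ on slot[index].lkid
 * when the query is finished.  NB: resource names are group-local by
 * default, so the cooperating processes must share a UIC group (or use
 * LCK$M_SYSTEM, which needs SYSLCK privilege). */
int acquire_slot(const char *qtype, LKSB slot[MAXSLOTS])
{
    char names[MAXSLOTS][32];
    struct dsc$descriptor_s dsc[MAXSLOTS];
    unsigned int mask = 0;
    int i, granted = -1;

    memset(slot, 0, MAXSLOTS * sizeof(LKSB));
    for (i = 0; i < MAXSLOTS; i++) {
        sprintf(names[i], "%s_SLOT_%d", qtype, i + 1);
        dsc[i].dsc$w_length  = (unsigned short)strlen(names[i]);
        dsc[i].dsc$b_dtype   = DSC$K_DTYPE_T;
        dsc[i].dsc$b_class   = DSC$K_CLASS_S;
        dsc[i].dsc$a_pointer = names[i];
        /* asynchronous $ENQ: event flag i+1 is set on completion */
        if (!(sys$enq(i + 1, LCK$K_EXMODE, &slot[i], 0, &dsc[i],
                      0, 0, 0, 0, 0, 0, 0) & 1))
            return -1;
        mask |= 1u << (i + 1);
    }
    sys$wflor(1, mask);          /* sleep until any slot is granted */

    for (i = 0; i < MAXSLOTS; i++) {
        if (granted < 0 && slot[i].cond == SS$_NORMAL) {
            granted = i;         /* first granted slot: keep it */
            continue;
        }
        /* intended to cancel a still-waiting request; if one was
         * granted in the meantime this simply releases it, so the
         * slot goes back either way */
        sys$deq(slot[i].lkid, 0, 0, 0);
    }
    return granted;
}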
TIA,
Pierre.
Pierre, my memory is weak but ISTR that a DECnet object (at least with
NCP) uses a DCL file to make it happen. For example, if you request a
directory listing on a remote node, FAL is used and FAL.COM is the
command procedure that gets executed. If your object works the same
way, you can define a logical name to control the number of instances,
or use a file to store counters.
Hans
If you can generate LMF PAKs, wouldn't a PAK with a unit count work?
Call $GRANT_LICENSE. If there are available units, then the call can
proceed. If not, try again. Also, if a process dies, the LMF does
clean up the license count if it had issued one to the process that
died.
--
VAXman- A Bored Certified VMS Kernel Mode Hacker VAXman(at)TMESIS(dot)ORG
http://www.quirkfactory.com/popart/asskey/eqn2.png
"Well my son, life is like a beanstalk, isn't it?"
I use the DLM because if the process owning a lock dies, the lock is
automatically released.
If I understand correctly what you say, a logical or a license would
be used in the same way as a counting semaphore: I increment the
counter when entering the "critical path" and decrement it when
leaving, but if the process dies (bug, STOP/ID, etc.) the counter is
never decremented :(
Pierre.
Is this something that could be done with the queue manager?
Set up a special queue for the resource and set the queue job limit
(/JOB_LIMIT=5). If something is submitted, and five are executing, it
will wait. If a job completes or dies (assuming it doesn't hang), then
the next job in line gets run.
Marty
Let me preface this by saying that I never tried this, but many, many
years ago I heard of this technique from someone who was rather proud
of what he did.
What I recall him saying is that he had X number of locks. Each new
process tries to get lock 1; if that fails it tries lock 2, then lock
3, etc. When the process gets to X it loops around to the first lock
and walks the locks again. The problem with this is that if you have
three processes walking through the locks and one process releases a
lock, you have no way to make sure that the oldest waiting process
gets the just-released lock. In other words, if process A has been
waiting 5 minutes for a lock, process B has been waiting 3 minutes,
and process C was just created, it is possible that process C will be
the first one to grab the lock that was just released.
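From memory, the walk would look something like this in C (an untested
sketch; the names and the one-second pause between sweeps are only
illustrative):

#include <descrip.h>
#include <lckdef.h>
#include <lib$routines.h>
#include <ssdef.h>
#include <starlet.h>
#include <stdio.h>
#include <string.h>

#define MAXSLOTS 5

typedef struct { unsigned short cond, resvd; unsigned int lkid; } LKSB;

/* Sweep locks 1..MAXSLOTS until one EX request can be granted
 * immediately; returns the slot number taken.  The caller does a $DEQ
 * on lksb->lkid when finished.  Nothing here orders the sweeping
 * processes, which is exactly the fairness problem described above. */
int walk_slots(const char *qtype, LKSB *lksb)
{
    char name[32];
    float pause = 1.0;               /* seconds between sweeps */
    int i;

    for (;;) {
        for (i = 1; i <= MAXSLOTS; i++) {
            struct dsc$descriptor_s dsc;
            sprintf(name, "%s_SLOT_%d", qtype, i);
            dsc.dsc$w_length  = (unsigned short)strlen(name);
            dsc.dsc$b_dtype   = DSC$K_DTYPE_T;
            dsc.dsc$b_class   = DSC$K_CLASS_S;
            dsc.dsc$a_pointer = name;
            /* LCK$M_NOQUEUE: grant right now or fail, don't wait */
            if (sys$enqw(0, LCK$K_EXMODE, lksb, LCK$M_NOQUEUE, &dsc,
                         0, 0, 0, 0, 0, 0, 0) == SS$_NORMAL
                    && lksb->cond == SS$_NORMAL)
                return i;            /* got this slot */
        }
        lib$wait(&pause);            /* all busy: sleep, then sweep again */
    }
}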
Peter Weaver
www.weaverconsulting.ca
Winner of the OpenVMS.org Readers' Choice Award for System
Management/Performance
http://www.linkedin.com/in/peterweaver
>I use the DLM because if the process owning a lock dies, the lock is
>automatically released.
Jim Duff has an excellent example of using $GETLKI to check the
number of granted locks to impose a limit; it's at:
http://www.eight-cubed.com/examples/framework.php?file=sys_enqw.c
//CY
You might find this to be of interest:
<http://www.eight-cubed.com/examples/framework.php?file=sys_enqw.c>
Cheers,
Jim.
--
www.eight-cubed.com
Correct, externally stored resources won't work because you can't be
sure that an error condition leaves the counter mechanism intact.
Though a logical name that exists only during the life of a process
might work; for example, a logical defined with /USER_MODE exists only
until the next image exits.
Off the top of my head.
Have two DLM resources: A and B.

Acquire:
 1. Get an EX mode lock on A.
 2. Get a CR mode lock on B.
 3. Use $GETLKI to get the GRANTCOUNT on B.
 4. If grantcount <= n then
        dequeue lock A
        return success.
    endif
 5. Convert lock B to PW mode with a blocking AST.
 6. Wait for the blocking AST or a timeout.
 7. If timeout then
        dequeue B
        dequeue A
        return error.
    endif
 8. Convert lock B to CR mode.
 9. Dequeue lock A.
10. Return success.

Release:
 1. Convert lock B to CW mode (signals the waiter, if one is queued).
 2. Dequeue lock B.
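A rough C rendering of the above, assuming the usual starlet
interfaces (untested; the resource names, n=5 and the 5-second timeout
are only illustrative):

#include <descrip.h>
#include <lckdef.h>
#include <lkidef.h>
#include <ssdef.h>
#include <starlet.h>

#define N        5          /* allowed concurrent holders  */
#define EFN_BLK  1          /* set by the blocking AST     */
#define EFN_TMO  2          /* set by $SETIMR on timeout   */

typedef struct { unsigned short cond, resvd; unsigned int lkid; } LKSB;

static $DESCRIPTOR(res_a, "QRY_GATE");    /* A: serializes the check   */
static $DESCRIPTOR(res_b, "QRY_COUNT");   /* B: one CR lock per holder */

static void blkast(int prm) { sys$setef(EFN_BLK); }

int acquire(LKSB *a, LKSB *b)
{
    static long long tmo = -50000000;     /* 5 seconds as a delta time */
    unsigned int count = 0, efstate;
    unsigned short retlen;
    struct { unsigned short buflen, itmcod; void *bufadr;
             unsigned short *retlen; } itm[2] =
        { { sizeof count, LKI$_GRANTCOUNT, &count, &retlen }, {0,0,0,0} };
    LKSB iosb;

    /* steps 1+2: gate lock EX, then membership lock CR */
    if (!(sys$enqw(0, LCK$K_EXMODE, a, 0, &res_a, 0,0,0,0,0,0,0) & 1))
        return 0;
    if (!(sys$enqw(0, LCK$K_CRMODE, b, 0, &res_b, 0,0,0,0,0,0,0) & 1)) {
        sys$deq(a->lkid, 0, 0, 0);
        return 0;
    }

    /* steps 3+4: granted locks on B, ours included; room -> done */
    if (!(sys$getlkiw(0, &b->lkid, itm, &iosb, 0, 0, 0) & 1))
        count = N + 1;                    /* treat failure as "no room" */
    if (count <= N) {
        sys$deq(a->lkid, 0, 0, 0);
        return 1;
    }

    /* step 5: convert B to PW with a blocking AST.  CR is compatible
     * with PW, so the conversion is granted at once; the AST fires when
     * a releasing holder converts its CR lock to CW (incompatible). */
    sys$clref(EFN_BLK);
    sys$enqw(0, LCK$K_PWMODE, b, LCK$M_CONVERT, 0, 0, 0, 0, blkast,
             0, 0, 0);

    /* steps 6+7: wait for the blocking AST or the timeout */
    sys$setimr(EFN_TMO, &tmo, 0, 0, 0);
    sys$wflor(EFN_BLK, (1 << EFN_BLK) | (1 << EFN_TMO));
    if (sys$readef(EFN_BLK, &efstate) == SS$_WASCLR) {   /* timed out */
        sys$deq(b->lkid, 0, 0, 0);
        sys$deq(a->lkid, 0, 0, 0);
        return 0;
    }
    sys$cantim(0, 0);

    /* steps 8-10: back to CR (lets the releaser's CW convert finish),
     * drop the gate; we are now one of the n holders */
    sys$enqw(0, LCK$K_CRMODE, b, LCK$M_CONVERT, 0, 0,0,0,0,0,0,0);
    sys$deq(a->lkid, 0, 0, 0);
    return 1;
}

void release(LKSB *b)
{
    /* CR->CW queues behind (and fires the blocking AST of) a PW waiter,
     * if any; once the waiter downconverts, the CW is granted and the
     * lock can be dropped entirely */
    sys$enqw(0, LCK$K_CWMODE, b, LCK$M_CONVERT, 0, 0,0,0,0,0,0,0);
    sys$deq(b->lkid, 0, 0, 0);
}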
-------------------------------------------------------------------------
David L. Jones | Phone: (614) 271-6718
Ohio State University | Internet:
140 W. 19th St. | jon...@ecr6.ohio-state.edu
Columbus, OH 43210 | vm...@osu.edu
Disclaimer: I'm looking for marbles all day long.
>Set up a special queue for the resource and set the queue job limit
>(/JOB_LIMIT=5). If something is submitted, and five are executing, it
>will wait. If a job completes or dies (assuming it doesn't hang), then
>the next job in line gets run.
Does anyone know what algorithm the queue manager uses to implement
this (allow JOB_LIMIT jobs to run but no more and, when one completes,
select the one that has been waiting the longest to run)?
IIRC, in Phase IV, you can set it up as either an executable or
a command file (I never tried this in Phase V).
Using that command file is very handy.
The queue manager is still a single process, one per cluster,
although it can migrate from node to node. It's trivial for a single
process to keep track of these things itself. In the past there
have been several changes to the queue manager to keep performance
reasonable on large clusters.
Processes started by the queue manager can, and do, run on many
nodes, which is how batch and print queues can be "on" various nodes.
"Pierre" <pierr...@gmail.com> wrote in message
news:91916ca0-728c-407a...@l31g2000vbp.googlegroups.com...
You could of course use Tier3 Client/Server, which would allow you to
specify min-servers, max-servers, and idle-timeout on an
application-by-application basis. You can register your DECnet task via
object name or number, and your server code can simultaneously be
accessed by TCP/IP clients. Simply supply Tier3 with a Shareable Image
containing the 6 User Action Routines that Tier3 will call on your
behalf during the life of a server process, and you're away!
Below is a COBOL DECnet client example, but I don't see DCL being an
issue. Just send Username/Password as the first message and the rest is
up to you.
If you'd like a Hobbyist copy of Tier3 to have a look at, just tell me
whether you want Alpha or IA64.
Cheers, Richard Maher
>
> TIA,
> Pierre.
****************************************************************************
*
*  COPYRIGHT (c) BY TIER3 SOFTWARE LTD.  ALL RIGHTS RESERVED.
*
*  THIS SOFTWARE IS FURNISHED UNDER A LICENSE AND MAY BE USED AND COPIED
*  ONLY IN ACCORDANCE WITH THE TERMS AND CONDITIONS OF SUCH LICENSE AND
*  WITH THE INCLUSION OF THE ABOVE COPYRIGHT NOTICE.  THIS SOFTWARE OR ANY
*  OTHER COPIES THEREOF MAY NOT BE PROVIDED OR OTHERWISE MADE AVAILABLE TO
*  ANY OTHER PERSON.  NO TITLE TO AND OWNERSHIP OF THE SOFTWARE IS HEREBY
*  TRANSFERRED.
*
*  THE INFORMATION IN THIS SOFTWARE IS SUBJECT TO CHANGE WITHOUT NOTICE AND
*  SHOULD NOT BE CONSTRUED AS A COMMITMENT BY TIER3 SOFTWARE LTD.
*
****************************************************************************
*+
* Facility: DEMO_CLIENT_DECNET
*
* Abstract: This is an example of a VMS client program communicating
*           with a remote task via Tier3. This program accepts a queue
*           entry number from the user and sends it to the remote Tier3
*           communication server for this application (T3_DEMO). After
*           Tier3 has allocated an execution server to perform the
*           operation, the USER_RECV routine is called to retrieve the
*           job entry information and return the results to the client.
*
* Overview: Because client programs communicate with Tier3 servers
*           using standard DECnet non-transparent task-to-task
*           communication, no Tier3-specific software need be installed
*           on the client node. In this example the client will execute
*           under the VMS operating system, so VMS system services have
*           been used for client task development.
*
* Build example:
*           $COBOL/LIST DEMO_CLIENT_DECNET
*           $LINK DEMO_CLIENT_DECNET
*           $SET TERM/WIDTH = 132
*           $RUN DEMO_CLIENT_DECNET
*-
identification division.
program-id. demo_client_decnet.
data division.
working-storage section.
01  io$_writevblk            pic s9(9) comp value external io$_writevblk.
01  io$_readvblk             pic s9(9) comp value external io$_readvblk.
01  io$_access               pic s9(9) comp value external io$_access.
01  lib$_normal              pic s9(9) comp value external lib$_normal.
01  ss$_linkdiscon           pic s9(9) comp value external ss$_linkdiscon.
01  ss$_bufferovf            pic s9(9) comp value external ss$_bufferovf.
01  ss$_dataoverun           pic s9(9) comp value external ss$_dataoverun.
01  ss$_abort                pic s9(9) comp value external ss$_abort.
01  ss$_normal               pic s9(9) comp value external ss$_normal.
01 sys_status pic s9(9) comp.
*
01 net_chan pic s9(4) comp.
01 iosb.
03 cond_val pic s9(4) comp.
03 byte_count pic s9(4) comp.
03 chan_info pic s9(9) comp.
*+
* Tier3 currently makes no use of any optional data in the Network
* Connect Block. This may change in future versions.
*
* Before running this example you must replace the node name TIER3 with
* the appropriate DECnet node name for your site. The NCB_ID field below
* is included merely as an example of an NCB required if the DEMO server
* were advertising itself as network object number 130 rather than the
* network object name T3_DEMO.
*-
01  ncb_id                   pic x(13) value 'TIER3::"130="'.
01  ncb                      pic x(21) value 'TIER3::"TASK=T3_DEMO"'.
*+
* The Access Control Information buffer must be the first buffer
* transmitted to the communication server. This buffer will not be
* passed to an execution server and is the only data message buffer
* whose format and content will be scrutinized by Tier3.
*
* Each field must contain ASCII coded text and may be null-terminated
* or space-filled.
*-
01 aci_buffer.
03 aci_username pic x(40).
03 aci_password pic x(40).
*+
* The Tier3 Identification buffer is returned to the client once access
* has been authorized. The first three bytes will contain "T3$" and the
* next two bytes will contain the major and minor version numbers of
* the Tier3 software installed at the target node.
*-
01 t3_id_buffer.
03 t3_id pic xxx.
03 maj_vers pic x.
03 min_vers pic x.
03 scsnode pic x(6).
03 logfails pic 9(5).
03 last_login_i.
05 yyyy pic 9(4).
05 mt pic 9(2).
05 dd pic 9(2).
05 hh pic 9(2).
05 mn pic 9(2).
05 ss pic 9(2).
05 cc pic 9(2).
03 last_login_n.
05 yyyy pic 9(4).
05 mt pic 9(2).
05 dd pic 9(2).
05 hh pic 9(2).
05 mn pic 9(2).
05 ss pic 9(2).
05 cc pic 9(2).
*
01 time_vector_binary.
03 yyyy pic s9(4) comp.
03 mt pic s9(4) comp.
03 dd pic s9(4) comp.
03 hh pic s9(4) comp.
03 mn pic s9(4) comp.
03 ss pic s9(4) comp.
03 cc pic s9(4) comp.
01 vms_binary_time pic s9(18) comp.
01 vms_ascii_date pic x(17).
01  day_names_table.
    03                       pic x(9) value "Monday".
    03                       pic 9(4) comp value 6.
    03                       pic x(9) value "Tuesday".
    03                       pic 9(4) comp value 7.
    03                       pic x(9) value "Wednesday".
    03                       pic 9(4) comp value 9.
    03                       pic x(9) value "Thursday".
    03                       pic 9(4) comp value 8.
    03                       pic x(9) value "Friday".
    03                       pic 9(4) comp value 6.
    03                       pic x(9) value "Saturday".
    03                       pic 9(4) comp value 8.
    03                       pic x(9) value "Sunday".
    03                       pic 9(4) comp value 6.
01  day_names_array redefines day_names_table.
    03  day_of_week occurs 7.
        05  day_name         pic x(9).
        05  day_name_len     pic 9(4) comp.
01  day_number               pic 9(9) comp.
01  welcome_string           pic x(65).
01  welcome_string_len       pic 9(4) comp.
01  last_login_msg           pic x(62).
01  logfails_msg.
    03  out_logfails         pic z(8)9.
    03                       pic x(37) value
                             " failures since last successful login".
*+
* The following buffers are application specific. In this DEMO example
* a maximum buffer size of 510 bytes has been selected, and the first
* two bytes of every message have been reserved for message
* identification. Note that this is purely an application convention
* that needs to be observed by the programmers of the client and server
* components of DEMO and is not a requirement of Tier3.
*-
01  reply_buffer.
    03  msg_type             pic xx.
        88  valid_reply      values "00", "11", "99".
        88  error_msg        value "00".
        88  entry_data       value "11".
        88  end_of_file      value "99".
    03                       pic x(510).
*
01  error_buffer redefines reply_buffer.
    03                       pic xx.
    03  error_msg_len        pic 999.
    03  error_msg_text       pic x(507).
*
01  entry_info_buffer redefines reply_buffer.
    03                       pic xx.
    03  entry_number         pic z(9)9.
    03  job_name             pic x(39).
    03  job_status           pic x(15).
    03  que_name             pic x(31).
    03  que_type             pic x(10).
    03  que_status           pic x(10).
*
01  get_entry_buffer.
    03                       pic xx value "10".
    03  user_entry           pic 9(10).
    03  max_entries          pic 9(5) value 12.
*+
* Application specific working-storage.
*-
01 screen_line_table.
03 screen_line occurs 12.
05 entry_number pic z(9)9.
05 job_name pic x(39).
05 job_status pic x(15).
05 que_name pic x(31).
05 que_type pic x(10).
05 que_status pic x(10).
01  screen_header.
    03                       pic x(13) value "    Entry#".
    03                       pic x(40) value "Job Name".
    03                       pic x(16) value "Job Status".
    03                       pic x(32) value "Queue Name".
    03                       pic x(13) value "Queue Type".
    03                       pic x(18) value "Queue Status".
01 out_entry.
03 entry_number pic z(9)9.
03 pic xxx.
03 job_name pic x(39).
03 pic x.
03 job_status pic x(15).
03 pic x.
03 que_name pic x(31).
03 pic x.
03 que_type pic x(10).
03 pic xxx.
03 que_status pic x(10).
*
01 user_exit pic x value "N".
01 line_count pic 9(9) comp.
01 screen_index pic 9(9) comp.
01 end_key pic x(4).
*
procedure division.
kick_off section.
00.
perform socket_and_connect.
if sys_status not = ss$_normal go to fini.
perform application_logon.
if t3_id not = "T3$"
move ss$_abort to sys_status
go to fini.
perform display_welcome.
perform get_entry_info
    until user_exit = "Y" or sys_status not = ss$_normal.
if sys_status not = ss$_normal go to fini.
perform socket_close.
*
fini.
call "sys$exit" using by value sys_status.
*
get_entry_info section.
00.
display "Enter job entry number (zero = wild, ctrl/z = exit): "
line 1 column 1 erase screen no advancing.
accept user_entry with conversion
reversed
bold
protected
default is zero
at end move "Y" to user_exit
go to fini.
*+
* Call my USER_RECV routine.
*-
call "sys$qiow"
using by value 0, net_chan, io$_writevblk
by reference iosb
by value 0, 0
by reference get_entry_buffer
by value 17, 0, 0, 0, 0
giving sys_status.
if sys_status = ss$_normal move cond_val to sys_status.
if sys_status not = ss$_normal go to fini.
move zeros to line_count.
perform load_entries with test after
until not entry_data or sys_status not = ss$_normal.
if sys_status not = ss$_normal go to fini.
if error_msg
display "Error retrieving job entry information,"
display error_msg_text(1:error_msg_len)
else display screen_header at line 1 column 1 reversed erase screen
    perform varying screen_index from 1 by 1
            until screen_index > line_count
        move corr screen_line (screen_index) to out_entry
        display out_entry
    end-perform.
display "Press RETURN to continue." no advancing.
accept key in end_key at end continue.
*
fini.
*
socket_and_connect section.
00.
*+
* Allocate a network device. This application will not respond to any
* interrupt or network protocol messages, therefore no mailbox
* parameter has been specified. If your application requires a mailbox
* it is more convenient to use the Run-Time Library routine
* LIB$ASN_WTH_MBX.
*-
call "sys$assign" using by descriptor "_NET:"
by reference net_chan
by value 0, 0, 0
giving sys_status.
if sys_status not = ss$_normal go to fini.
*+
* Request a logical link connection to the Tier3 communication server
* for this application.
*
* If ss$_reject is returned then the reason for the rejection will be
* recorded in the communication server's log file. The most common
* reason is that the maximum number of links, specified in the Tier3
* configuration file, that the target node will accept for this
* application has been reached. If this is the case, then the parameter
* value should be increased or the client redirected to another node.
*
* Note: As the NCB and its descriptor must be in read/write storage,
* literals cannot be used.
*-
call "sys$qiow"
using by value 0, net_chan, io$_access
by reference iosb
by value 0,0,0
by descriptor ncb
by value 0,0,0,0
giving sys_status.
if sys_status = ss$_normal move cond_val to sys_status.
*
fini.
*
application_logon section.
00.
*+
* Once a logical link has been established, the communication server
* will be expecting access control information so that it can authorize
* client access to the application.
*
* The aci_buffer allocates 40 bytes each for the username and password
* fields, but as VMS is currently the only supported server platform
* for Tier3 you should limit usernames to 12 bytes and passwords to 32.
*-
*-
display "Username: " erase screen no advancing.
accept aci_username protected size 12 editing at end go to fini.
display "Password: " no advancing.
accept aci_password protected size 32 editing no echo at end go to fini.
call "sys$qiow"
using by value 0, net_chan, io$_writevblk
by reference iosb
by value 0, 0
by reference aci_buffer
by value 80, 0, 0, 0, 0
giving sys_status.
if sys_status = ss$_normal move cond_val to sys_status.
if sys_status not = ss$_normal go to fini.
*+
* To complete the handshaking sequence the communication server will
* reply to a successful access attempt with the Tier3 identification
* buffer.
*
* If the access control information was invalid the communication
* server will break the connection and the following read will return
* ss$_linkdiscon. Before retrying a failed access attempt you must
* therefore re-connect (io$_access) to the communication server. The
* communication server's log file will contain detailed information
* describing any authorization failures.
*
* The Tier3 Identification buffer for handshake 1 is 48 bytes long.
*-
call "sys$qiow"
using by value 0, net_chan, io$_readvblk
by reference iosb
by value 0, 0
by reference t3_id_buffer
by value 48, 0, 0, 0, 0
giving sys_status.
if sys_status = ss$_normal move cond_val to sys_status.
if sys_status = ss$_linkdiscon
display "User authorization failure"
move ss$_normal to sys_status.
*
fini.
*
load_entries section.
00.
*+
* As terminal i/o can take an indefinite amount of time to complete,
* once entry information is retrieved from the remote node it is
* deferred to working-storage rather than being sent directly to the
* screen. This strategy removes a potential cause of buffer starvation
* in the communication server. If it is impractical to set a limit on
* the amount of information to be returned from the execution server,
* then the developer should consider deferring output to a temporary
* file, or expanding available memory via a routine such as LIB$GET_VM.
*
* Furthermore, if it is a requirement of your application that the
* association between client and execution server persist during
* terminal i/o, it may be necessary to modify the application's
* parameter record in the Tier3 configuration file so that the "maximum
* servers" and "maximum links" parameters are set to the same value.
* This configuration would effectively dedicate a separate execution
* server to each client and avoid potential delays in the servicing of
* client requests.
*-
call "sys$qiow"
using by value 0, net_chan, io$_readvblk
by reference iosb
by value 0, 0
by reference reply_buffer
by value 512, 0, 0, 0, 0
giving sys_status.
if sys_status = ss$_normal move cond_val to sys_status.
if entry_data and sys_status = ss$_normal
add 1 to line_count
move corr entry_info_buffer to screen_line (line_count).
*
socket_close section.
00.
*+
* Break the logical connection and deassign the network channel.
*
* The communication server will automatically deallocate any resources
* maintained on behalf of the client and will call, at AST level, the
* interrupt routine specified in the Tier3 configuration file if the
* client is currently associated with an execution server.
*-
call "sys$dassgn" using by value net_chan giving sys_status.
*
display_welcome section.
00.
call "sys$fao"
using by descriptor "Welcome to the DEMO application via TIER3
V!@UB.!@UB on node !AS"
by reference welcome_string_len
by descriptor welcome_string
by reference maj_vers, min_vers
by descriptor scsnode
giving sys_status.
if sys_status not = ss$_normal call "lib$stop" using by value
sys_status.
display welcome_string (1:welcome_string_len).
move corr last_login_i to time_vector_binary.
perform cvt_lastlogin.
string " Last interactive login on ",
day_name (day_number) (1:day_name_len(day_number)),
" ", vms_ascii_date
delimited by size
into last_login_msg.
display last_login_msg.
move corr last_login_n to time_vector_binary.
perform cvt_lastlogin.
string " Last non-interactive login on ",
day_name (day_number) (1:day_name_len(day_number)),
" ", vms_ascii_date
delimited by size
into last_login_msg.
display last_login_msg.
if logfails > zeros
if logfails = 1
display " 1 failure since last successful login"
else
move logfails to out_logfails
display logfails_msg.
display "Press RETURN to continue." no advancing.
accept key in end_key at end continue.
go to fini.
*
cvt_lastlogin.
*
call "lib$cvt_vectim" using time_vector_binary, vms_binary_time giving
sys_status.
if sys_status not = lib$_normal call "lib$stop" using by value
sys_status.
call "sys$asctim"
using by value 0
by descriptor vms_ascii_date
by reference vms_binary_time
by value 0
giving sys_status.
if sys_status not = ss$_bufferovf call "lib$stop" using by value
sys_status.
call "lib$day_of_week" using vms_binary_time, day_number giving
sys_status.
if sys_status not = ss$_normal call "lib$stop" using by value
sys_status.
*
fini.
*
end program demo_client_decnet.
I think I have an algorithm that needs fleshing out. You have a master
lock M as well as other locks, numbered 1 through n-1 (n=5 in this
case).

A process tries to get the master lock in exclusive mode, waiting
until it can. If it's waiting, that normally means there are n
processes running already, but it can also be transient. If it gets
the lock, it counts the number of holders. If that's less than n, it
gets an exclusive lock on one of the numbered locks (this part needs
work, but for now assume it tries lock 1 first; if it can't get that
immediately it tries lock 2, then lock 3, etc.), then converts the
master lock to NL mode, allowing another process to get it.

If the number of holders on the master lock = n, it does _not_ convert
the lock (thus blocking the next process) but instead requests access
to all the numbered locks. As long as the other processes hold these
locks, the process holding the master lock won't get any of them. Once
any of those processes exits, releasing one of the numbered locks, the
process holding the master lock in EX gets that numbered lock and
converts the master lock to NL, allowing another process to run.
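For concreteness, a rough C sketch of this scheme (untested; it
borrows acquire_slot() and walk_slots() from the sketches earlier in
the thread, which use n numbered slots where this description uses
n-1, and the resource name, N and the grant_count() helper are only
illustrative):

#include <descrip.h>
#include <lckdef.h>
#include <lkidef.h>
#include <ssdef.h>
#include <starlet.h>

#define N 5

typedef struct { unsigned short cond, resvd; unsigned int lkid; } LKSB;

/* from the sketches earlier in the thread */
extern int acquire_slot(const char *qtype, LKSB slot[]); /* queue on all, wait */
extern int walk_slots(const char *qtype, LKSB *lksb);    /* NOQUEUE first-fit  */

static $DESCRIPTOR(res_m, "QRY_MASTER");

static unsigned int grant_count(LKSB *l)  /* granted locks on l's resource */
{
    unsigned int count = 0;
    unsigned short retlen;
    LKSB iosb;
    struct { unsigned short buflen, itmcod; void *bufadr;
             unsigned short *retlen; } itm[2] =
        { { sizeof count, LKI$_GRANTCOUNT, &count, &retlen }, {0,0,0,0} };
    sys$getlkiw(0, &l->lkid, itm, &iosb, 0, 0, 0);
    return count;
}

int run_one_query(void)
{
    LKSB m, slots[N], held;
    int i;

    /* wait for the master lock in EX; having to wait normally means n
     * processes are already running, but it can also be transient */
    if (!(sys$enqw(0, LCK$K_EXMODE, &m, 0, &res_m, 0,0,0,0,0,0,0) & 1))
        return 0;

    if (grant_count(&m) < N) {
        /* room: first-fit on the numbered locks (the part that "needs
         * work": the exact < n vs <= n boundary is the off-by-one to
         * pin down when fleshing this out) */
        walk_slots("QRY", &held);
    } else {
        /* full: keep the master at EX, blocking everyone behind us, and
         * queue EX requests on all numbered locks; the first grant
         * means one of the runners has exited */
        i = acquire_slot("QRY", slots);
        if (i < 0) { sys$deq(m.lkid, 0, 0, 0); return 0; }
        held = slots[i];
    }
    /* we hold one numbered lock now; let the next process in */
    sys$enqw(0, LCK$K_NLMODE, &m, LCK$M_CONVERT, 0, 0,0,0,0,0,0,0);

    /* ... run the query ... */

    sys$deq(held.lkid, 0, 0, 0);      /* free the numbered lock */
    sys$deq(m.lkid, 0, 0, 0);         /* and drop the NL master */
    return 1;
}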
> Correct, externally stored resources won't work because you can't be
> sure that an error condition leaves the counter mechanism intact.
> Though a logical name that exists only during the life of a process
> might work; for example, a logical defined with /USER_MODE exists
> only until the next image exits.
Yes, but the multiple servers will run in separate processes and (off
the top of my head) won't have access to another process's user-mode
logicals.
> I think I have an algorithm that needs fleshing out. You have a
> master lock M as well as other locks, numbered 1 through n-1 (n=5 in
> this case).
>
> A process tries to get the master lock in exclusive mode, waiting
> until it can. If it's waiting, that normally means there are n
> processes running already, but it can also be transient. If it gets
> the lock, it counts the number of holders. If that's less than n, it
> gets an exclusive lock on one of the numbered locks (this part needs
> work, but for now assume it tries lock 1 first; if it can't get that
> immediately it tries lock 2, then lock 3, etc.), then converts the
> master lock to NL mode, allowing another process to get it.
>
> If the number of holders on the master lock = n, it does _not_
> convert the lock (thus blocking the next process) but instead
> requests access to all the numbered locks. As long as the other
> processes hold these locks, the process holding the master lock won't
> get any of them. Once any of those processes exits, releasing one of
> the numbered locks, the process holding the master lock in EX gets
> that numbered lock and converts the master lock to NL, allowing
> another process to run.
Your algorithm resembles mine but is easier to implement :) Keeping
the EX lock on the master lock makes life easier than converting it to
NL after enqueueing EX locks on the numbered ones. And as the other
processes wanting to access the "multi-resource" wait behind the
master lock and not behind the numbered locks, the DLM has less work
to do.
Pierre.
> Off the top of my head.
>
> Have two DLM resources: A and B.
>
> Acquire:
>  1. Get an EX mode lock on A.
>  2. Get a CR mode lock on B.
>  3. Use $GETLKI to get the GRANTCOUNT on B.
>  4. If grantcount <= n then
>         dequeue lock A
>         return success.
>     endif
>  5. Convert lock B to PW mode with a blocking AST.
>  6. Wait for the blocking AST or a timeout.
>  7. If timeout then
>         dequeue B
>         dequeue A
>         return error.
>     endif
>  8. Convert lock B to CR mode.
>  9. Dequeue lock A.
> 10. Return success.
>
> Release:
>  1. Convert lock B to CW mode (signals the waiter, if one is queued).
>  2. Dequeue lock B.
This looks interesting, but if the process holding lock B dies before
converting it to CW, the process waiting for B may wrongly exit on the
timeout (or wait forever if no timeout is used). The same may happen
if (unfortunately) a process holding a lock on B exits between the
counting and the conversion of the lock on B to PW.
First of all, a process dying while holding a lock should be a rare
exception, and the cause should be investigated and fixed. You don't
get a wait condition until multiple (e.g. 5) processes hold the lock.
If a lock holder dies, the wait will still be satisfied by the next
process that releases the lock. If someone kills the process prior to
converting lock B to PW, lock A goes away as well, so the next process
needing the lock isn't blocked from seeing the correct current grant
count.
-----------------------------------------------------------------------------
First, I absolutely agree that one *must* investigate the cause(s)
making a process unexpectedly die, but unfortunately this sometimes
happens. The process may also die because of a STOP/ID...
Anyway, by
> > [...] The same may happen
> > if (unfortunately) a process holding a lock on B exits between the
> > counting and the conversion of the lock on B to PW.
I don't mean "if the requesting process dies" but "if one of the
(e.g.) 5 processes already legitimately holding the resource dies"
between the test (grantcount <= n) in step 4 and step 5. As those
processes dequeued their lock on A in their own step 4, they can no
longer auto-release it when dying.
> If a lock holder dies, the wait
> will still be satisfied by the next process that releases the lock.
Yes, but this may take a while, maybe longer than the requested
timeout. And if I create a library, I want the same procedure to work
for any n, even n=1, in which case there will not be any next process
to release its lock :(
I use the master lock to block processes beyond n; it is converted to
NL if and only if at least one additional process is allowed to run.
Otherwise, keeping it at EX keeps additional processes from running.
A lot of discussion about how to implement a queuing mechanism using
distributed locks ... but perhaps you don't need to do it that way at all.
It sounds like your DECnet objects use transparent task-to-task
communication: DECnet creates a NET$SERVER process for each incoming
connection, and the process runs your database query. This is simple to set
up and simple to program, but gives you very little control.
You might be better off by restructuring your application so that the server
uses "non-transparent" communication. In this mode, the server process
starts up before any client requests have begun. The server declares itself
to DECnet as a network object, and then waits for incoming connections.
Under this model your server can queue the work to be done and assign it to
any number of worker processes, all under the control of the central server.
Have a look at the "DECnet for OpenVMS Networking Manual" which you can
download from the HP OpenVMS web site
http://h71000.www7.hp.com/doc/73final/documentation/pdf/DECNET_OVMS_NET_MAN.PDF
Regards,
Jeremy Begg
"Jeremy Begg" <jeremy.r...@vsm.com.au> wrote in message
news:4AE666CD...@vsm.com.au...
> Hi Pierre,
>
> A lot of discussion about how to implement a queuing mechanism using
> distributed locks ... but perhaps you don't need to do it that way at all.
>
> It sounds like your DECnet objects use transparent task-to-task
> communication: DECnet creates a NET$SERVER process for each incoming
> connection, and the process runs your database query. This is simple to
set
> up and simple to program, but gives you very little control.
>
> You might be better off by restructuring your application so that the
server
> uses "non-transparent" communication. In this mode, the server process
> starts up before any client requests have begun. The server declares
itself
> to DECnet as a network object, and then waits for incoming connections.
> Under this model your server can queue the work to be done and assign it
to
> any number of worker processes, all under the control of the central
server.
>
> Have a look at the "DECnet for OpenVMS Networking Manual" which you can
> download from the HP OpenVMS web site
>
http://h71000.www7.hp.com/doc/73final/documentation/pdf/DECNET_OVMS_NET_MAN.PDF
>
What a great Idea!