Sometimes we have the problem that a TNA device
has an "owner" listed that doesn't exist, as in
the case below where the listed process 00003996
doesn't exist on the system. This prevents us
from doing a TELNET /DELETE on the port.
Is there a way to clear the "Owner proc ID" field ?
Jan-Erik.
$ sh dev TNA7104/fu
Terminal TNA7104:, device type LA120, is online,
record-oriented device, carriage control, device is busy.
Error count 0 Operations completed 97
Owner process "" Owner UIC [MK]
Owner process ID 00003996 Dev Prot S:RWPL,O:RWPL,G,W
Reference count 4 Default buffer size 132
$ pipe sh sys | sea sys$pipe 3996
%SEARCH-I-NOMATCHES, no strings matched
$ sh proc/id=00003996
%SYSTEM-W-NONEXPR, nonexistent process
It's not the owner that's your problem. You have channels assigned to
that device. Find out what process(es) have an interest in that device.
--
VAXman- A Bored Certified VMS Kernel Mode Hacker VAXman(at)TMESIS(dot)ORG
How?
"SDA> SET PROC/ID=00003996" doesn't work either, as expected I guess...
I can't find any "SET DEVICE..." variant in SDA.
OK, found "SHO DEVICE" in SDA...
That gives me some info. Is there anything in it that
is interesting?
SDA> SHOW DEV LTA7104: shows:
Owner UIC [000330,000030]
PID 000E0196
SDA> SHO SUM shows no process with PID=000E0196
Found this tool that I will evaluate:
http://vms.process.com/scripts/fileserv/fileserv.com?CLRREF
Any thoughts/comments about that one?
Jan-Erik, before you consider that, please find out which process
has a channel open to TNA7104: (P.S. You've started mentioning LTA7104:.
Why?)
$ analyse/system
SDA> set out x.x
SDA> show process all/channel
SDA> exit
Then search X.X for TNA7104: and work back from that.
There's a reason why the device has a reference count. Figure out why and
you can probably avoid this problem in the future. Try using SDA.
SDA> SET OUTPUT/SINGLE channels.tmp
SDA> SHOW PROCESS ALL /CHANNEL
SDA> SPAWN SEARCH channels.tmp TNA7104
If this returns information, then there are channels assigned to your TNA
device. You can then try something like...
SDA> SHOW PROCESS ALL /CHANNEL/PCB/IMAGE
Search the output for the device name, find the process and image running
that has interest in your device.
OK, thanks Roy and VAXman. I found it.
It was an old process/workplace startup file (that should not
be used today) that still allocated the TNA device.
Now, the question is still why one process has a channel
to the TNA device while another process shows up in
SHOW DEVICE TNA7104.
Is there a simple way to get the "right" owner (the process
actually having the channel open)?
Right now I use f$getdvi(portname,"PID") to get the owner
and then try to STOP/ID that process. This fails since
the wrong process is listed in the device info.
Stopping the process that SDA gave me clears the owner
field in SHO DEV. But note that it still isn't the
same process ID.
$ sh dev tna7104/fu
Terminal TNA7104:, device type LA120, is online,
record-oriented device, carriage control, device is busy.
Error count 0 Operations completed 109
Owner process "" Owner UIC [MK]
Owner process ID 00003996 Dev Prot S:RWPL,O:RWPL,G,W
Reference count 3 Default buffer size 132
$ stop/id=000035F8
$ sh dev tna7104/fu
Terminal TNA7104:, device type LA120, is offline,
device set /NOAVAILABLE, record-oriented device,
carriage control.
Error count 0 Operations completed 110
Owner process "" Owner UIC [MK]
Owner process ID 00000000 Dev Prot S:RWPL,O:RWPL,G,W
Reference count 0 Default buffer size 132
So what I need is a method to get the correct process ID
from DCL so that I can throw the process out. Some code
to parse the SDA output would work, of course.
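Parsing the SDA output could look roughly like this. It is a minimal Python sketch, assuming Python is available on the system (or that the logic is ported to DCL), and assuming the layout described in the comments: each process section of SHOW PROCESS ALL/CHANNEL starts with a header line containing "Extended PID:", and its channel lines mention the device name with a trailing colon. The exact layout varies between VMS versions, so check it against real output first. `pids_with_channel` is a hypothetical helper, not part of VMS.

```python
import re

# Assumption: each process section of SDA "SHOW PROCESS ALL/CHANNEL" output
# starts with a header line containing "Extended PID: xxxxxxxx".
PID_RE = re.compile(r"Extended PID:\s*([0-9A-Fa-f]{8})")


def pids_with_channel(sda_text: str, device: str) -> list[str]:
    """Return PIDs (in output order, without duplicates) whose channel
    listing mentions `device`. Hypothetical helper for illustration."""
    name = device.rstrip(":").upper()
    # Require the trailing colon so "TNA7104:" does not match "TNA71040:",
    # and forbid a preceding letter/digit so it does not match "XTNA7104:".
    # A leading "_" (as in "_TNA7104:") is still accepted.
    dev_re = re.compile(r"(?<![A-Z0-9])" + re.escape(name) + r":", re.IGNORECASE)
    found: list[str] = []
    current_pid = None
    for line in sda_text.splitlines():
        m = PID_RE.search(line)
        if m:
            # New process section: remember whose channels follow.
            current_pid = m.group(1).upper()
            continue
        if current_pid and dev_re.search(line) and current_pid not in found:
            found.append(current_pid)
    return found
```

The input would come from the SDA commands already suggested in the thread (SET OUTPUT to a file, then SHOW PROCESS ALL/CHANNEL); each PID returned is a candidate for STOP/ID, after checking with SHOW PROCESS/ID that it really is the stray process.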
Anyway, thanks ! :-)
Jan-Erik.
[The following is about 50% guesswork, but it's also about the
only rational way that things could possibly be, given what I do
know with 100% certainty.]
Operating system kernels like to operate "lean and mean".
The first process to open a channel or allocate a device populates
the owner field. Any subsequent process that opens a channel
to the [shareable] device merely increments its reference count.
That's a simple algorithm:
increment reference count
if reference count had been zero, populate the owner field.
The last process to close a channel or de-allocate a device
clears the fields (and possibly triggers device deletion).
That's also a simple algorithm:
decrement reference count
if reference count is now zero, zap the device.
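The two algorithms above can be modelled in a few lines. This is a toy Python sketch of the *guessed* bookkeeping, not the real VMS UCB code; `Device`, `assign` and `deassign` are hypothetical names. It shows why the owner field goes stale: a release never searches for a replacement owner.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Device:
    """Toy model of the guessed owner/reference-count bookkeeping.
    Illustrative only; not actual VMS data structures."""
    name: str
    ref_count: int = 0
    owner_pid: Optional[int] = None
    exists: bool = True

    def assign(self, pid: int) -> None:
        # Increment the reference count; populate the owner only if
        # the count had been zero (i.e. this is the first reference).
        self.ref_count += 1
        if self.ref_count == 1:
            self.owner_pid = pid

    def deassign(self) -> None:
        # Decrement the reference count. Only the last release clears the
        # owner and "zaps" the device; no search for a new owner is made,
        # so the caller's identity is not even needed.
        self.ref_count -= 1
        if self.ref_count == 0:
            self.owner_pid = None
            self.exists = False
```

For example, if process 3996 assigns first and 35F8 second, and 3996 then goes away, the model still reports 3996 as owner with a reference count of 1, which matches what SHOW DEVICE displayed in the original post.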
Your request would involve checking each time the reference
count is decremented to make sure that another reference
from the same owner exists. If not, you would have the
kernel randomly select another owner and re-populate the
owner field that way. This would have the potential side effect
of changing UIC-based security access controls on the device.
It is not clear that the data structures to optimize such a search
even _exist_ (although the SDA approach is an existence proof
that such a search is theoretically possible). But such a search
is definitely something that you do not want to be doing in the
middle of the time-critical kernel code.
> Right now I use f$getdvi(portname,"PID") to get the owner
> and then try to STOP/ID that process. This fails since
> the wrong process is listed in the device info.
If you have processes that hang while holding crucial resources
the solution is to get those processes to release the resources
before they hang or get them not to hang at all.
Lining them up and spraying them with bullets is a crude
workaround. It is not the responsibility of the O/S to assist
in crude workarounds.
In my opinion.
OK. I understand what you wrote. :-)
The only thing I want/need is to have the TNA device go away.
Now, I guess it isn't a major problem; the fact that *two*
processes opened this device was an error. The device as
such connects to a hand-held barcode scanner and there
should only be one process talking to each scanner...
Jan-Erik.
I'm wondering if the process with the channel on the device might
be a subprocess of the one that owns it.
>OK, thanks Roy and VAXman. I found it.
>It was an old process/workplace startup file (that should not
>be used today) that still allocated the TNA device.
>Now, the question is still why one process has a channel
>against the TNA device while another process shows in
>SHOW DEVICE TNA7104.
>Is there a simple way to get the "right" owner (the process
>actually having the channel open)?
This is easy to reproduce.
1) Log onto the VMS system using some sort of virtual terminal
(TELNET, LAT etc. It's kinda hard these days to _not_ do this)
Suppose the session gets terminal TNA100: and PID 00001234.
2) From another process (with SHARE priv.), assign a channel to this
terminal, such as doing:
$ OPEN/READ X TNA100:
3) Do a SHOW DEV/FULL TNA100:
You'll see a normal owner process, PID, etc. plus a reference count of 3.
4) Log out of the process logged into in step 1. You may notice that the
TELNET/LAT session does *not* get disconnected, but there'll be no
response if you try to type into this terminal session.
5) Do a SHOW DEV/FULL TNA100:
You'll now see a NULL owner process, but the PID is what it was before.
Any attempts to reference that PID (SHOW PROCESS/ID, STOP/ID etc.) will
result in a "%SYSTEM-W-NONEXPR, nonexistent process" message. The
reference count is 1. It is the channel created by the $ OPEN command.
6) From the second terminal window, type $ CLOSE X. You'll see the telnet
session for TNA100: finally get disconnected and TNA100: "goes away" (any
attempt to do something like SHOW DEV TNA100: results in a
"%SYSTEM-W-NOSUCHDEV, no such device available" message).
You may wish to argue that the PID should change in 5) to be that of
the process still holding a channel, but that certainly has security
issues, plus what should you do if two (or more) other processes have a
channel to the device?
This has been a peculiarity of VMS since pretty much forever.
>[The following is about 50% guesswork, but it's also about the
>only rational way that things could possibly be, given what I do
>know with 100% certainty.]
>Operating system kernels like to operate "lean and mean".
>The first process to open a channel or allocate a device populates
>the owner field. Any subsequent process that opens a channel
>to the [shareable] device merely increments its reference count.
>That's a simple algorithm:
> increment reference count
> if reference count had been zero, populate the owner field.
VMS has a concept of "template and clone devices" for virtual devices
such as telnet and LAT sessions. There is a template device, usually
XXX0:, and the other XXXn: devices are clones. If you try to assign
a channel to the template XXX0: you don't actually get a channel to
XXX0:. A clone gets created and you get a channel assigned to it.
Usually the first thing you have to do is some I/O that associates the
clone with...something. For example if you want to create an IP link
you really assign a channel to BG0:, get some BGn: assigned, then issue
a request to make it a TCP or UDP link, to connect to some remote address
and port or listen on a port, or whatever.
For a TELNET login, some different magic happens, where a clone gets
created behind the scenes and it gets assigned to the newly created
process trying to log in.
Anyway, the clone gets the creating process's PID assigned to it as owner,
and its security fields are taken from the creator as well.
>The last process to close a channel or de-allocate a device
>clears the fields (and possibly triggers device deletion).
>That's also a simple algorithm:
> decrement reference count
> if reference count is now zero, zap the device.
For clone devices, if the reference count goes to zero, the clone goes
away (ceases to exist, its UCB is deleted). "Real" devices (say, OPA0:)
don't "go away" but there is other cleanup. If it doesn't go to zero,
nothing special happens, even if the PID in the owner field ceases to
exist.
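The template/clone lifecycle described above can be sketched as a small model. This is a toy Python illustration, not VMS code: `CloneTemplate` and its methods are hypothetical, clone unit numbers are simply handed out sequentially (an assumption; real unit numbering may differ), and the reference counts in the replay just mirror the numbers from the TNA100: experiment (3 before logout, 1 after) without claiming why the login session holds two references.

```python
import itertools
from typing import Dict, Optional


class CloneTemplate:
    """Toy model of a template device such as TNA0: or BG0:.
    Assigning a channel to the template creates a new clone unit owned
    by the creating process. Illustrative only."""

    def __init__(self, prefix: str) -> None:
        self.prefix = prefix
        self._units = itertools.count(1)     # next clone unit number
        self.clones: Dict[str, dict] = {}    # clone name -> UCB-like state

    def assign_template(self, pid: int) -> str:
        # A channel to the template really refers to a brand-new clone,
        # whose owner is the creating process.
        name = f"{self.prefix}{next(self._units)}:"
        self.clones[name] = {"owner_pid": pid, "ref_count": 1}
        return name

    def assign_clone(self, name: str) -> None:
        # Further channels to an existing clone only bump the count.
        self.clones[name]["ref_count"] += 1

    def deassign(self, name: str) -> None:
        # When the last reference goes, the clone's UCB is deleted.
        # Otherwise nothing changes, even if the owner PID no longer exists.
        ucb = self.clones[name]
        ucb["ref_count"] -= 1
        if ucb["ref_count"] == 0:
            del self.clones[name]

    def show(self, name: str) -> Optional[dict]:
        # None plays the role of %SYSTEM-W-NOSUCHDEV.
        return self.clones.get(name)


if __name__ == "__main__":
    tna = CloneTemplate("TNA")
    dev = tna.assign_template(0x1234)   # login session gets a clone
    tna.assign_clone(dev)               # second reference held by the session
    tna.assign_clone(dev)               # OPEN/READ X from another process
    tna.deassign(dev)
    tna.deassign(dev)                   # the logged-in process exits
    print(tna.show(dev))                # owner_pid still 0x1234 (4660), ref_count 1
    tna.deassign(dev)                   # CLOSE X
    print(tna.show(dev))                # None: the clone is gone
```

The design point it illustrates is the one in the text: deletion is driven purely by the reference count, so a stale owner PID on a clone is harmless to the kernel but misleading to anyone who uses it to decide which process to stop.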
Interestingly, mailbox devices don't use the template/clone algorithm.
Instead, there's a special system service ($CREMBX) to create one. That's
probably because mailboxes have existed since the very beginning of VMS,
before the template/clone mechanism was designed and implemented, plus
the fact that a few mailboxes are practically welded into the kernel.
Yes, I fully understand when and why this happens.
Still, I do not see any easy way to find out which second
process holds a channel to the TNA device (and prevents
the TELNET /DELETE command from finishing), other than parsing
the SDA output.
And in our case the TNA device is created by a TELNET /CREATE
command, so the device stays until the /DELETE is done anyway.
Yes. Not all DEC utilities (e.g. RMS) were prepared to deal with the
concept of opening a channel to one device and obtaining a channel
to another.
That was some fun years ago. Details fade, but I think this was in
the context of a DECWindows login where you'd have
SYS$INPUT as a PPF logical name pointing to one cloned device
(e.g. WSA9:). You'd do a "show sys$input" and a "show device"
and that WSA9 no longer existed. But if you decoded the channel
encoded in the PPF name, you'd see that it was to WSA10:.
The channel encoded in a PPF name trumps the equivalence
name that appears in plaintext.
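The hidden part of a PPF name sits in a short header in front of the plaintext. A minimal Python sketch, assuming the usual PPF equivalence-string layout (an ESC byte, a NUL byte, then a two-byte internal file identifier, the IFI, ahead of the visible name) and assuming little-endian byte order for the IFI. `split_ppf` and `PPFName` are hypothetical names. The sketch only separates the two parts; the real device is reached through the IFI, which RMS uses to locate the open file and hence its channel, not through the visible text.

```python
import struct
from typing import NamedTuple, Optional


class PPFName(NamedTuple):
    ifi: int       # internal file identifier from the hidden header
    visible: str   # the plaintext part, e.g. "WSA9:"


def split_ppf(equivalence: bytes) -> Optional[PPFName]:
    """Split a PPF logical-name equivalence string into its 4-byte header
    (ESC, NUL, 2-byte IFI) and the visible name. Returns None if the
    string does not start with the PPF header. IFI byte order is assumed
    to be little-endian."""
    if len(equivalence) < 4 or equivalence[0] != 0x1B or equivalence[1] != 0x00:
        return None
    (ifi,) = struct.unpack_from("<H", equivalence, 2)
    return PPFName(ifi, equivalence[4:].decode("ascii"))
```

A plain (non-PPF) equivalence string such as "WSA9:" returns None, which mirrors the common DCL idiom of checking for a leading ESC and skipping the first four characters before displaying the name.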
> OK, thanks Roy and VAXman. I found it.
>
> It was an old process/workplace startup file (that should not
> be used today) that still allocated the TNA device.
If you had been running Windows, the solution would have been far
simpler, and required much less of your time: ALT-CTRL-DEL :-)