How to find what's locking a file

Jonathan Ball

Sep 29, 2006, 11:12:28 PM

We have a process that receives inbound data from a non-System i server
and processes it. I'm not very familiar with the process, but it
copies the inbound data to a member in a multi-member file, then
immediately tries to copy that member somewhere else. This process
runs hundreds of times a day, and has been running since some time in
July, so there are thousands of members in the multi-member files
(there are several such files).

When the process tries to copy from the member it has just added, it
frequently gets a "file in use" diagnostic message (CPF3202), followed
by "file not copied because of error" (CPF2814), followed by "Copy
command ended because of error" (CPF2817); the process is using CPYF.
When I attempt to determine what is locking the from-file using
WRKOBJLCK, the command doesn't show the results until there is no
longer a contending lock. I suspect this is because there are so many
members in the file.

We thought at first it was because Mimix was trying to do object
synchronization on the file - that has been the cause of file-in-use
problems before - but we have taken the files completely out of Mimix,
and we're still getting the errors.

How can I find what has the file in use? I started to look in the
Information Center about the various performance monitoring options,
but that looks daunting, and I didn't see anything that looked like it
would yield the answer quickly.

Thanks in advance.

Saml

Sep 30, 2006, 10:18:51 AM

You might consider ALCOBJ *EXCL on the member in question before the CPYF
with a longish wait time, say several minutes, and see if that makes any
difference.

The underlying problem might be something to do with the large numbers of
members and some kind of purge might be in order...

Sam

Steve Richter

Sep 30, 2006, 10:35:10 AM

Jonathan Ball wrote:
> We have a process that receives inbound data from a non System i server
> and processes it. I'm not very familiar with the process, but it
> copies the inbound data to a member in a multi-member file, then
> immediately tries to copy that member somewhere else. This process
> runs hundreds of times a day, and has been running since some time in
> July, so there are thousands of members in the multi-member files
> (there are several such.)
>
> When the process tries to copy from the member it has just added, it
> frequently gets a "file in use" diagnostic message (CPF3202), followed
> by "file not copied because of error" (CPF2814), followed by "Copy
> command ended because of error" (CPF2817); the process is using CPYF.
> When I attempt to determine what is locking the from-file using
> WRKOBJLCK, the command doesn't show the results until there is no
> longer a contending lock. I suspect this is because there are so many
> members in the file.

Use the MBR parameter on the WRKOBJLCK command. You could run ALCOBJ
against the member before the CPYF with WAIT(0), then MONMSG CPF0000
EXEC(WRKOBJLCK OBJ(fileName) OBJTYPE(*FILE) MBR(member-name)
OUTPUT(*PRINT)). Or put the MONMSG after the CPYF instead.

Also, if the member is empty when the CPYF runs, or you can tell the
difference between the new records and what is in the member before the
CPYF, you could use CPYF MBROPT(*ADD). MBROPT(*ADD) does not need an
exclusive-type lock on the to-file.

-Steve

Jonathan Ball

Sep 30, 2006, 4:44:58 PM

Steve Richter wrote:
> Jonathan Ball wrote:
>
>>We have a process that receives inbound data from a non System i server
>>and processes it. I'm not very familiar with the process, but it
>>copies the inbound data to a member in a multi-member file, then
>>immediately tries to copy that member somewhere else. This process
>>runs hundreds of times a day, and has been running since some time in
>>July, so there are thousands of members in the multi-member files
>>(there are several such.)
>>
>>When the process tries to copy from the member it has just added, it
>>frequently gets a "file in use" diagnostic message (CPF3202), followed
>>by "file not copied because of error" (CPF2814), followed by "Copy
>>command ended because of error" (CPF2817); the process is using CPYF.
>>When I attempt to determine what is locking the from-file using
>>WRKOBJLCK, the command doesn't show the results until there is no
>>longer a contending lock. I suspect this is because there are so many
>>members in the file.
>
>
> Use the MBR parm on WRKOBJLCK command.

That's a good tip. I had never prompted the WRKOBJLCK
command (or if I ever did, it was many years ago), so I
didn't know it could be restricted to just the member
of interest. Thanks.

> You could ALCOBJ on the member
> of the file before the CPYF with WAIT(0) . Then MONMSG CPF0000
> exec(WRKOBJLCK obj(fileName) objtype(*file) MBR(member-name)
> output(*print)). Or do the MONMSG after the CPYF.

The developers know about those, but right now I'm
trying to investigate what's causing the lock.

Thanks again for the above tip.

jse...@yahoo.co.nz

Oct 1, 2006, 4:14:57 PM

Jonathan Ball wrote:
> We have a process that receives inbound data from a non System i server
> and processes it. I'm not very familiar with the process, but it
> copies the inbound data to a member in a multi-member file, then
> immediately tries to copy that member somewhere else. This process
> runs hundreds of times a day, and has been running since some time in
> July, so there are thousands of members in the multi-member files
> (there are several such.)

Not an answer to your question, but are you aware that there is a limit
to the number of members a file can have? If this runs hundreds of
times a day, then you will reach the limit (32767 members) in less than
a year.
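
As a rough check on that estimate, here is a small sketch of the arithmetic in Python. The 32767 figure is the limit quoted above; the daily run counts are made up, and the sketch assumes every run adds exactly one member and nothing is ever purged.

```python
MAX_MEMBERS = 32767  # per-file member limit quoted in the post above


def days_until_full(runs_per_day: int, existing_members: int = 0) -> int:
    """Whole days before a file reaches MAX_MEMBERS.

    Assumes every run adds exactly one member and no members are removed.
    """
    if runs_per_day <= 0:
        raise ValueError("runs_per_day must be positive")
    remaining = max(MAX_MEMBERS - existing_members, 0)
    return remaining // runs_per_day


if __name__ == "__main__":
    for rate in (100, 200, 500):  # hypothetical daily run counts
        print(rate, "runs/day ->", days_until_full(rate), "days")
```

At 100 runs a day the limit arrives in about 327 days, and sooner at higher rates, so "less than a year" holds for anything in the hundreds-per-day range.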

manas

Oct 1, 2006, 10:37:58 PM

Hi Jon,
If the file is journaled, you could analyze the journal entries to
look for activity against the file. I would also run WRKDGACTE in
Mimix to confirm that there is no Mimix activity.
manas

Dr. Ugo Gagliardelli

Oct 2, 2006, 2:29:31 AM

Journal the file, including open/close entries. When you get the error,
you can see which job was the last to open it. You may find that some
job opened the file a long time before and has done nothing with it
since.
--
Dr.Ugo Gagliardelli,Modena,ItalyCertifiedUindoscrasherAñejoAlcoolInside
Spaccamaroni andate a cagare/Spammers not welcome/Spammers vão à merda
Spamers iros a la mierda/Spamers allez vous faire foutre/Spammers loop
schijten/Spammers macht Euch vom Acker/Spamerzy wypierdalac'

walker.l2

Oct 2, 2006, 5:19:28 AM

We had a problem a while back where DSPOBJLCK reported no locks, but
any attempt to use the object failed with an 'Object in use' message.
IBM gave us the following instructions (I think for v5r1, but we have
successfully used them under v5r2 as well) for tracking down object
locks when this happens:

How to find who has a lock on an object:
Sometimes DSPOBJLCK does not show what is locking an object, or the
object in question is an IFS file object.
Here are the steps required to find what is locking an object.
First, determine if the file is checked out, especially if the person
reporting the problem has stated
that the file remains locked across IPLs. Simply use the DSPLNK command
for the object and
take option 8. If you do not see any checkout fields, the file is not
checked out. If you see
checkout information but the user profile name is unreadable or simply
blank, you might want to
talk with IBM Support Line, since we have seen some intermittent
directory glitches with this
symptom. To free the file, use the CHKIN command.

If the file is not checked out, the process becomes a bit more
involved. Start by using our dumper
to find the system pointer to the object. For example, if you are
examining /mydir/mysubdir/myfile,
use the following command:
CALL QP0FPTOS '/mydir/mysubdir/myfile'
To check non-IFS files, the command is:
CALL QP0FPTOS '/QSYS.LIB/YOURLIB.LIB/YOURFILE.FILE/YOURMBR.MBR'
(That is, "QP-zero-FPT-oh-S"; this is the Perform Miscellaneous File
System Functions API.)
Make sure you press F10 to include detailed messages.
The command produces a message like the following:
Specified path refers to a system object at address
000000000000000014B5FCF424001E00
The critical portion of the address on a RISC system is twelve
characters long, starting under the second "e" in "refers" and ending
under the second "s" in "system". In this example it is 14B5FCF42400,
matching the search string used below. Copy those twelve characters.

Now, you need to call the dumper with a different parameter:
CALL QP0FPTOS *DUMPALL
Note: This may take a considerable time to complete, and can produce a
very big spool file.
This will produce a spool file containing information about vnodes and
open instances. Search the spool file for the system pointer string you
saved previously, but add a space between the 10th and 11th characters:
Find . . . . . . 14B5FCF424 00
When you perform the search (F16), you should end up positioned to a
line like this:
h_tnode 0000000100 0072C0 h_c 14B5FCF424 001E00 h_next F2D5CE48B6 020E10 h_prev *NULL
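
For anyone who wants to avoid counting characters by hand, the step of turning the address into the search string could be sketched in Python as below. This is not part of IBM's instructions. The function name is made up, and the position of the twelve critical characters (digits 17 to 28 of the 32-digit address) is an assumption inferred from the one example in this post, where the address 000000000000000014B5FCF424001E00 leads to the search string "14B5FCF424 00".

```python
HEX_DIGITS = set("0123456789ABCDEF")


def dump_search_key(address: str) -> str:
    """Build the spool-file search string from the address in the message.

    Assumption (inferred from the example in the post): the twelve
    critical characters are digits 17-28 of the 32-digit address.
    """
    addr = address.strip().upper()
    if len(addr) != 32 or not set(addr) <= HEX_DIGITS:
        raise ValueError("expected a 32-digit hexadecimal address")
    critical = addr[16:28]
    # The dump prints the pointer with a space after the 10th character.
    return critical[:10] + " " + critical[10:]


if __name__ == "__main__":
    print(dump_search_key("000000000000000014B5FCF424001E00"))
```

If the address on your system has a different layout, check the result against the h_c field in the dump before relying on it.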

You are now looking at part of the vnode information for the file. The
vnode address should be
about seven lines backward from this line, right above a line starting
with "v_lock", and
look something like:
F1799388DD 008000
Copy this information. About three lines below the address is the
"v_usecount" field.
Take note of the number there.
Now, scroll forward until you see the line containing only "Lock
flags". The three lines
following that show locking information for the object. Take note of
the information there.
To determine whether the file is actually open, we need to search the
spool file for the vnode
address until we find a match on a line containing "f_object". If no
such match is
found, the file was not open at the time of the dump. If a match is
found, search backward in the
spool file for a line starting with "Process". This line will contain
the job name which
has the file open. For instance, the following line:
Process QPADEV0007RJTRAFF 022600
PPCO address EECFB2AFE6 000200
indicates that job 022600/RJTRAFF/QPADEV0007 has the file open. Of
course, that may not be
the only job, and the file may be open multiple times in the same job.
Repeated searches for the
vnode address will yield all of the open instances for a file.
Now that you have all the information, here is what you do with it:
1. If you find that the file is open, simply ending the opening jobs
should free the file.
2. If the lockattrs value in the Lock flags area is "1", then SAV/RST
is involved. If there
is no current SAV/RST activity, then it is either the case that a SAV
or RST operation was
interrupted while the lock was held and the lock was not released, or
there is directory damage
which confused SAV/RST into locking and unlocking the wrong objects. An
IPL is necessary, and
if the problem reoccurs, a RCLSTG is probably needed.
3. If you find some odd circumstance, such as the usecount being
negative, or lock flags set while the usecount is zero, some extra-fine
analysis is necessary.
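
If the dump spool file has been copied somewhere a script can read it, the "find f_object, then search backward for Process" step described above could be sketched in Python roughly as follows. This is only an illustration, not an IBM tool. The function names are made up, and the parsing assumes that the job name occupies a 10-character field directly after the "Process" label, followed by the user and the job number, which is inferred from the single example line in this post. Real dumps may be laid out differently.

```python
from typing import List


def qualified_job(process_line: str) -> str:
    """Turn a 'Process ...' dump line into number/user/name form.

    Assumption: a 10-character job name field, then user, then number.
    """
    rest = process_line.split(None, 1)[1]
    name = rest[:10].strip()
    user, number = rest[10:].split()[:2]
    return f"{number}/{user}/{name}"


def jobs_with_file_open(dump_lines: List[str], vnode_address: str) -> List[str]:
    """Return one qualified job name per open instance of the vnode."""
    jobs = []
    for i, line in enumerate(dump_lines):
        if "f_object" in line and vnode_address in line:
            # Search backward for the nearest line starting with "Process".
            for prev in reversed(dump_lines[:i]):
                if prev.lstrip().startswith("Process"):
                    jobs.append(qualified_job(prev))
                    break
    return jobs


if __name__ == "__main__":
    # Hypothetical fragment: only the Process/PPCO lines come from the post.
    sample = [
        "Process QPADEV0007RJTRAFF 022600",
        "PPCO address EECFB2AFE6 000200",
        "f_object F1799388DD 008000",
    ]
    print(jobs_with_file_open(sample, "F1799388DD 008000"))
```

A job that has the file open more than once appears once per open instance, in line with the note above that repeated searches yield all of the open instances.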

Jonathan Ball

Oct 2, 2006, 2:55:25 PM

Sorry for the top-post.

Thanks for the information. That *is* convoluted, but it looks as if
it might yield something helpful.

walker.l2

Oct 3, 2006, 4:43:33 AM

The short version is that it appears that an active save can create
object locks which are not reported by DSPOBJLCK. So you might want to
check if there are any saves running when you get your 'object in use'
problems.