Hi.
On one of my systems I've got something odd occurring.
In /usr/spool/lp/admins/lp/interfaces/ , there's a reference to a file
'dj_19' with inode 66409, which isn't right.
# ls -l dj*
/bin/ls: dj_19 not found: No such file or directory (error 2)
-rwxrwxr-x 1 lp lp 6730 Dec 6 1999 dj_116
-rwxrwxr-x 1 lp lp 5980 Dec 6 1999 dj_12
-rwxrwxr-x 1 lp lp 6730 Dec 6 1999 dj_13
-rwxrwxr-x 1 lp lp 7144 Jun 8 10:48 dj_7
After doing this a few times, it got linked to *something*:
# ls -ali dj*
62986 -rwxrwxr-x 1 lp lp 6730 Dec 6 1999 dj_116
62977 -rwxrwxr-x 1 lp lp 5980 Dec 6 1999 dj_12
62922 -rwxrwxr-x 1 lp lp 6730 Dec 6 1999 dj_13
66409 -rw-rw---- 1 auth auth 577 Jun 18 08:26 dj_19
17656 -rwxrwxr-x 1 lp lp 7144 Jun 8 10:48 dj_7
This disappeared again soon after, going back to the error.
So I thought I'd set a trap and 'rm' it (`until touch dj_19; do sleep
1; done; rm -f dj_19`). This removed the funky 'dj_19' entry.
However this just moved the funky-linked-file to another directory,
/tmp/lj_148.713 this time. ('find . -mount -inum <inum>' is handy).
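That `find -mount -inum` trick is easy to try in isolation. A minimal sketch on scratch files (the directory and names below are throwaway, not the real /tmp paths):

```shell
#!/bin/sh
# Create a file plus a hard link to it, read back its inode number,
# then use find -inum to list every name referencing that inode.
# -mount keeps find from descending into other filesystems.
dir=`mktemp -d`
touch "$dir/original"
ln "$dir/original" "$dir/alias"
set -- `ls -i "$dir/original"`   # first field of ls -i is the inum
find "$dir" -mount -inum "$1" -print | sort
rm -rf "$dir"
```

Both names come back, since they are two directory entries for one inode.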
So I thought I'd try something a little more powerful, using 'unlink
lj_148.713' this time. I've not been able to catch it though.
I figure this is the sort of thing that would be fixed by an 'fsck', but as
it's a production system, scheduling downtime isn't the easiest of things
to get done.
I was wondering if anybody had any ideas on how to remove this confused
inode?
Stuart
> SCO:Unix::5.0.5Eb rs505a.Unix505.1.0a oss497c.Unix505.1.0a
> oss600a.Unix505 OSS621A.505.SCO.Unix.RTS
>
I would start by not messing with it any more, until you can reboot...
When you first noticed this, /usr/spool/lp/admins/lp/interfaces/dj_19
was a directory entry pointing to an inode number which was deallocated.
Attempts to access the file failed since the inode wasn't in use.
Eventually, some other file was created with that inode number. When
you deleted .../dj_19, you cleared out the bad directory entry, but you
also decremented the reference count on that inode to 0, so it became
deallocated again. That pushed the problem to wherever the new file
with that inode number had been created -- /tmp/lj_148.713, you say.
I wouldn't expect `unlink` to do any better than `rm`; it will still
decrement the reference count, pushing the problem to yet another file.
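The reference-count mechanics above can be watched on any healthy filesystem; the link count is the second field of `ls -l` output. A sketch with throwaway names:

```shell
#!/bin/sh
# A file's link count is the number of directory entries referencing
# its inode; rm (unlink) decrements it, and the inode is only
# deallocated when the count reaches zero.
dir=`mktemp -d`
touch "$dir/a"
ln "$dir/a" "$dir/b"          # second name, same inode
set -- `ls -l "$dir/a"`
echo "links after ln: $2"     # prints: links after ln: 2
rm "$dir/b"                   # unlink one name
set -- `ls -l "$dir/a"`
echo "links after rm: $2"     # prints: links after rm: 1
rm -rf "$dir"
```

In the corrupted case, the on-disk count is already wrong, which is why each `rm` just relocates the problem instead of fixing it.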
To temporarily capture the problem, try something like this:
mkdir /tmp/trap
cd /tmp/trap
i=1000
while [ $i -le 9999 ]; do
  touch trap.$i
  set -- `ls -i trap.$i`
  [ "$1" = 66409 ] && break
  i=`expr $i + 1`
done
That is, deliberately create a whole pile of new files. Stop when you
get the bad inum.
Now you'll have two filenames pointing to the same file -- say,
/tmp/lj_148.713 and /tmp/trap/trap.1234. You should be able to rename
them both:
mkdir /tmp/junk
mv /tmp/lj_148.713 /tmp/junk/name1
mv /tmp/trap/trap.1234 /tmp/junk/name2
and then remove the rest of the files that weren't the bad inum:
rm -rf /tmp/trap
After that, I would just leave /tmp/junk alone until you can reboot.
From single-user mode, run `fsck -o full /dev/root`.
Beware that you could end up panicking along the way. I'm a little
surprised the kernel hasn't already taken offense.
If you want to play with a bit more fire, once you've isolated the two
names of the bad inum, I can think of two interesting experiments.
Unfortunately I don't think you can try both of them, so you have to
pick one. One is:
cd /tmp/junk
ln -f name2 name1
It's possible that in the process of trying to link together these two
names that are actually already linked together, the link count will end
up back at 2.
The other experiment is:
cd /tmp/junk
rm name1 name2
/etc/clri /dev/root 66409
Actually it's not clear what the best order of operations would be
among:
rm name1
rm name2
/etc/clri /dev/root 66409
The clri(ADM) man page actually describes the problem you're having.
But I think its advice is flawed because, as you've experienced, the
filesystem won't _let_ you remove an entry that corresponds to an
unallocated inode.
Maybe it isn't so flawed. I'm only guessing, but it sounds like the
problem is in the kernel's idea of the inode's reference count; not the
_on disk_ but the _in memory_ count. `clri` will wipe the on-disk inode
but I think it won't clear the in-memory count. So maybe you can still
remove the directory entries after the `clri`.
Still, (1) expect a panic somewhere along the way, (2) do an `fsck -o
full` as early as possible even if you think you've tricked your way
into a clean repair.
>Bela<
"Stuart J. Browne" <stu...@promed.com.au> wrote in message
news:40d22867$1...@dnews.tpgi.com.au...
I have something that might be similar, also on 5.0.5.
In my case, it is a directory.
ls -l on the parent directory (/u) yields:
drwxrwxrwx 2 root other 8192 Jun 8 2003 zoldphoenix
ls -l zoldphoenix yields standard output:
total 0
standard error output:
ls: zoldphoenix/^LÖ not found: No such file or directory (error 2)
ls -i zoldphoenix yields:
201446924 ^LÖ
This originally was a directory with lots of files, and subdirectories. I
managed to create a new directory, and get all files moved to the new
directory. Then I was able to change names of directories, such that I had
effectively replaced the bad directory with a new good one, so my production
problem was resolved.
But, I have never been clever enough to get rid of the bad directory.
The best I was able to do, was to move it around, from one directory, to
another, finally leaving it at the top of the /u filesystem. I was unable
to move it to another filesystem.
Barry
---
Outgoing mail is certified Virus Free.
Checked by AVG anti-virus system (http://www.grisoft.com).
Version: 6.0.698 / Virus Database: 455 - Release Date: 09/06/2004
Try:
ls -bil /u
You seem to have a non-printing character or two in the directory's name.
That command will give you its inode number, with which you can do
find /u -inum <that_inode_number> | xargs rmdir
--
JP
>
> You seem to have a non-printing character or two in the directory's name.
> That command will give you its inode number, with which you can do
>
> find /u -inum <that_inode_number> | xargs rmdir
Yields:
rmdir: /u/zoldphoenix: Directory not empty
I then tried:
find /u -inum 2142 |xargs rm -r
which yielded:
rm: ^LÖ non-existent
rm: unable to remove directory /u/zoldphoenix: File exists (error 17)
So, I guess it at least thinks there is a file in that directory, with
unprintable characters as part or all of the filename--
I suppose the issue is getting the precise filename thru to rm.
Barry
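One way around that parsing problem is to never type the name at all: match on the inode number and let `find -exec` hand the exact bytes to rm. (`xargs` splits its input on whitespace and can mangle control characters, which is likely why it failed above.) A sketch with a hypothetical control-character filename:

```shell
#!/bin/sh
# Delete a file whose name contains non-printing bytes without ever
# passing the name through the shell: match on its inode number and
# let find -exec supply the exact bytes to rm.
dir=`mktemp -d`
name=`printf 'bad\001name'`      # hypothetical unprintable filename
touch "$dir/$name"
set -- `ls -i "$dir"/*`          # first field is the inode number
find "$dir" -inum "$1" -exec rm {} \;
ls -A "$dir"                     # prints nothing: the file is gone
rm -rf "$dir"
```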
>
> --
> JP
Keep going down the same path:
find /u -inum 2142 | xargs l -bil
--
JP
It trapped itself after a while:
"Stuart J. Browne" <stu...@promed.com.au> wrote in message
news:40d6...@dnews.tpgi.com.au...
>
> "Bela Lubkin" <be...@sco.com> wrote in message
> news:20040618110...@sco.com...
> > Stuart J. Browne wrote:
> >
> > > I was wondering if anybody had any ideas on how to remove this
> > > confused inode?
> >
> > I would start by not messing with it any more, until you can reboot...
> >
> > When you first noticed this, /usr/spool/lp/admins/lp/interfaces/dj_19
> > was a directory entry pointing to an inode number which was deallocated.
> > Attempts to access the file failed since the inode wasn't in use.
> > Eventually, some other file was created with that inode number. When
> > you deleted .../dj_19, you cleared out the bad directory entry, but you
> > also decremented the reference count on that inode to 0, so it became
> > deallocated again. That pushed the problem to wherever the new file
> > with that inode number had been created -- /tmp/lj_148.713, you say.
> >
> > I wouldn't expect `unlink` to do any better than `rm`; it will still
> > decrement the reference count, pushing the problem to yet another file.
> >
>
> It trapped itself after a while:
334# ls -ali
total 138
25262 drwxr-xr-x 2 root sys 512 Jun 21 11:30 .
13994 drwxrwxrwt 18 sys sys 64000 Jun 21 11:30 ..
66409 -rw-rw---- 1 medicus medicus 1210 Jun 21 11:16 name1
66409 -rw-rw---- 1 medicus medicus 1210 Jun 21 11:16 name2
335# pwd
/tmp/junk
336#
> >
> > After that, I would just leave /tmp/junk alone until you can reboot.
> > From single-user mode, run `fsck -o full /dev/root`.
mm.. play it safe .... ... But I so like fire ... <insert evil grin here>
> > Beware that you could end up panicking along the way. I'm a little
> > surprised the kernel hasn't already taken offense.
> >
> > If you want to play with a bit more fire, once you've isolated the two
> > names of the bad inum, I can think of two interesting experiments.
> > Unfortunately I don't think you can try both of them, so you have to
> > pick one. One is:
> >
> > cd /tmp/junk
> > ln -f name2 name1
330# ln -f name1 name2
ln: cannot access source file name1: No such file or directory (error 2)
> > It's possible that in the process of trying to link together these two
> > names that are actually already linked together, the link count will
end
> > up back at 2.
which gets us back to our original situation:
363# ls -al
/bin/ls: ./name1 not found: No such file or directory (error 2)
total 130
drwxr-xr-x 2 root sys 512 Jun 21 11:44 .
drwxrwxrwt 18 sys sys 64000 Jun 21 11:44 ..
So, can try experiment No. 2! ;)
> > The other experiment is:
> >
> > cd /tmp/junk
> > rm name1 name2
> > /etc/clri /dev/root 66409
> >
> > Actually it's not clear what the best order of operations would be
> > among:
> >
> > rm name1
> > rm name2
> > /etc/clri /dev/root 66409
> >
> > The clri(ADM) man page actually describes the problem you're having.
> > But I think its advice is flawed because, as you've experienced, the
> > filesystem won't _let_ you remove an entry that corresponds to an
> > unallocated inode.
Yeap, didn't do anything.
> > Maybe it isn't so flawed. I'm only guessing, but it sounds like the
> > problem is in the kernel's idea of the inode's reference count; not the
> > _on disk_ but the _in memory_ count. `clri` will wipe the on-disk inode
> > but I think it won't clear the in-memory count. So maybe you can still
> > remove the directory entries after the `clri`.
> >
> > Still, (1) expect a panic somewhere along the way, (2) do an `fsck -o
> > full` as early as possible even if you think you've tricked your way
> > into a clean repair.
Ah well, no problem. Have the i-node trapped now, so I'll schedule
downtime to do an fsck.
Thanks again Bela.
> | I then tried:
> | find /u -inum 2142 |xargs rm -r
> | which yielded:
> | rm: ^LÖ non-existent
> | rm: unable to remove directory /u/zoldphoenix: File exists (error 17)
> |
> | So, there is a file in that directory, with
> | unprintable characters as part or all of the filename--
> | The issue is getting the precise filename thru to rm.
> |
>
> Keep going down the same path:
>
> find /u -inum 2142 | xargs l -bil
>
> --
> JP
I was able to capture the offending file name into a file and dump its bytes:
od -b junk
0000000 001 014 326 006 034 002 012
I haven't been clever enough to get that file name removed -- every trick
I try yields 'File not found'.
Tried wildcards, tried setting a variable to that value:
rm -i $file
Are there any other tricks I could try to remove the file?
(This is just academic at this point; we can just leave it there forever --
but maybe I can learn something further about how rm parses special
characters in its arguments. Any references other than "man rm" I can
follow up on?)
Thanks for help already rendered, JP!
Barry
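One more trick worth trying: rebuild the exact name from the octal bytes `od -b` reported, using printf (the trailing 012 is just the newline od read, not part of the name). Whether those six bytes really are the whole name is an assumption; this is only a sketch on a scratch file:

```shell
#!/bin/sh
# Reconstruct a filename from its octal bytes (as reported by od -b)
# with printf, then pass the result to rm quoted so the shell never
# reinterprets the control characters.
dir=`mktemp -d`
name=`printf '\001\014\326\006\034\002'`   # bytes from the od -b dump above
touch "$dir/$name"
rm "$dir/$name"
ls -A "$dir"                               # prints nothing: file removed
rm -rf "$dir"
```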
Have you enclosed the filename in quotes?
As in: rm 'file name'
Ron
Ugh. I prefer hd.
IAC, the name of the file is 'junk'.
Have you tried 'rm junk'?
--
JP