
Odd INode problem on OSR505


Stuart J. Browne

Jun 17, 2004, 7:25:28 PM
SCO:Unix::5.0.5Eb rs505a.Unix505.1.0a oss497c.Unix505.1.0a
oss600a.Unix505 OSS621A.505.SCO.Unix.RTS


Hi.

On a system I have I've got something odd occurring.

In /usr/spool/lp/admins/lp/interfaces/, there's a reference to a file
'dj_19' with inode 66409, which isn't right.

# ls -l dj*
/bin/ls: dj_19 not found: No such file or directory (error 2)
-rwxrwxr-x 1 lp lp 6730 Dec 6 1999 dj_116
-rwxrwxr-x 1 lp lp 5980 Dec 6 1999 dj_12
-rwxrwxr-x 1 lp lp 6730 Dec 6 1999 dj_13
-rwxrwxr-x 1 lp lp 7144 Jun 8 10:48 dj_7

After doing this a few times, it got linked to *something*:

# ls -ali dj*
62986 -rwxrwxr-x 1 lp lp 6730 Dec 6 1999 dj_116
62977 -rwxrwxr-x 1 lp lp 5980 Dec 6 1999 dj_12
62922 -rwxrwxr-x 1 lp lp 6730 Dec 6 1999 dj_13
66409 -rw-rw---- 1 auth auth 577 Jun 18 08:26 dj_19
17656 -rwxrwxr-x 1 lp lp 7144 Jun 8 10:48 dj_7

This disappeared again soon after, going back to the error.

So I thought I'd set a trap and 'rm' it (until touch dj_19; do sleep
1; done; rm -f dj_19). This removed the funky 'dj_19' entry.

However this just moved the funky-linked-file to another directory,
/tmp/lj_148.713 this time. ('find . -mount -inum <inum>' is handy).
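
As a rough illustration of the 'find -inum' trick on a healthy filesystem, the sketch below makes two names for one inode and then finds both by number. The scratch directory and file names are made up for the example and are not paths from this system.

```shell
# Sketch: list every name that shares one inode within a single filesystem.
# "scratch" and the file names are placeholders for illustration only.
scratch=${TMPDIR:-/tmp}/inum_demo.$$
mkdir -p "$scratch/sub"
echo data > "$scratch/a"
ln "$scratch/a" "$scratch/sub/b"      # second name for the same inode

# ls -i prints "<inum> <name>"; keep only the number.
set -- `ls -i "$scratch/a"`
inum=$1

# -mount keeps find from crossing into other mounted filesystems,
# where the same inode number would refer to an unrelated file.
names=`find "$scratch" -mount -inum "$inum" -print | sort`
echo "$names"
count=`echo "$names" | wc -l | tr -d ' '`
```

Inode numbers are only unique within one filesystem, which is why the search is restricted with -mount.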

So I thought I'd try something a little more powerful, using 'unlink
lj_148.713' this time. I've not been able to catch it though.

I figure this is the sort of thing that would be fixed by an 'fsck', but as
it's a production system, scheduling downtime isn't the easiest of things
to get done.

I was wondering if anybody had any ideas on how to remove this confused
inode?

Stuart


Bela Lubkin

Jun 18, 2004, 7:07:31 AM
to sco...@xenitec.ca
Stuart J. Browne wrote:

> SCO:Unix::5.0.5Eb rs505a.Unix505.1.0a oss497c.Unix505.1.0a
> oss600a.Unix505 OSS621A.505.SCO.Unix.RTS
>

I would start by not messing with it any more, until you can reboot...

When you first noticed this, /usr/spool/lp/admins/lp/interfaces/dj_19
was a directory entry pointing to an inode number which was deallocated.
Attempts to access the file failed since the inode wasn't in use.
Eventually, some other file was created with that inode number. When
you deleted .../dj_19, you cleared out the bad directory entry, but you
also decremented the reference count on that inode to 0, so it became
deallocated again. That pushed the problem to wherever the new file
with that inode number had been created -- /tmp/lj_148.713, you say.

I wouldn't expect `unlink` to do any better than `rm`; it will still
decrement the reference count, pushing the problem to yet another file.
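
For comparison, here is a minimal sketch of how the link count normally moves as names are added and removed, assuming a healthy filesystem. The directory name and the nlink helper are invented for the example.

```shell
# Sketch: watch the link count change as names are added and removed.
demo=${TMPDIR:-/tmp}/nlink_demo.$$
mkdir -p "$demo"
echo data > "$demo/name1"

nlink() {                      # second field of ls -l is the link count
    set -- `ls -l "$1"`
    echo "$2"
}

before=`nlink "$demo/name1"`   # one name
ln "$demo/name1" "$demo/name2"
linked=`nlink "$demo/name1"`   # two names, same inode
rm "$demo/name2"
after=`nlink "$demo/name1"`    # back to one; the data survives
echo "$before $linked $after"
```

On a sane filesystem this prints 1 2 1. The trouble described above is the case where a stray directory entry exists that the count never accounted for, so removing it drives the count to 0 while another name still points at the inode.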

To temporarily capture the problem, try something like this:

mkdir /tmp/trap
cd /tmp/trap
i=1000
while [ $i -le 9999 ]; do
touch trap.$i
set -- `ls -i trap.$i`
[ "$1" = 66409 ] && break
i=`expr $i + 1`   # move on to the next candidate name
done

That is, deliberately create a whole pile of new files. Stop when you
get the bad inum.

Now you'll have two filenames pointing to the same file -- say,
/tmp/lj_148.713 and /tmp/trap/trap.1234. You should be able to rename
them both:

mkdir /tmp/junk
mv /tmp/lj_148.713 /tmp/junk/name1
mv /tmp/trap/trap.1234 /tmp/junk/name2

and then remove the rest of the files that weren't the bad inum:

rm -rf /tmp/trap

After that, I would just leave /tmp/junk alone until you can reboot.
From single-user mode, run `fsck -o full /dev/root`.

Beware that you could end up panicking along the way. I'm a little
surprised the kernel hasn't already taken offense.

If you want to play with a bit more fire, once you've isolated the two
names of the bad inum, I can think of two interesting experiments.
Unfortunately I don't think you can try both of them, so you have to
pick one. One is:

cd /tmp/junk
ln -f name2 name1

It's possible that in the process of trying to link together these two
names that are actually already linked together, the link count will end
up back at 2.

The other experiment is:

cd /tmp/junk
rm name1 name2
/etc/clri /dev/root 66409

Actually it's not clear what the best order of operations would be
among:

rm name1
rm name2
/etc/clri /dev/root 66409

The clri(ADM) man page actually describes the problem you're having.
But I think its advice is flawed because, as you've experienced, the
filesystem won't _let_ you remove an entry that corresponds to an
unallocated inode.

Maybe it isn't so flawed. I'm only guessing, but it sounds like the
problem is in the kernel's idea of the inode's reference count; not the
_on disk_ but the _in memory_ count. `clri` will wipe the on-disk inode
but I think it won't clear the in-memory count. So maybe you can still
remove the directory entries after the `clri`.

Still, (1) expect a panic somewhere along the way, (2) do an `fsck -o
full` as early as possible even if you think you've tricked your way
into a clean repair.

>Bela<

Barry Swane

Jun 20, 2004, 10:16:36 AM

"Stuart J. Browne" <stu...@promed.com.au> wrote in message
news:40d22867$1...@dnews.tpgi.com.au...

I have something that might be similar, also on 505
In my case, it is a directory
ls -l on the parent directory (/u) yields:
drwxrwxrwx 2 root other 8192 Jun 8 2003 zoldphoenix
ls -l zoldphoenix yields standard output:
total 0
standard error output:
ls: zoldphoenix/^LÖ not found: No such file or directory (error 2)

ls -i zoldphoenix yields:
201446924 ^LÖ


This originally was a directory with lots of files, and subdirectories. I
managed to create a new directory, and get all files moved to the new
directory. Then I was able to change names of directories, such that I had
effectively replaced the bad directory with a new good one, so my production
problem was resolved.
But, I have never been clever enough to get rid of the bad directory.
The best I was able to do, was to move it around, from one directory, to
another, finally leaving it at the top of the /u filesystem. I was unable
to move it to another filesystem.

Barry



Jean-Pierre Radley

Jun 20, 2004, 11:02:05 AM
Barry Swane typed (on Sun, Jun 20, 2004 at 02:16:36PM +0000):

|
| >
| I have something that might be similar, also on 505
| In my case, it is a directory
| ls -l on the parent directory (/u) yields:
| drwxrwxrwx 2 root other 8192 Jun 8 2003 zoldphoenix
| ls -l zoldphoenix yields standard output:
| total 0
| standard error output:
| ls: zoldphoenix/^LÖ not found: No such file or directory (error 2)
|
| ls -i zoldphoenix yields:
| 201446924 ^LÖ
|
| This originally was a directory with lots of files, and subdirectories. I
| managed to create a new directory, and get all files moved to the new
| directory. Then I was able to change names of directories, such that I had
| effectively replaced the bad directory with a new good one, so my production
| problem was resolved.
| But, I have never been clever enough to get rid of the bad directory.
| The best I was able to do, was to move it around, from one directory, to
| another, finally leaving it at the top of the /u filesystem. I was unable
| to move it to another filesystem.

Try:
ls -bil /u

You seem to have a non-printing character or two in the directory's name.
That command will give you its inode number, with which you can do

find /u -inum <that_inode_number> | xargs rmdir

--
JP

Barry Swane

Jun 20, 2004, 12:42:13 PM

"Jean-Pierre Radley" <j...@jpr.com> wrote in message
news:2004062015...@jpradley.jpr.com...

> Barry Swane typed (on Sun, Jun 20, 2004 at 02:16:36PM +0000):
> | ls -l on the parent directory (/u) yields:
> | drwxrwxrwx 2 root other 8192 Jun 8 2003 zoldphoenix
> | ls -l zoldphoenix yields standard output:
> | total 0
> | standard error output:
> | ls: zoldphoenix/^LÖ not found: No such file or directory (error 2)
> |
> | ls -i zoldphoenix yields:
> | 201446924 ^LÖ
> |
> Try:
> ls -bil /u
Yields:
2142 drwxrwxrwx 2 root other 8192 Jun 8 2003 zoldphoenix

>
> You seem to have a non-printing character or two in the directory's name.
> That command will give you its inode number, with which you can do
>
> find /u -inum <that_inode_number> | xargs rmdir

Yields:
rmdir: /u/zoldphoenix: Directory not empty
I then tried:
find /u -inum 2142 |xargs rm -r
which yielded:
rm: ^LÖ non-existent
rm: unable to remove directory /u/zoldphoenix: File exists (error 17)

So, I guess it at least thinks there is a file in that directory, with
unprintable characters as part or all of the filename--
I suppose the issue is getting the precise filename thru to rm.


Barry


>
> --
> JP

Jean-Pierre Radley

Jun 20, 2004, 1:13:10 PM
Barry Swane typed (on Sun, Jun 20, 2004 at 04:42:13PM +0000):

Keep going down the same path:

find /u -inum 2142 | xargs l -bil

--
JP

Stuart J. Browne

Jun 20, 2004, 10:05:45 PM

"Bela Lubkin" <be...@sco.com> wrote in message
news:20040618110...@sco.com...

> Stuart J. Browne wrote:
>
> > I was wondering if anybody had any ideas on how to remove this confused
> > inode?
>
> I would start by not messing with it any more, until you can reboot...
>
> When you first noticed this, /usr/spool/lp/admins/lp/interfaces/dj_19
> was a directory entry pointing to an inode number which was deallocated.
> Attempts to access the file failed since the inode wasn't in use.
> Eventually, some other file was created with that inode number. When
> you deleted .../dj_19, you cleared out the bad directory entry, but you
> also decremented the reference count on that inode to 0, so it became
> deallocated again. That pushed the problem to wherever the new file
> with that inode number had been created -- /tmp/lj_148.713, you say.
>
> I wouldn't expect `unlink` to do any better than `rm`; it will still
> decrement the reference count, pushing the problem to yet another file.
>

It trapped itself after a while:

Stuart J. Browne

Jun 20, 2004, 10:31:18 PM
ack!

"Stuart J. Browne" <stu...@promed.com.au> wrote in message

news:40d6...@dnews.tpgi.com.au...


>
> "Bela Lubkin" <be...@sco.com> wrote in message
> news:20040618110...@sco.com...
> > Stuart J. Browne wrote:
> >
> > > I was wondering if anybody had any ideas on how to remove this confused
> > > inode?
> >
> > I would start by not messing with it any more, until you can reboot...
> >
> > When you first noticed this, /usr/spool/lp/admins/lp/interfaces/dj_19
> > was a directory entry pointing to an inode number which was deallocated.
> > Attempts to access the file failed since the inode wasn't in use.
> > Eventually, some other file was created with that inode number. When
> > you deleted .../dj_19, you cleared out the bad directory entry, but you
> > also decremented the reference count on that inode to 0, so it became
> > deallocated again. That pushed the problem to wherever the new file
> > with that inode number had been created -- /tmp/lj_148.713, you say.
> >
> > I wouldn't expect `unlink` to do any better than `rm`; it will still
> > decrement the reference count, pushing the problem to yet another file.
> >
>
> It trapped itself after a while:

334# ls -ali
total 138
25262 drwxr-xr-x 2 root sys 512 Jun 21 11:30 .
13994 drwxrwxrwt 18 sys sys 64000 Jun 21 11:30 ..
66409 -rw-rw---- 1 medicus medicus 1210 Jun 21 11:16 name1
66409 -rw-rw---- 1 medicus medicus 1210 Jun 21 11:16 name2
335# pwd
/tmp/junk
336#

> >
> > After that, I would just leave /tmp/junk alone until you can reboot.
> > From single-user mode, run `fsck -o full /dev/root`.

mm.. play it safe .... ... But I so like fire ... <insert evil grin here>

> > Beware that you could end up panicking along the way. I'm a little
> > surprised the kernel hasn't already taken offense.
> >
> > If you want to play with a bit more fire, once you've isolated the two
> > names of the bad inum, I can think of two interesting experiments.
> > Unfortunately I don't think you can try both of them, so you have to
> > pick one. One is:
> >
> > cd /tmp/junk
> > ln -f name2 name1

330# ln -f name1 name2
ln: cannot access source file name1: No such file or directory (error 2)

> > It's possible that in the process of trying to link together these two
> > names that are actually already linked together, the link count will end
> > up back at 2.


which gets us back into our original situation:

363# ls -al
/bin/ls: ./name1 not found: No such file or directory (error 2)
total 130
drwxr-xr-x 2 root sys 512 Jun 21 11:44 .
drwxrwxrwt 18 sys sys 64000 Jun 21 11:44 ..


So, can try experiment No. 2! ;)

> > The other experiment is:
> >
> > cd /tmp/junk
> > rm name1 name2
> > /etc/clri /dev/root 66409
> >
> > Actually it's not clear what the best order of operations would be
> > among:
> >
> > rm name1
> > rm name2
> > /etc/clri /dev/root 66409
> >
> > The clri(ADM) man page actually describes the problem you're having.
> > But I think its advice is flawed because, as you've experienced, the
> > filesystem won't _let_ you remove an entry that corresponds to an
> > unallocated inode.

Yeap, didn't do anything.

> > Maybe it isn't so flawed. I'm only guessing, but it sounds like the
> > problem is in the kernel's idea of the inode's reference count; not the
> > _on disk_ but the _in memory_ count. `clri` will wipe the on-disk inode
> > but I think it won't clear the in-memory count. So maybe you can still
> > remove the directory entries after the `clri`.
> >
> > Still, (1) expect a panic somewhere along the way, (2) do an `fsck -o
> > full` as early as possible even if you think you've tricked your way
> > into a clean repair.


Ah well, no problem. Have the i-node trapped now, so will schedule
downtime to do an FSCK.

Thanks again Bela.


Barry Swane

Jun 21, 2004, 12:16:25 PM

"Jean-Pierre Radley" <j...@jpr.com> wrote in message
news:2004062017...@jpradley.jpr.com...

> | I then tried:
> | find /u -inum 2142 |xargs rm -r
> | which yielded:
> | rm: ^LÖ non-existent
> | rm: unable to remove directory /u/zoldphoenix: File exists (error 17)
> |

> | So, there is a file in that directory, with
> | unprintable characters as part or all of the filename--
> | The issue is getting the precise filename thru to rm.


> |
>
> Keep going down the same path:
>
> find /u -inum 2142 | xargs l -bil
>
> --
> JP

I was able to capture the output of the offending file name
od -b junk
0000000 001 014 326 006 034 002 012

I haven't been clever enough to get that file name removed-- every trick I
try yields File not found.
Tried wildcard, tried setting variable to that value,
rm -i $file

Are there any other tricks I could try to remove the file?
(This is just academic, at this point, we can just leave it there forever--
but maybe I can learn something further about how rm parses special
characters in its arguments. Any references other than "man rm" I can
follow up on?)
Thanks for help already rendered, JP!

Barry



Ronald J Marchand

Jun 21, 2004, 1:06:39 PM
"Barry Swane" <bsw...@rogers.com> wrote in message
news:tRDBc.1095$u1A...@news04.bloor.is.net.cable.rogers.com...

Have you enclosed the filename in quotations?
as in: rm 'file name'

Ron

Jean-Pierre Radley

Jun 21, 2004, 6:38:31 PM
Barry Swane typed (on Mon, Jun 21, 2004 at 04:16:25PM +0000):

|
| "Jean-Pierre Radley" <j...@jpr.com> wrote in message
| news:2004062017...@jpradley.jpr.com...
|
| > | I then tried:
| > | find /u -inum 2142 |xargs rm -r
| > | which yielded:
| > | rm: ^LÖ non-existent
| > | rm: unable to remove directory /u/zoldphoenix: File exists (error 17)
| > |
| > | So, there is a file in that directory, with
| > | unprintable characters as part or all of the filename--
| > | The issue is getting the precise filename thru to rm.
| > |
| >
| > Keep going down the same path:
| >
| > find /u -inum 2142 | xargs l -bil
|
| I was able to capture the output of the offending file name
| od -b junk
| 0000000 001 014 326 006 034 002 012
| I haven't been clever enough to get that file name removed-- every trick I
| try yields File not found.
| Tried wildcard, tried setting variable to that value,
| rm -i $file
|
| Are there any other tricks I could try to remove the file?
| (This is just academic, at this point, we can just leave it there forever--
| but maybe I can learn something further about how rm parses special
| characters in its arguments. Any references other than "man rm" I can
| follow up on?)
| Thanks for help already rendered, JP!

Ugh. I prefer hd.

IAC, the name of the file is junk .

Have you tried 'rm junk?' ?
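
Another approach avoids typing the name at all: look up the entry's inode number and let find pass the exact bytes to rm. The sketch below assumes the directory entry is otherwise healthy, which may not hold for /u/zoldphoenix given how out of line 201446924 looks next to 2142. The directory and file name here are made up, and use only a few of the control bytes from the od output.

```shell
# Sketch: remove a file whose name contains control characters by inode,
# without ever typing the name. Directory and name are made up.
dir=${TMPDIR:-/tmp}/ctrl_demo.$$
mkdir -p "$dir"
bad=`printf 'x\001\014\006y'`       # octal escapes, as od -b would show them
: > "$dir/$bad"

# ls -i prints "<inum> <name>"; the number comes first, so the odd
# bytes in the name cannot disturb it.
set -- `ls -i "$dir"`
inum=$1

# find hands the exact bytes of the name to rm, bypassing shell quoting.
find "$dir" -inum "$inum" -exec rm -f {} \;
left=`ls -A "$dir" | wc -l | tr -d ' '`
echo "entries left: $left"
```

Using -exec rather than piping to xargs matters here, since xargs splits its input on whitespace and may mangle an unusual name before rm ever sees it.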

--
JP
