Linux-2.5.14..

Linus Torvalds

May 6, 2002, 12:00:07 AM

There's a lot of stuff that has happened in the 2.5.x series lately, and
you can see the gory details in the ChangeLog files that accompany
releases these days, but I thought I'd point out 2.5.14, since it has some
interesting fundamental changes to how dirty state is maintained in the
VM.

(The big changes were actually in 2.5.12, but 2.5.13 contained various
minor fixes and tweaks, and 2.5.14 contains a number of fixes especially
wrt truncate, so hopefully it's fairly _stable_ as of 2.5.14.)

Credit goes to Andrew Morton, and not only does it clean up the code a
lot, it also seems to perform a lot better in many circumstances.

There's a lot of other stuff in the 2.5.x tree too, but few things are so
fundamental. Please test (but also, please be careful - backups are always
a good idea).

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Daniel Pittman

May 6, 2002, 2:40:04 AM
On Sun, 5 May 2002, Linus Torvalds wrote:
> There's a lot of stuff that has happened in the 2.5.x series lately,
> and you can see the gory details in the ChangeLog files that accompany
> releases these days, but I thought I'd point out 2.5.14, since it has
> some interesting fundamental changes to how dirty state is maintained
> in the VM.
>
> (The big changes were actually in 2.5.12, but 2.5.13 contained various
> minor fixes and tweaks, and 2.5.14 contains a number of fixes
> especially wrt truncate, so hopefully it's fairly _stable_ as of
> 2.5.14.)

From the look of the changelog at least a few of the file corruption
bugs with ext3, 2k block file systems and 2.5 have been fixed. Should I
expect this release to address the problems I was seeing?

Daniel

--
I keep my head above the surface, trying to breath, looking for land.
I keep an eye at the distant horizon waiting for help, clutching the sky.
-- Covenant, _Phoenix_

bert hubert

May 6, 2002, 2:50:07 AM
On Mon, May 06, 2002 at 03:54:46AM +0000, Linus Torvalds wrote:

> releases these days, but I thought I'd point out 2.5.14, since it has some
> interesting fundamental changes to how dirty state is maintained in the
> VM.

I parsed this 'dirty state' sentence all wrong at first :-) Andrew, Linus -
where does the current VM lie in between rmap-vm and aa-vm?

Regards,

bert hubert

--
http://www.PowerDNS.com Versatile DNS Software & Services
http://www.tk the dot in .tk
http://lartc.org Linux Advanced Routing & Traffic Control HOWTO

Andrew Morton

May 6, 2002, 3:00:05 AM
Daniel Pittman wrote:
>
> On Sun, 5 May 2002, Linus Torvalds wrote:
> > There's a lot of stuff that has happened in the 2.5.x series lately,
> > and you can see the gory details in the ChangeLog files that accompany
> > releases these days, but I thought I'd point out 2.5.14, since it has
> > some interesting fundamental changes to how dirty state is maintained
> > in the VM.
> >
> > (The big changes were actually in 2.5.12, but 2.5.13 contained various
> > minor fixes and tweaks, and 2.5.14 contains a number of fixes
> > especially wrt truncate, so hopefully it's fairly _stable_ as of
> > 2.5.14.)
>
> From the look of the changelog at least a few of the file corruption
> bugs with ext3, 2k block file systems and 2.5 have been fixed. Should I
> expect this release to address the problems I was seeing?
>

I don't have an explanation for the ext3 problem which you saw.
It's conceivable that 2.5.13 was leaving dirty buffers around after
they were "deleted", and that fsync grabbed them via the i_dirty_buffers
back door, and wrote them where they shouldn't have been written.

But they wouldn't have been mapped anywhere...

So I still need to try to reproduce that one. If you could have
another shot, it would be appreciated. But if it _does_ work OK,
I can't say it's fixed until I know what caused the 2.5.13 failure.

ext3 is very sensitive to what is going on in buffer.c. There's
a lot of tension in there between the desire to share code and
the desire to not be damaged by changes in the code which we share.

Generally, ext3 with data=journal is not happy at present.

Partly because it contains assertions of things which aren't true
any more.

Partly because of a known problem in ext3: assertion failure at
transaction.c:606. Stephen has a fix for this which we need to
wiggle into 2.4. For some reason, the 2.5 changes are triggering
it much more easily.

I'll spend a few hours this week trying to resurrect data=journal,
but if that doesn't work out I think I'll just turn it off for the
while, make it emit a warning and use data=ordered.

Andrew Morton

May 6, 2002, 3:10:06 AM
bert hubert wrote:
>
> On Mon, May 06, 2002 at 03:54:46AM +0000, Linus Torvalds wrote:
>
> > releases these days, but I thought I'd point out 2.5.14, since it has some
> > interesting fundamental changes to how dirty state is maintained in the
> > VM.
>
> I parsed this 'dirty state' sentence all wrong at first :-) Andrew, Linus -
> where does the current VM lie in between rmap-vm and aa-vm?
>

"VM" is a broad term. The memory allocator, page replacement, swap and
all that stuff is unaltered - it is the same as 2.4.current. ie: Andrea's
VM from when his changes stopped going into the mainline kernel.

I made minimal changes in there to teach the page allocator that
all dirty memory is written back via pages and not sometimes-pages,
sometimes-buffers. Also to add support for the new `clustering
writeback' which address_spaces can perform.

It's probably not as well tuned as it could be at present, but
I don't see a lot of point in fiddling with it. As long as the
VM doesn't actually impede 2.5 development and evaluation of
2.5 performance, best to leave it alone until a VM developer
steps up to do the 2.6 VM.

The change to which Linus refers is:

In 2.4, dirty data from the write(2) system call is encapsulated
in buffer_heads and is placed on a global buffer list for writeout.
And dirty data from shared mappings is attached to its inode.

In 2.5, the buffer list went away, and dirty data from write(2)
is now managed in the same way as dirty data from mmap().

And because the kupdate and bdflush functions used to work
against the buffer LRU, replacements were introduced which do
the same thing against the inodes, instead of against the buffers.

So it's all page-oriented now.
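
As a rough illustration of the 2.5 scheme described above, here is a toy model in Python. It is not kernel code: the class, its methods and the inode numbers are all hypothetical, and it only shows the idea that dirty pages are tracked per inode no matter how they were dirtied, with writeback walking the inodes rather than a global buffer list.

```python
# Toy model only: these names are hypothetical and do not correspond to
# real kernel structures or functions.
from collections import defaultdict


class ToyPageCache:
    """Tracks dirty pages per inode, whether dirtied by write() or mmap()."""

    def __init__(self):
        # inode number -> set of dirty page indexes
        self.dirty = defaultdict(set)

    def mark_dirty(self, inode, page_index):
        # One path for every source of dirty data.
        self.dirty[inode].add(page_index)

    def writeback(self, write_page):
        """Walk the dirty inodes and write each one's pages in index order."""
        written = []
        for inode in sorted(self.dirty):
            for page_index in sorted(self.dirty[inode]):
                write_page(inode, page_index)
                written.append((inode, page_index))
        self.dirty.clear()
        return written


cache = ToyPageCache()
cache.mark_dirty(inode=12, page_index=3)  # e.g. dirtied via write(2)
cache.mark_dirty(inode=12, page_index=1)  # e.g. dirtied via a shared mapping
cache.mark_dirty(inode=7, page_index=0)

order = cache.writeback(lambda inode, page_index: None)
print(order)
```

Writing each inode's pages in index order is only a stand-in for the clustering behaviour mentioned above; the real policy is not described in this thread.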

Rik van Riel

May 6, 2002, 10:10:04 AM
On Mon, 6 May 2002, Andrew Morton wrote:
> bert hubert wrote:

> > I parsed this 'dirty state' sentence all wrong at first :-) Andrew, Linus -
> > where does the current VM lie in between rmap-vm and aa-vm?

> I made minimal changes in there to teach the page allocator that


> all dirty memory is written back via pages and not sometimes-pages,
> sometimes-buffers. Also to add support for the new `clustering
> writeback' which address_spaces can perform.

> So it's all page-oriented now.

Nice, this will make it possible to have much cleaner page
replacement code.

regards,

Rik
--
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/ http://distro.conectiva.com/

Linus Torvalds

May 6, 2002, 11:20:07 AM

On Mon, 6 May 2002, Daniel Pittman wrote:
>
> From the look of the changelog at least a few of the file corruption
> bugs with ext3, 2k block file systems and 2.5 have been fixed. Should I
> expect this release to address the problems I was seeing?

"Expect" is too strong a word. I'd say "hope" - a number of truncate bugs
were fixed, but whether that was what bit you, nobody knows.

I suspect the real answer is that we'd love for you to test things out,
but that if it ends up being too painful to recover if the problems happen
again, you probably shouldn't..

Linus

Daniel Pittman

May 7, 2002, 12:30:06 AM
On Mon, 6 May 2002, Linus Torvalds wrote:
> On Mon, 6 May 2002, Daniel Pittman wrote:
>>
>> From the look of the changelog at least a few of the file corruption
>> bugs with ext3, 2k block file systems and 2.5 have been fixed. Should
>> I expect this release to address the problems I was seeing?
>
> "Expect" is too strong a word. I'd say "hope" - a number of truncate
> bugs were fixed, but whether that was what bit you, nobody knows.

Well, hope seems justified...

> I suspect the real answer is that we'd love for you to test things
> out, but that if it ends up being too painful to recover if the
> problems happen again, you probably shouldn't..

I did, and I failed to reproduce it working on a scratch disk. This was
a period of playing and I /hope/ that it's conclusive. I couldn't get
.12 to reliably fail, though, which is less inspiring.

I should be able to find some time in the next day or so to test it a
bit more on the scratch disk and then, if that works, I will update my
backups. :)

Still, it seems good so far.
Daniel

--
Television is the ideal propaganda medium, a mendacious monster, not primarily
out of malice but from its amoral nature.
-- Paul Johnson

Martin Dalecki

May 7, 2002, 8:30:11 AM
Mon May 6 13:29:44 CEST 2002 ide-clean-56

- Push poll_timeout down from the hwgroup to the channel. We are resetting the
channel and not a whole hwgroup. This way using multiple pdc202xxx cards
should magically start to work with multiple performance and resets will no
longer lock the system.

- Updates for PDC4030 by Peter Denison <pet...@marshadder.uklinux.net>.

- Make ide_raw_taskfile not care about request buffers. They were always
NULL.

- Port set multi mode count over from the special setting interface to
ide_raw_taskfile. Fix errors in the corresponding interrupt handler in one go
as well. It turned out that this is precisely the same code as in
task_no_data_intr, so we can nuke it altogether. And finally we have found
some problems with the set_pio_mode() command, which can fail with -EBUSY -
this is probably *very* common during boot-time hdparm usage these days!
(OK, it was masked by reporting too early that it had finished... Crap Crap utter
crap it was!!!) Right now hdparm should just be extended to properly
sync and retry on -EBUSY and everything should be fine.

And now the 1 Million EUR question for everybody who loves to put driver
settings into /proc:

How the hell could echo > /proc/ide/ide0/settings blah blah blah blah handle
properly cases like -EIO, -EBUSY and so on??? Having the possibility to do it
does not mean that it is a good idea to use it.

OK. After realizing the simple fact that quite a lot of low level hardware
manipulating ioctls may require assistance in usage from proper logic, which is
*very* unlikely to be implemented in a bash (or, my preference, ksh) script, I
have made up my mind.

/proc/ide will be nuked.

- Execute the recalibration for error recovery on precisely the same request as
the one which failed.

- Remove set geometry. It's crap by means of standard specification. Because:

1. We rely on the existence of the identify command anyway.

2. This command was obsoleted *before* the identify command existed as far
as I can see.

3. I'm able to have a look at what other ATA/ATAPI drivers in the wild do:
They don't do it.

- Just call tuneproc in set_pio_mode() directly - we are already behind the rq
queue there.

- After we have uncovered the broken logic behind the whole ioctl handling we
now just have made ide_spin_wait_hwgroup() wait for a proper, somewhat
longer timeout before giving up. This was previously just hiding the broken
concept of setting ioctl values through /proc/ide/ideX/settings - now it just
really helps hdparm not to give up too early. (It probably shouldn't
wreak havoc on the global driver spin lock as well. I will look into this
later.)

- Scrap the unnecessary, to say the least, disabling of interrupts for 3,
read it again please, 3 seconds, on the local CPU inside
ide_spin_wait_hwgroup(). Spin lock handling needs checking there badly as I
see now as well...

Hey apparently any "special" requests are gone. We now have only
to deal with REQ_DEVICE_ACB and REQ_DEVICE_CMD. One of them is still too
much and will be killed.

ide-clean-56.diff
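
The changelog above suggests that a tool such as hdparm should retry when a setting fails with -EBUSY. A minimal sketch of that retry idea in Python follows. The helper name, the attempt count, the delay and the fake operation are all hypothetical; this is not how hdparm is implemented.

```python
import errno
import time


def retry_on_ebusy(operation, attempts=5, delay=0.1):
    """Call operation(); retry while it raises OSError with errno EBUSY.

    Any other error, or EBUSY on the final attempt, is re-raised.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except OSError as exc:
            if exc.errno != errno.EBUSY or attempt == attempts - 1:
                raise
            time.sleep(delay)


# Hypothetical stand-in for a drive-tuning call that is busy twice
# before it succeeds.
calls = {"count": 0}


def fake_set_pio_mode():
    calls["count"] += 1
    if calls["count"] < 3:
        raise OSError(errno.EBUSY, "Device or resource busy")
    return "ok"


result = retry_on_ebusy(fake_set_pio_mode, delay=0)
print(result, calls["count"])
```

A plain `echo` into a /proc file has nowhere to put this kind of logic, which is the objection raised in the changelog.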

Martin Dalecki

May 7, 2002, 8:40:08 AM
Tue May 7 02:37:49 CEST 2002 ide-clean-57

Nuke /proc/ide. For explanations why, please see the frustrated comments in the
previous change log. If you still don't see why it wasn't a good thing,
well, please just take a look at the following:

Kernel size before:

/usr/src/linux# size vmlinux
   text    data     bss     dec     hex filename
1716049  403968  470252 2590269  27863d vmlinux
/usr/src/linux#

Kernel size after:

/usr/src/linux# size vmlinux
   text    data     bss     dec     hex filename
1680993  403488  470124 2554605  26faed vmlinux
/usr/src/linux#

2% of overall size! And this is not exactly a minimalistic setup.
Wow! What a waste of space!!!! Not even counting the runtime size
of this crap! And then let's take a look at the following
self-flattery:

-/*
- * Copyright (C) 1997-1998 Mark Lord
- *
- * This is the /proc/ide/ filesystem implementation.
- *
- * The major reason this exists is to provide sufficient access
- * to driver and config data, such that user-mode programs can
- * be developed to handle chipset tuning for most PCI interfaces.
- * This should provide better utilities, and less kernel bloat.
^^^^^^^^^^^^^^^^^^
Well there could only be an answer to this which would be
universally understandable in every Slavic language... but since
it's Mother's Day...

EOD.

ide-clean-57.diff
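
The size figures quoted in the changelog above can be checked with a little arithmetic. The sketch below uses only the numbers from the two `size vmlinux` listings; the variable names are illustrative. It suggests that the "2%" figure matches the shrinkage of the text segment, while the reduction measured against the total (the `dec` column) is closer to 1.4%.

```python
# Segment sizes in bytes, copied from the two "size vmlinux" listings.
before = {"text": 1716049, "data": 403968, "bss": 470252}
after = {"text": 1680993, "data": 403488, "bss": 470124}

total_before = sum(before.values())  # should match the "dec" column
total_after = sum(after.values())

saved_total = total_before - total_after
saved_text = before["text"] - after["text"]

pct_total = 100 * saved_total / total_before
pct_text = 100 * saved_text / before["text"]

print(f"total: {saved_total} bytes saved ({pct_total:.2f}%)")
print(f"text:  {saved_text} bytes saved ({pct_text:.2f}%)")
```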

Anton Altaparmakov

May 7, 2002, 9:20:06 AM
At 12:27 07/05/02, Martin Dalecki wrote:
>Tue May 7 02:37:49 CEST 2002 ide-clean-57
>
>Nuke /proc/ide. For explanations why, please see the frustrated comments
>in the previous change log.

This is a big mistake IMO.

Nuking the ability to change settings, fair enough, but only if alternative
interface is provided for userspace to tweak everything, otherwise provide
the interface before you remove the existing one. (There may be already
another interface, I don't know...I am sure someone will tell me if there is!)

Removing the information provided by /proc/ide is very bad! It is very
useful to diagnose one's ide setup, to see what the host is configured as,
what all settings are set to, etc. This is the first place I look to check
whether the interfaces are configured as I expect them to be and in case of
problems, this is again the first place I look.

What alternatives are you going to present to give all the information that
/proc/ide gives? If the answer is none IMHO your patch is not acceptable...

Best regards,

Anton


--
"I've not lost my mind. It's backed up on tape somewhere." - Unknown
--
Anton Altaparmakov <aia21 at cantab.net> (replace at with @)
Linux NTFS Maintainer / IRC: #ntfs on irc.openprojects.net
WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/

Martin Dalecki

May 7, 2002, 9:40:07 AM
Anton Altaparmakov wrote:

> At 12:27 07/05/02, Martin Dalecki wrote:
>
>> Tue May 7 02:37:49 CEST 2002 ide-clean-57
>>
>> Nuke /proc/ide. For explanations why, please see the frustrated
>> comments in the previous change log.
>
>
> This is a big mistake IMO.
>
> Nuking the ability to change settings, fair enough, but only if
> alternative interface is provided for userspace to tweak everything,
> otherwise provide the interface before you remove the existing one.
> (There may be already another interface, I don't know...I am sure
> someone will tell me if there is!)

Ehmm... There *is* one interface there. hdparm will help
you. Note: the upcoming release of hdparm should contain the
following patch, which vastly increases its usability for the
average user. Just for convenience I'm attaching it here.

If you don't like hdparm - well please shoot the
people who wrote init, ifconfig, eject and so on...

hdparm-4.9.diff

Mikael Pettersson

May 7, 2002, 10:00:16 AM
Martin Dalecki writes:
> Anton Altaparmakov wrote:
> > At 12:27 07/05/02, Martin Dalecki wrote:
> >
> >> Tue May 7 02:37:49 CEST 2002 ide-clean-57
> >>
> >> Nuke /proc/ide. For explanations why, please see the frustrated
> >> comments in the previous change log.
> >
> >
> > This is a big mistake IMO.
> >
> > Nuking the ability to change settings, fair enough, but only if
> > alternative interface is provided for userspace to tweak everything,
> > otherwise provide the interface before you remove the existing one.
> > (There may be already another interface, I don't know...I am sure
> > someone will tell me if there is!)
>
> Ehmm... There *is* one interface there. hdparm will help
> you. Note: the upcoming release of hdparm should contain the

hdparm -i requires root privs. cat /proc/ide/${file} does not.
hdparm is NOT an acceptable substitute for /proc/ide/.

/Mikael

Anton Altaparmakov

May 7, 2002, 10:00:16 AM
At 13:34 07/05/02, Martin Dalecki wrote:
>Anton Altaparmakov wrote:
>>At 12:27 07/05/02, Martin Dalecki wrote:
>>>Tue May 7 02:37:49 CEST 2002 ide-clean-57
>>>
>>>Nuke /proc/ide. For explanations why, please see the frustrated comments
>>>in the previous change log.
>>
>>This is a big mistake IMO.
>>Nuking the ability to change settings, fair enough, but only if
>>alternative interface is provided for userspace to tweak everything,
>>otherwise provide the interface before you remove the existing one.
>>(There may be already another interface, I don't know...I am sure someone
>>will tell me if there is!)
>
>Ehmm... There *is* one interface there. hdparm will help
>you. Note: the upcoming release of hdparm should contain the
>following patch, which vastly increases its usability for the
>average user. Just for convenience I'm attaching it here.

How do I get this information with hdparm please?

[aia21@drop ide]$ cat via
----------VIA BusMastering IDE Configuration----------------
Driver Version:                     3.34
South Bridge:                       VIA vt82c686b
Revision:                           ISA 0x40 IDE 0x6
Highest DMA rate:                   UDMA100
BM-DMA base:                        0xd000
PCI clock:                          33.3MHz
Master Read Cycle IRDY:             0ws
Master Write Cycle IRDY:            0ws
BM IDE Status Register Read Retry:  yes
Max DRDY Pulse Width:               No limit
-----------------------Primary IDE-------Secondary IDE------
Read DMA FIFO flush:           yes                 yes
End Sector FIFO flush:          no                  no
Prefetch Buffer:               yes                  no
Post Write Buffer:             yes                  no
Enabled:                       yes                 yes
Simplex only:                   no                  no
Cable Type:                    80w                 40w
-------------------drive0----drive1----drive2----drive3-----
Transfer Mode:         UDMA       PIO       DMA      UDMA
Address Setup:         30ns     120ns      30ns      30ns
Cmd Active:            90ns      90ns      90ns      90ns
Cmd Recovery:          30ns      30ns      30ns      30ns
Data Active:           90ns     330ns      90ns      90ns
Data Recovery:         30ns     270ns      30ns      30ns
Cycle Time:            20ns     600ns     120ns      60ns
Transfer Rate:     99.9MB/s   3.3MB/s  16.6MB/s  33.3MB/s

hdparm is a tool to query a device and how the controller is programmed to
talk to the device. But it is not designed nor capable of giving
information about the host itself. I just read the man page for hdparm and
there are no options in sight to show any of the things I have shown above.

Also, the commands below work as a normal user, but hdparm requires superuser... It is
debatable whether a normal user should be allowed access, but still you are
taking away existing functionality...

[aia21@drop hda]$ cat cache
1916
[aia21@drop hda]$ cat capacity
80418240
[aia21@drop hda]$ cat geometry
physical 79780/16/63
logical 5005/255/63

And hdparm never gives you the physical geometry AFAICS.

Either I am missing something or you are removing a lot of functionality
and replacing it with nothingness...

And as I said, I can understand removing the ability to write values into
/proc/ide/*, what I disagree with is the removal of the information
provided by read-only access to /proc/ide/*. And that is because I am not
aware of any other way to get the same information.
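
For what it is worth, the read-only files shown above are easy to consume from a script. The following Python sketch parses the `geometry` text and relates it to the `capacity` value. The sample strings are copied from the output above; the function name is hypothetical, and the 512-byte sector size is an assumption (it is the same factor used in a PHP example later in this thread).

```python
SECTOR_BYTES = 512  # assumed sector size

# Sample contents copied from the "cat geometry" and "cat capacity" output.
geometry_text = """physical 79780/16/63
logical 5005/255/63
"""
capacity_sectors = 80418240


def parse_geometry(text):
    """Return a dict such as {'physical': (cylinders, heads, sectors)}."""
    result = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        kind, chs = line.split()
        cylinders, heads, sectors = (int(part) for part in chs.split("/"))
        result[kind] = (cylinders, heads, sectors)
    return result


geometry = parse_geometry(geometry_text)
c, h, s = geometry["physical"]
physical_sectors = c * h * s
capacity_bytes = capacity_sectors * SECTOR_BYTES

print(geometry)
print(physical_sectors == capacity_sectors, capacity_bytes)
```

For this drive the product of the physical cylinder, head and sector counts equals the reported capacity in sectors.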

Padraig Brady

May 7, 2002, 10:10:07 AM
Martin Dalecki wrote:
> Mon May 6 13:29:44 CEST 2002 ide-clean-56
>

[snip]

> OK. After realizing the simple fact that quite a lot of low level
> hardware manipulating ioctls may require assistance in usage from
> proper logic which is *very* unlikely to be implemented in a bash
> (for me preferable still ksh) I have made my mind up.
>
> /proc/ide will be nuked.

Please consider this carefully, especially the read only bits.
One particular thing I use a lot is: /proc/ide/hda/capacity
Will there be another interface easily usable by scripts
to get this information?

Am I going to have to parse hdparm output?
....
geometry = 2434/255/63, sectors = 39102336, start = 0

Am I going to need hdparm on my embedded system?

Padraig.
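
To make the "parse hdparm output" question concrete, here is a minimal Python sketch that extracts the numbers from a line of the form quoted above. The regular expression and function name are illustrative, and the sketch assumes the line format is exactly as quoted; it does not run hdparm itself.

```python
import re

# Sample line copied from the message above.
line = "geometry = 2434/255/63, sectors = 39102336, start = 0"

PATTERN = re.compile(
    r"geometry\s*=\s*(\d+)/(\d+)/(\d+),\s*"
    r"sectors\s*=\s*(\d+),\s*start\s*=\s*(\d+)"
)


def parse_hdparm_geometry(text):
    """Extract CHS geometry, sector count and start offset from the text."""
    match = PATTERN.search(text)
    if match is None:
        raise ValueError("no geometry line found")
    cylinders, heads, spt, sectors, start = (int(g) for g in match.groups())
    return {"chs": (cylinders, heads, spt), "sectors": sectors, "start": start}


info = parse_hdparm_geometry(line)
print(info["sectors"])
```

This is noticeably more fragile than reading a single number from /proc/ide/hda/capacity, since it breaks if the wording of the tool's output changes.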

Dave Jones

May 7, 2002, 10:10:09 AM
On Tue, May 07, 2002 at 03:56:43PM +0200, Mikael Pettersson wrote:
> hdparm -i requires root privs.

hdparm itself doesn't, but you must be able to read /dev/hd*
Some distros have this owned by group 'disk' for eg.
Adding yourself as a member of this group should allow you to
use hdparm without becoming root.

--
| Dave Jones. http://www.codemonkey.org.uk
| SuSE Labs
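
The point about device-node permissions can be checked from a script before trying to query a drive. This is a small Python sketch using `os.access`; the helper name is hypothetical, and `/dev/hda` is simply the example device from this thread, which may not exist on a given machine.

```python
import os
import tempfile


def can_read(path):
    """Return True if the current user may open path for reading."""
    return os.access(path, os.R_OK)


# The thread's example device; prints False where the node is absent
# or not readable by the current user.
print(can_read("/dev/hda"))

# Self-contained demonstration with a file we know we can read.
with tempfile.NamedTemporaryFile() as handle:
    readable_temp = can_read(handle.name)

missing = can_read("/nonexistent/device/node")
```

Whether group 'disk' grants that access depends on the distribution, as noted above.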

Dave Jones

May 7, 2002, 10:10:10 AM
On Tue, May 07, 2002 at 02:57:46PM +0100, Anton Altaparmakov wrote:
> How do I get this information with hdparm please?
>
> [aia21@drop ide]$ cat via

Bartlomiej Zolnierkiewicz moved all this stuff to userspace
a long time ago in 'ideinfo'.

> [aia21@drop hda]$ cat cache
> 1916
> [aia21@drop hda]$ cat capacity
> 80418240
> [aia21@drop hda]$ cat geometry
> physical 79780/16/63
> logical 5005/255/63
>
> And hdparm never gives you the physical geometry AFAICS.

Why would a normal user ever need to know this info?

> And as I said, I can understand removing the ability to write values into
> /proc/ide/*, what I disagree with is the removal of the information
> provided by read-only access to /proc/ide/*. And that is because I am not
> aware of any other way to get the same information.

The parsing gunk we have for /proc/ide is fugly, and should have been
done with sysctls from day one imo.

--
| Dave Jones. http://www.codemonkey.org.uk
| SuSE Labs

Martin Dalecki

May 7, 2002, 10:20:07 AM
Dave Jones wrote:

> On Tue, May 07, 2002 at 02:57:46PM +0100, Anton Altaparmakov wrote:
> > How do I get this information with hdparm please?
> >
> > [aia21@drop ide]$ cat via
>
> Bartlomiej Zolnierkiewicz moved all this stuff to userspace
> a long time ago in 'ideinfo'.
>
> > [aia21@drop hda]$ cat cache
> > 1916
> > [aia21@drop hda]$ cat capacity
> > 80418240
> > [aia21@drop hda]$ cat geometry
> > physical 79780/16/63
> > logical 5005/255/63
> >
> > And hdparm never gives you the physical geometry AFAICS.
>
> Why would a normal user ever need to know this info?
>
> > And as I said, I can understand removing the ability to write values into
> > /proc/ide/*, what I disagree with is the removal of the information
> > provided by read-only access to /proc/ide/*. And that is because I am not
> > aware of any other way to get the same information.
>
> The parsing gunk we have for /proc/ide is fugly, and should have been
> done with sysctls from day one imo.

Amen. Where it turns out to be really, really worth it,
I do plan to move to sysctl. For example, at the ioctl
level we currently still have the problem that many of
them are attached to the device but act on the channel.

hdparm -xxx /dev/hda & hdparm -xxx /dev/hdc - BANG race condition.
(At least at the level of logic.)

Martin Dalecki

May 7, 2002, 10:20:10 AM
Padraig Brady wrote:

> Am I going to have to parse hdparm output?
> ....
> geometry = 2434/255/63, sectors = 39102336, start = 0
>
> Am I going to need hdparm on my embedded system?

Yes. Or just fsck hardcode the defaults.

Anton Altaparmakov

May 7, 2002, 10:40:07 AM
At 15:08 07/05/02, Dave Jones wrote:
>On Tue, May 07, 2002 at 02:57:46PM +0100, Anton Altaparmakov wrote:
> > How do I get this information with hdparm please?
> >
> > [aia21@drop ide]$ cat via
>
>Bartlomiej Zolnierkiewicz moved all this stuff to userspace
>a long time ago in 'ideinfo'.

[aia21@drop hda]$ ideinfo
bash: ideinfo: command not found

Obviously distros haven't caught up with this development. )-:

Care to give me a URL? A quick google for "ideinfo Linux download" didn't
bring up anything looking relevant.

> > [aia21@drop hda]$ cat cache
> > 1916
> > [aia21@drop hda]$ cat capacity
> > 80418240
> > [aia21@drop hda]$ cat geometry
> > physical 79780/16/63
> > logical 5005/255/63
> >
> > And hdparm never gives you the physical geometry AFAICS.
>
>Why would a normal user ever need to know this info?

I want to know this info. (-: Admittedly normal users don't need it... It
is useful for diagnosing problems with NTFS and MD setups for example (in
conjunction with fdisk -l shown in sectors).

> > And as I said, I can understand removing the ability to write values into
> > /proc/ide/*, what I disagree with is the removal of the information
> > provided by read-only access to /proc/ide/*. And that is because I am not
> > aware of any other way to get the same information.
>
>The parsing gunk we have for /proc/ide is fugly, and should have been
>done with sysctls from day one imo.

I like text parsing... It is not performance critical and makes info human
readable... Whether existing text parsers are any good or not, I don't
care, write a better one if you don't like the existing one or go beat up
the people who wrote the bad ones... That seems to be Martin's standard
reply, so I thought I would use it, too. (-;

Best regards,

Anton


--
"I've not lost my mind. It's backed up on tape somewhere." - Unknown
--
Anton Altaparmakov <aia21 at cantab.net> (replace at with @)
Linux NTFS Maintainer / IRC: #ntfs on irc.openprojects.net
WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/

Padraig Brady

May 7, 2002, 10:40:08 AM
Martin Dalecki wrote:
> Padraig Brady wrote:
>
>> Am I going to have to parse hdparm output?
>> ....
>> geometry = 2434/255/63, sectors = 39102336, start = 0
>>
>> Am I going to need hdparm on my embedded system?
>
>
> Yes. Or just fsck hardcode the defaults.
>

hardcode defaults?

Also are the following standard RH7.1 programs going to
need changing?

[padraig@pixelbeat padraig]$ find /sbin /usr/sbin /bin /usr/bin /lib \
    /usr/lib /usr/bin/X11/ -xdev -perm +111 | xargs grep -l /proc/ide \
    2>/dev/null
/sbin/mkinitrd
/sbin/fdisk
/sbin/sfdisk
/sbin/sndconfig
/usr/sbin/mouseconfig
/usr/sbin/kudzu
/usr/sbin/module_upgrade
/usr/sbin/updfstab
/usr/sbin/glidelink
/usr/sbin/sndconfig
/usr/lib/python1.5/site-packages/_kudzumodule.so
/usr/bin/X11/Xconfigurator

Padraig.
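
The same kind of survey can be sketched in Python for anyone who wants to repeat it on another system. This is only an approximation of the find/grep pipeline above: the function name is hypothetical, it tests whether the current user can execute each file rather than matching any execute bit, and it does not replicate `-xdev`. The demonstration runs against a temporary directory rather than the real system paths.

```python
import os
import tempfile

NEEDLE = b"/proc/ide"


def files_mentioning(root, needle=NEEDLE):
    """Return sorted paths of executable files under root containing needle."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not (os.path.isfile(path) and os.access(path, os.X_OK)):
                continue
            try:
                with open(path, "rb") as handle:
                    if needle in handle.read():
                        hits.append(path)
            except OSError:
                continue  # unreadable file: skip it, like 2>/dev/null
    return sorted(hits)


# Demonstration on two throwaway scripts.
with tempfile.TemporaryDirectory() as root:
    uses = os.path.join(root, "uses_proc_ide")
    clean = os.path.join(root, "clean_tool")
    with open(uses, "wb") as handle:
        handle.write(b"#!/bin/sh\ncat /proc/ide/hda/capacity\n")
    with open(clean, "wb") as handle:
        handle.write(b"#!/bin/sh\necho hello\n")
    os.chmod(uses, 0o755)
    os.chmod(clean, 0o755)
    found = [os.path.basename(p) for p in files_mentioning(root)]

print(found)
```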

Martin Dalecki

May 7, 2002, 10:50:05 AM
Anton Altaparmakov wrote:

>
> [aia21@drop hda]$ ideinfo
> bash: ideinfo: command not found
>
> Obviously distros haven't caught up with this development. )-:
>
> Care to give me a URL? A quick google for "ideinfo Linux download"
> didn't bring up anything looking relevant.

http://www.j2.ru/frozenfido/ru.unix.bsd/1329707b3e3f8.html

Porting it should be fairly trivial. Basically lspci +
the parsing crap.

>
> I like text parsing... It is not performance critical and makes info
> human readable... Whether existing text parsers are any good or not, I
> don't care, write a better one if you don't like the existing one or go
> beat up the people who wrote the bad ones... That seems to be Martin's
> standard reply, so I thought I would use it, too. (-;

Feel free to do it yourself - in user space where it belongs.

Anton Altaparmakov

May 7, 2002, 11:20:04 AM
At 14:15 07/05/02, Martin Dalecki wrote:
>Padraig Brady wrote:
>>Am I going to have to parse hdparm output?
>>....
>> geometry = 2434/255/63, sectors = 39102336, start = 0
>>Am I going to need hdparm on my embedded system?
>
>Yes. Or just fsck hardcode the defaults.

This is stupid! And if that isn't obvious to you, you should think a bit
more carefully...

Linux's power is exactly that it can be used on anything from a wristwatch
to a huge server and that it is flexible about everything. You are breaking
this flexibility for no apparent reason. (I don't accept "I can't cope with
this so I remove it." as a reason, sorry).

As the new IDE maintainer so far we have only seen you removing one feature
after the other in the name of cleanup, without adequate or even any at
all(!) replacements, renaming all functions to hell and back, and breaking
the ide core here there and everywhere. All critical bug fixes seem to have
been contributed by other people looking at your code which doesn't inspire
a lot of confidence in you... Even Alan Cox said a while ago that you have
his vote of no confidence (probably slightly rephrased here) because of
changes you were introducing and I tend to trust bearded kernel hackers
from Wales. (-;

Aren't you noticing that something is wrong here???

Best regards,

Anton


--
"I've not lost my mind. It's backed up on tape somewhere." - Unknown
--
Anton Altaparmakov <aia21 at cantab.net> (replace at with @)
Linux NTFS Maintainer / IRC: #ntfs on irc.openprojects.net
WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/

Padraig Brady

May 7, 2002, 11:20:06 AM
Dave Jones wrote:
> On Tue, May 07, 2002 at 02:57:46PM +0100, Anton Altaparmakov wrote:
> > How do I get this information with hdparm please?
> >
> > [aia21@drop ide]$ cat via
>
> Bartlomiej Zolnierkiewicz moved all this stuff to userspace
> a long time ago in 'ideinfo'.
>
> > [aia21@drop hda]$ cat cache
> > 1916
> > [aia21@drop hda]$ cat capacity
> > 80418240
> > [aia21@drop hda]$ cat geometry
> > physical 79780/16/63
> > logical 5005/255/63
> >
> > And hdparm never gives you the physical geometry AFAICS.
>
> Why would a normal user ever need to know this info?

Well one application we have here is a backup script in a web
interface (php running as nobody), which copies a whole disk
(compact flash) to the client while indicating the total size
to the client for feedback:

Header("Content-type: application/octet-stream");
$flash_size=`cat /proc/ide/hda/capacity`;
$flash_size=$flash_size*512;
Header("Content-length: $flash_size");
Header("Content-Disposition: attachment; filename=flash.img");
passthru("/bin/suid_copy_flash");

Now you could of course have a /bin/suid_get_flash_size
but this is messy/less efficient?

Padraig.
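
One alternative that does not depend on /proc/ide, and is not specific to IDE, is to open the device node and seek to its end. The Python sketch below shows the idea. It is an assumption on my part that this fits the setup described: it needs read permission on the device node, which a script running as `nobody` would not normally have, and the function name is hypothetical. The demonstration uses a temporary file in place of a real device.

```python
import os
import tempfile


def size_in_bytes(path):
    """Return the size of a file or block device by seeking to its end."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.lseek(fd, 0, os.SEEK_END)
    finally:
        os.close(fd)


# Demonstration with a 4096-byte temporary file standing in for the device.
with tempfile.NamedTemporaryFile() as handle:
    handle.write(b"\0" * 4096)
    handle.flush()
    demo_size = size_in_bytes(handle.name)

print(demo_size)
```

The result is already in bytes, so no multiplication by 512 is needed for the Content-length header.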

Anton Altaparmakov

May 7, 2002, 11:20:05 AM
At 14:36 07/05/02, Martin Dalecki wrote:
>Anton Altaparmakov wrote:
>>[aia21@drop hda]$ ideinfo
>>bash: ideinfo: command not found
>>Obviously distros haven't caught up with this development. )-:
>>Care to give me a URL? A quick google for "ideinfo Linux download" didn't
>>bring up anything looking relevant.
>
>http://www.j2.ru/frozenfido/ru.unix.bsd/1329707b3e3f8.html
>
>Porting it should be fairly trivial. Basically lspci +
>the parsing crap.

I don't want to port anything. I don't know ide and I don't want to know
ide. I want to be able to use it. I am an ide USER. You are the ide
DEVELOPER. If you take away functionality YOU have to provide a
replacement. NOT tell me, the USER to write it.

>>I like text parsing... It is not performance critical and makes info
>>human readable... Whether existing text parsers are any good or not, I
>>don't care, write a better one if you don't like the existing one or go
>>beat up the people who wrote the bad ones... That seems to be Martin's
>>standard reply, so I thought I would use it, too. (-;
>
>Feel free to do it yourself - in user space where it belongs.

I don't want to do it myself. I want YOU to do it because YOU are taking
away the functionality that already exists.

Anton


--
"I've not lost my mind. It's backed up on tape somewhere." - Unknown
--
Anton Altaparmakov <aia21 at cantab.net> (replace at with @)
Linux NTFS Maintainer / IRC: #ntfs on irc.openprojects.net
WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/


Linus Torvalds

May 7, 2002, 11:40:08 AM

[ First off: any IDE-only thing that doesn't work for SCSI or other disks
doesn't solve a generic problem, so the complaint that some generic
tools might use it is totally invalid. ]

On Tue, 7 May 2002, Anton Altaparmakov wrote:
>
> Linux's power is exactly that it can be used on anything from a wristwatch
> to a huge server and that it is flexible about everything. You are breaking
> this flexibility for no apparent reason. (I don't accept "I can't cope with
> this so I remove it." as a reason, sorry).

Run the 57 patch, and complain if something doesn't work.

Linux's power is that we FIX stuff. That we make it the best system
possible, and that we don't just whine and argue about things.

> As the new IDE maintainer so far we have only seen you removing one
> feature after the other in the name of cleanup, without adequate or even
> any at all(!) replacements,

Who cares? Have you found _anything_ that Martin removed that was at all
worthwhile? I sure haven't.

Guys, you have to realize that the IDE layer has eight YEARS of absolute
crap in it. Seriously. It's _never_ been cleaned up before. It has stuff
so distasteful that it's scary.

Take it from me: it's a _lot_ easier to add cruft and crap on top of clean
code. You can do it yourself if you want to. You don't need a maintainer
to add barnacles.

All the information that /proc/ide gave you is basically available in
hdparm, and for your dear embedded system it apparently takes up less
space by being in user space. So what is the problem?

My vote is to remove as much as humanly possible.

"Everything should be made as simple as possible, but not
simpler" - Albert Einstein

Think about it, and really _understand_ it.

Linus

Martin Dalecki

May 7, 2002, 12:10:09 PM
Tue May 7 14:28:47 CEST 2002 ide-clean-58

- Apply m68k fixes by Roman Zippel.

- Apply CDROM PIO mode fix by Osamu Tamita.
(You are true "Hawk-eye" hovering over my head! Respect - and many Thanks.)

- Virtualize the udma_enable method as well to help ARM and PPC people. Please
please if you would like to have some other methods virtualized in a similar
way - just tell me or even better do it yourself at the end of ide-dma.c.
I *don't mind* patches.

- Fix the pmac code to adhere to the new API. It's supposed to work again.
However, this is blind coding... I give it an 80% chance of working ;-).

ide-clean-58.diff

Jan Harkes

May 7, 2002, 12:30:06 PM
On Tue, May 07, 2002 at 08:36:54AM -0700, Linus Torvalds wrote:
> On Tue, 7 May 2002, Anton Altaparmakov wrote:
> > As the new IDE maintainer so far we have only seen you removing one
> > feature after the other in the name of cleanup, without adequate or even
> > any at all(!) replacements,
>
> Who cares? Have you found _anything_ that Martin removed that was at all
> worthwhile? I sure haven't.

I'm still hoping a patch will show up that will allow me to regain
access to my compactflash cards and IBM microdrive disks. The code
currently doesn't rescan for new drives when a card has been inserted,
although it still seems to have all the necessary logic.

Jan

Martin Dalecki

May 7, 2002, 12:40:04 PM
Jan Harkes wrote:

> On Tue, May 07, 2002 at 08:36:54AM -0700, Linus Torvalds wrote:
>
>>On Tue, 7 May 2002, Anton Altaparmakov wrote:
>>
>>>As the new IDE maintainer so far we have only seen you removing one
>>>feature after the other in the name of cleanup, without adequate or even
>>>any at all(!) replacements,
>>
>>Who cares? Have you found _anything_ that Martin removed that was at all
>>worthwhile? I sure haven't.
>
>
> I'm still hoping a patch will show up that will allow me to regain
> access to my compactflash cards and IBM microdrive disks. The code
> currently doesn't rescan for new drives when a card has been inserted,
> although it still seems to have all the necessary logic.
>

Yes, I'm fully aware of this, but the whole initialization
is currently very much in flux, and I will come back to this issue
once I think that things are in shape there. OK?

Padraig Brady

May 7, 2002, 12:40:06 PM

Well my "dear" embedded system doesn't have libc :-(
So 35664 saved in kernel (less on disk), requires 25212
extra for hdparm + more for static linked uclibc (hope
it works ;-)). As a side note if this happens hdparm would
be a requirement for busybox IMHO, anyway getting back on topic...

All the info I've ever needed is /proc/ide/hdx/capacity
which I could get from /proc/partitions with a bit
more effort, so I vote for removing /proc/ide.

I think everyone realises Martin is doing great and much needed work
on IDE (btw I'll have those flash support patches soon Martin ;-)),
but I did think this change needed debate. In general I know it's a
hard decision what to export in proc, especially if there are
existing dependencies, a few already mentioned possibles in RH7.1:

/sbin/mkinitrd
/sbin/fdisk
/sbin/sfdisk
/sbin/sndconfig
/usr/sbin/mouseconfig
/usr/sbin/kudzu
/usr/sbin/module_upgrade
/usr/sbin/updfstab
/usr/sbin/glidelink
/usr/sbin/sndconfig
/usr/lib/python1.5/site-packages/_kudzumodule.so
/usr/bin/X11/Xconfigurator

E.g., could the same argument be made for an lspci-only
interface to PCI info rather than /proc/bus/pci? The following
references are made to /proc/bus/pci on my system:

/sbin/lspci
/sbin/setpci


/sbin/sndconfig
/usr/sbin/mouseconfig
/usr/sbin/kudzu
/usr/sbin/module_upgrade
/usr/sbin/updfstab
/usr/sbin/glidelink
/usr/sbin/sndconfig

/usr/sbin/adsl-config
/usr/sbin/internet-config
/usr/sbin/isdn-config
/usr/lib/python1.5/site-packages/_kudzumodule.so
/usr/bin/X11/XFree86
/usr/bin/X11/pcitweak
/usr/bin/X11/scanpci
/usr/bin/X11/Xconfigurator

cheers,
Padraig.
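
As a rough illustration of the remark above that the capacity can also be read from /proc/partitions: a minimal sketch, assuming the usual layout of a header line followed by rows of "major minor #blocks name", with #blocks counted in 1 KiB units. The function name and the sample text are made up for illustration and are not taken from any real machine.

```python
def partition_bytes(table_text, name):
    """Return the size in bytes of device `name`, or None if it is not listed.

    Assumes rows of "major minor #blocks name [...]" with #blocks in 1 KiB units.
    """
    for line in table_text.splitlines():
        fields = line.split()
        # Skip the header row and blank lines; data rows start with a number.
        if len(fields) < 4 or not fields[0].isdigit():
            continue
        if fields[3] == name:
            return int(fields[2]) * 1024
    return None


# Illustrative sample only, not output captured from a real system.
SAMPLE = """major minor  #blocks  name

   3     0      15680 hda
   3     1      15649 hda1
"""

if __name__ == "__main__":
    print(partition_bytes(SAMPLE, "hda"))
```

On a live system the text would come from reading /proc/partitions instead of SAMPLE; the extra "a bit more effort" is the lookup by name and the unit conversion.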

Alan Cox

May 7, 2002, 1:00:06 PM
> All the info I've ever needed is /proc/ide/hdx/capacity
> which I could get from /proc/partitions with a bit
> more effort, so I vote for removing /proc/ide.

/proc/ide has useful information in it that you can't get easily by
other means at the moment - which controller is driving the disks, what
devices are present etc.

> E.g., could the same argument be made for an lspci-only
> interface to PCI info rather than /proc/bus/pci? The following
> references are made to /proc/bus/pci on my system:

lspci relies on /proc/bus/pci - it's the only part of the universe that
actually knows how to handle PCI and virtualised PCI devices. Unlike the
older /proc/pci interface it keeps all the complex gunk out of the kernel
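
To make the "keep the gunk in user space" point concrete, here is a hedged sketch of the kind of decoding a tool like lspci has to do itself. It assumes each line of /proc/bus/pci/devices starts with two hex fields, bus number plus devfn, then vendor ID plus device ID; the function name and the sample line are invented for illustration.

```python
def parse_pci_devices_line(line):
    """Decode the first two hex fields of one /proc/bus/pci/devices line.

    Assumed layout: "BBDD VVVVIIII ..." where BB is the bus number, DD the
    devfn byte (slot in the upper 5 bits, function in the lower 3), VVVV the
    vendor ID and IIII the device ID.
    """
    fields = line.split()
    bus_devfn = int(fields[0], 16)
    ids = int(fields[1], 16)
    devfn = bus_devfn & 0xFF
    return {
        "bus": bus_devfn >> 8,
        "slot": devfn >> 3,
        "function": devfn & 0x7,
        "vendor": ids >> 16,
        "device": ids & 0xFFFF,
    }


if __name__ == "__main__":
    # Made-up sample line, chosen so that it decodes to bus 00, slot 1f, function 4.
    info = parse_pci_devices_line("00fc\t80862442\tb")
    print("%02x:%02x.%d" % (info["bus"], info["slot"], info["function"]))
```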

Linus Torvalds

May 7, 2002, 1:00:10 PM

On Tue, 7 May 2002, Padraig Brady wrote:
>
> All the info I've ever needed is /proc/ide/hdx/capacity
> which I could get from /proc/partitions with a bit
> more effort, so I vote for removing /proc/ide.

Note that one thing that we might do is to leave the remnants of /proc/ide
but _without_ the very verbose per-chipset reporting.

At least to me it looks like it's all the chipset reporting that causes
the huge kernel bloat, and it shouldn't be impossible to reinstate a
minimal /proc/ide without those parts - while still keeping most of the
backwards compatibility.

However, since I really don't much like the idea of having special
"ide-only" /proc files, I personally think any information people actually
used should be either in truly generic files (/proc/partitions as an
example), _or_ they should be in the generic device tree (talk to Pat
Mochel about that).

So my personal reaction to removal of /proc/ide is: "good riddance, but if
it turns out that we seriously need it for backwards compatibility, we can
add back a skeleton without the bloat".

(Side note: I'm afraid that I don't think backwards compatibility weighs
very heavily on an embedded setup - I'm more thinking about things like "a
regular RedHat/SuSE/Debian/whatever install won't work any more".)

As to existing binaries (your list is interesting), I don't see what they
are doing about ide-specific stuff, since I sure hope those binaries are
happy with a SCSI-only system.

> E.g., could the same argument be made for an lspci-only
> interface to PCI info rather than /proc/bus/pci? The following
> references are made to /proc/bus/pci on my system:

I personally do like ASCII /proc files, as long as they don't add
maintainability problems etc.

Linus

Dave Jones

May 7, 2002, 1:00:11 PM
On Tue, May 07, 2002 at 03:29:28PM +0100, Anton Altaparmakov wrote:
> [aia21@drop hda]$ ideinfo
> bash: ideinfo: command not found
> Obviously distros haven't caught up with this development. )-:
> Care to give me a URL? A quick google for "ideinfo Linux download" didn't
> bring up anything looking relevant.

Can't find where I got it from, and it seems to have fallen off google.
I put up the last version I had (which I hacked up a bit) at
http://www.codemonkey.org.uk/cruft/ide-info-0.0.5-dj.tar.gz

> >The parsing gunk we have for /proc/ide is fugly, and should have been
> >done with sysctls from day one imo.
>
> I like text parsing.

must.. resist.. /proc ascii/bin... holywar..
(besides, sysctl interface gives you ascii in /proc/sys/)

> It is not performance critical and makes info human
> readable... Whether existing text parsers are any good or not, I don't
> care, write a better one if you don't like the existing one

That's likely exactly the reason we ended up with the dungheap we have
now. Rewriting the parser when we already have a usable sysctl interface
seems to have no gain over the existing mess to me.

Dave.

--
| Dave Jones. http://www.codemonkey.org.uk
| SuSE Labs

Linus Torvalds

May 7, 2002, 1:10:08 PM

On Tue, 7 May 2002, Alan Cox wrote:
>
> /proc/ide has useful information in it that you can't get easily by
> other means at the moment - which controller is driving the disks, what
> devices are present etc.

I'd love for somebody to add the devices to the real device tree, at which
point this kind of information would be very much visible..

Right now devicefs isn't even mounted by default, but it's the only
_really_ generic way of showing things like this that we have. For people
who haven't seen it before, do a

mount -t driverfs driverfs /driverfs

and go look in there.. In particular, if you have a PCI system with a USB
device tree (or _multiple_ such trees), notice how you can look at things
like

/driverfs/root/pci0/00:1f.4/usb_bus/000/

and it wouldn't be impossible (or even necessarily very hard) to make an
IDE controller export the "IDE device tree" the same way a USB controller
now exports the "USB device tree".

For things like hotplug etc, I think driverfs is eventually the only way
to go, simply because it gives you the full (and unambiguous) path to
_any_ device, and is completely bus-agnostic.

But there is definitely a potential backwards-compatibility-issue.

Linus

Richard B. Johnson

May 7, 2002, 1:10:10 PM

Link your embedded stuff against a stripped-down shared libc...

-rwxr-xr-x 1 root root 876 Apr 26 13:08 crt1.o
-rwxr-xr-x 1 root root 160824 Feb 25 13:30 ld-linux.so.2
-rwxr-xr-x 1 root root 160824 Apr 30 11:31 ld.so
-rwxr-xr-x 1 root root 2376745 Feb 25 13:29 libc.so.6
-rwxr-xr-x 1 root root 368551 Feb 25 13:29 libm.so.6

This does most everything an embedded system needs. You can extract
the objects from a shared object file (copy), remove the ones you
obviously don't need, make a new shared object file and link. Keep
adding objects until you don't have any more unresolved symbols.

`ld` allows you to link to whatever you need. I put my special
'libc' plus another private shared library in /opt/lib. On the
target machine, /opt/lib is a sym-link to /lib.

LPATH=/opt/lib
ELINK=-rpath-link $(LPATH) \
-rpath $(LPATH) \
-L $(LPATH) -m elf_i386 \
-dynamic-linker \
$(LPATH)/ld-linux.so.2 \
$(LPATH)/crt1.o \
$(LPATH)/crtendS.o \
$(LPATH)/libc.so.6 \
$(LPATH)/libm.so.6


program: program.o
	ld -o program program.o $(ELINK)


Cheers,
Dick Johnson

Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).

Windows-2000/Professional isn't.

be...@kernel.crashing.org

May 7, 2002, 1:30:10 PM
> /driverfs/root/pci0/00:1f.4/usb_bus/000/
>
>and it wouldn't be impossible (or even necessarily very hard) to make an
>IDE controller export the "IDE device tree" the same way a USB controller
>now exports the "USB device tree".
>
>For things like hotplug etc, I think driverfs is eventually the only way
>to go, simply because it gives you the full (and unambiguous) path to
>_any_ device, and is completely bus-agnostic.
>
>But there is definitely a potential backwards-compatibility-issue.

One interesting thing here would be to have some optional link between
the bus-oriented device tree and the function-oriented tree (ie. devfs
or simply /dev). For example, an IDE node in driverfs could eventually
hold symlinks to the entries it provides in /dev when using devfs (or
just provide major/minor when not using devfs).

What do you think ?

One problem I've been faced with on ppc is to be able to match
a linux device with what the firmware (Open Firmware) thinks that
device is. The firmware view is bus-centered and it would be pretty
easy to provide some additional entries in driverfs that give the
OF fullpath of a given device. But then, the link between the actual
driver in driverfs and the "device" as used by, for example, the
filesystem isn't trivial.

Ben.

Andre Hedrick

May 7, 2002, 1:30:13 PM

vaio:~ # hdparm -i /dev/hda

/dev/hda:

Model=FUJITSU MHJ2181AT, FwRev=D034, SerialNo=01001697
Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs }
RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4
BuffType=unknown, BuffSize=512kB, MaxMultSect=16, MultSect=16
CurCHS=17475/15/63, CurSects=16513875, LBA=yes, LBAsects=35433216
IORDY=yes, tPIO={min:240,w/IORDY:120}, tDMA={min:120,rec:120}
PIO modes: pio0 pio1 pio2 pio3 pio4
DMA modes: mdma0 mdma1 mdma2 udma0 udma1 *udma2 udma3 udma4
Drive Supports : ATA-2 ATA-3 ATA-4 ATA-5
Kernel Drive Geometry LogicalCHS=2205/255/63 PhysicalCHS=37495/15/63

BS Dave it does parse the difference nicely

Andre Hedrick
LAD Storage Consulting Group
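
The geometry fields in the hdparm output above are tied together by simple arithmetic: a CHS triple addresses cylinders * heads * sectors-per-track sectors. A small sketch, assuming 512-byte sectors as elsewhere in this thread (the helper names are made up), that lets a reader check CurCHS against CurSects and convert LBAsects to bytes:

```python
SECTOR_BYTES = 512  # assumed sector size, matching the "*512" used earlier in the thread


def parse_chs(text):
    """Split a "cylinders/heads/sectors" string into three integers."""
    cylinders, heads, sectors = (int(part) for part in text.split("/"))
    return cylinders, heads, sectors


def chs_sectors(cylinders, heads, sectors):
    """Number of sectors addressable through a CHS geometry."""
    return cylinders * heads * sectors


if __name__ == "__main__":
    # Values copied from the hdparm -i output quoted above.
    print(chs_sectors(*parse_chs("17475/15/63")))  # compare with CurSects
    print(chs_sectors(*parse_chs("37495/15/63")))  # PhysicalCHS, compare with LBAsects
    print(35433216 * SECTOR_BYTES)                 # LBAsects expressed in bytes
```

The CHS products can fall slightly short of LBAsects because a geometry can only describe whole cylinders.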

Linus Torvalds

May 7, 2002, 1:30:14 PM

On Tue, 7 May 2002 be...@kernel.crashing.org wrote:
>
> One interesting thing here would be to have some optional link between
> the bus-oriented device tree and the function-oriented tree (ie. devfs
> or simply /dev).

There isn't any 1:1 thing - the device/bus-oriented one should _not_ show
virtual things like partitions etc that have no relevance for a driver,
while /dev (and thus devfs) obviously think that that is the important
part, much more important than how we actually got to the device.

I think we need to have some way of getting a mapping from /dev ->
devicefs, but I don't think that has to be a filesystem thing (it might
even be as simple as just one ioctl or new system call: 'get the "path" of
this device').

There aren't that many people who actually care, I suspect.

Linus

Jauder Ho

May 7, 2002, 1:30:15 PM

Ben, what you are proposing is fairly similar to what Solaris does today.
There is a /devices directory that contains the real path while /dev
contains the legacy stuff. Seems to work fine and given the proper docs,
you can decipher what the /devices path points to fairly easily. So I
certainly wouldn't mind seeing this happen for Linux eventually.

--Jauder


be...@kernel.crashing.org

May 7, 2002, 1:40:05 PM
>
>
>On Tue, 7 May 2002 be...@kernel.crashing.org wrote:
>>
>> One interesting thing here would be to have some optional link between
>> the bus-oriented device tree and the function-oriented tree (ie. devfs
>> or simply /dev).
>
>There isn't any 1:1 thing - the device/bus-oriented one should _not_ show
>virtual things like partitions etc that have no relevance for a driver,
>while /dev (and thus devfs) obviously think that that is the important
>part, much more important than how we actually got to the device.
>
>I think we need to have some way of getting a mapping from /dev ->
>devicefs, but I don't think that has to be a filesystem thing (it might
>even be as simple as just one ioctl or new system call: 'get the "path" of
>this device').
>
>There aren't that many people who actually care, I suspect.

Sure, it's obviously not 1:1. What I had in mind was for the controller
to show what devices it exports in the sense of raw devices, but I agree
the other way makes a lot more sense. My problem was how to be devfs
agnostic, but you answered with "ioctl or syscall" and that would indeed
be ok. The ioctl approach makes it applicable to network interfaces as well,
which is good.

The need for this link from a /dev node to driverfs, I suspect, will exist
only for cases like setting up the firmware, though I can imagine one may
want to tweak some IDE settings (available via driverfs in your proposed
scheme) knowing only the /dev node.

Ben.

Richard Gooch

May 7, 2002, 1:50:08 PM
Linus Torvalds writes:
> On Tue, 7 May 2002 be...@kernel.crashing.org wrote:
> >
> > One interesting thing here would be to have some optional link between
> > the bus-oriented device tree and the function-oriented tree (ie. devfs
> > or simply /dev).
>
> There isn't any 1:1 thing - the device/bus-oriented one should _not_
> show virtual things like partitions etc that have no relevance for a
> driver, while /dev (and thus devfs) obviously think that that is the
> important part, much more important than how we actually got to the
> device.

Actually, I've always said that I think devfs should care about both
views. And that's why I think putting the driver tree (ala driverfs)
in devfs, and making the device-oriented part of the tree be symlinks
into the bus-oriented tree, is a good idea.

> I think we need to have some way of getting a mapping from /dev ->
> devicefs, but I don't think that has to be a filesystem thing (it
> might even be as simple as just one ioctl or new system call: 'get
> the "path" of this device').

Fugly. What's wrong with readlink(2) as this "magic syscall"?

Regards,

Richard....
Permanent: rgo...@atnf.csiro.au
Current: rgo...@ras.ucalgary.ca

Linus Torvalds

May 7, 2002, 2:10:10 PM

On Tue, 7 May 2002, Richard Gooch wrote:
>
> Actually, I've always said that I think devfs should care about both
> views.

And I think you're completely wrong.

The fact is, they are two completely different and orthogonal things, and
they have _nothing_ in common except for a very weak linkage of actual
"physical device" (which does not always exist).

The set of people that cares about one view is almost 100% different from
the set of people that care about the other view.

> Fugly. What's wrong with readlink(2) as this "magic syscall"?

Ehh - like the fact that it doesn't work on device files?

Linus

Alan Cox

May 7, 2002, 2:20:06 PM
> > Fugly. What's wrong with readlink(2) as this "magic syscall"?
> Ehh - like the fact that it doesn't work on device files?

I can't find anything in Posix/SuS that says it isn't allowed to, however 8)

Linus Torvalds

May 7, 2002, 2:20:07 PM

On Tue, 7 May 2002, Alan Cox wrote:
>
> > > Fugly. What's wrong with readlink(2) as this "magic syscall"?
> > Ehh - like the fact that it doesn't work on device files?
>
> I can't find anything in Posix/SuS that says it isn't allowed to, however 8)

We can certainly do it, it just doesn't buy us much of anything, since
none of the standard tools (ie "ls") will actually do the readlink() for
anything but a symlink.

So at that point it's just another magic syscall, except we've overloaded
an old one.

Which may certainly be acceptable, of course.

Linus
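
The behaviour being argued about can be seen from user space. A minimal sketch, assuming a Linux system and using a regular file as a stand-in for a device node (creating real device nodes needs root): readlink() on anything that is not a symlink fails with EINVAL, and tools in the style of "ls" only call it after lstat() reports a symlink. The helper names are hypothetical.

```python
import errno
import os
import stat
import tempfile


def link_target_or_none(path):
    """Do what "ls"-style tools do: only call readlink() on real symlinks."""
    if stat.S_ISLNK(os.lstat(path).st_mode):
        return os.readlink(path)
    return None


def raw_readlink_errno(path):
    """Call readlink() unconditionally and report the errno (0 on success)."""
    try:
        os.readlink(path)
    except OSError as exc:
        return exc.errno
    return 0


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        node = os.path.join(tmp, "node")  # stand-in for a device file
        open(node, "w").close()
        link = os.path.join(tmp, "link")
        os.symlink(node, link)
        print(link_target_or_none(link))
        print(link_target_or_none(node))
        print(raw_readlink_errno(node) == errno.EINVAL)
```

So an overloaded readlink() on device files would stay invisible to existing tools unless they were changed to call it without the symlink check.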

Patrick Mochel

May 7, 2002, 2:40:07 PM

On Tue, 7 May 2002 be...@kernel.crashing.org wrote:

> > /driverfs/root/pci0/00:1f.4/usb_bus/000/
> >
> >and it wouldn't be impossible (or even necessarily very hard) to make an
> >IDE controller export the "IDE device tree" the same way a USB controller
> >now exports the "USB device tree".
> >
> >For things like hotplug etc, I think driverfs is eventually the only way
> >to go, simply because it gives you the full (and unambiguous) path to
> >_any_ device, and is completely bus-agnostic.
> >
> >But there is definitely a potential backwards-compatibility-issue.
>
> One interesting thing here would be to have some optional link between
> the bus-oriented device tree and the function-oriented tree (ie. devfs
> or simply /dev). For example, an IDE node in driverfs could eventually
> hold symlinks to the entries it provides in /dev when using devfs (or
> just provide major/minor when not using devfs).

I agree with such a concept, but as Linus said, it should go the other
way, from the functional interface to physical interface. There are many
details involved in doing such a thing, but it should work something like
this:

The logical subsystems (ide disks, networking, etc) would register with the
device model core and get a directory in driverfs:

/driverfs/class/ide/

Devices would be discovered and get a driverfs directory representing the
physical location of the device:

/driverfs/root/pci0/07.2/

Note that no drivers have been bound to the device. When the driver is
bound, it registers the device with the subsystem, passing in a
subsystem-specific structure. These can be made to point in some way to
the generic struct device of the device (from which the physical path can
be inferred).

When this happens, the subsystem creates a directory underneath its
driverfs directory, so you get:

/driverfs/class/ide/0/

And, a symlink is created to point to the directory in the physical path.
As the driver discovers partitions on the device, it can create special
nodes in its class directory.

At this point, userspace can be notified (via /sbin/hotplug). That can
create symlinks in /dev to the nodes that were just created, emulating
current /dev behavior.

So, what does this do? To an extent, it reengineers the functionality of
devfs. I'll be the first to admit it. However, it centers less around the
filesystem, and more on the device model core.

Most devices already register with their subsystems, so having the
subsystems pass device info onto the core is relatively easy.

As partitions are discovered, you get paths like:

/driverfs/class/ide/0/2

Which gives you a default name for the device. With /sbin/hotplug, simple
userspace policy, and symlinks in /dev, you can emulate the current device
hierarchy. So, you get a device naming solution that gives you only the
device names for the devices you have.

This approach also de-emphasizes the dependency on major and minor
numbers. If device nodes are created in kernel space initially, userspace
doesn't need to know what the major/minor is for a particular device. The
symlink to the device node is all that's needed to operate on the device.

Without the need to coordinate between kernel and userspace, at least some
majors/minors can be dynamically allocated as the subsystems and devices
are registered with the core. (These can then be exported via files in
driverfs). (This is similar to the dynamic allocation of minor numbers in
the USB subsystem that showed up recently...)

Oh, and it's with a modern, clean filesystem, 1/5 the size of devfs.

Thoughts? Comments? Flames?

-pat
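
The layout described above can be mocked up in an ordinary temporary directory to see how the class view points back at the physical view. This is only a user-space sketch of the idea, not driverfs itself: the helper names and the symlink name "device" are assumptions, and the paths mirror the examples in the message.

```python
import os
import tempfile


def register_device(root, physical, cls, index):
    """Create <root>/root/<physical> and <root>/class/<cls>/<index>/device -> physical dir."""
    phys_dir = os.path.join(root, "root", physical)
    os.makedirs(phys_dir, exist_ok=True)
    class_dir = os.path.join(root, "class", cls, str(index))
    os.makedirs(class_dir, exist_ok=True)
    # "device" is a hypothetical name for the class-to-physical symlink.
    os.symlink(phys_dir, os.path.join(class_dir, "device"))
    return class_dir


def physical_path(class_dir):
    """Follow the symlink from the class directory to the physical location."""
    return os.path.realpath(os.path.join(class_dir, "device"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as mock_driverfs:
        ide0 = register_device(mock_driverfs, "pci0/07.2", "ide", 0)
        print(os.path.relpath(ide0, mock_driverfs))
        print(os.path.relpath(physical_path(ide0), os.path.realpath(mock_driverfs)))
```

The lookup goes from the functional entry to the physical one, which is the direction agreed on earlier in the thread.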

Richard Gooch

May 7, 2002, 2:50:04 PM
Linus Torvalds writes:
>
>
> On Tue, 7 May 2002, Alan Cox wrote:
> >
> > > > Fugly. What's wrong with readlink(2) as this "magic syscall"?
> > > Ehh - like the fact that it doesn't work on device files?
> >
> > I can't find anything in Posix/SuS that says it isn't allowed to, however 8)
>
> We can certainly do it, it just doesn't buy us much of anything, since
> none of the standard tools (ie "ls") will actually do the readlink() for
> anything but a symlink.
>
> So at that point it's just another magic syscall, except we've overloaded
> an old one.
>
> Which may certainly be acceptable, of course.

I wasn't suggesting a magic readlink(2). I was suggesting a *real*
one. Device nodes get stored in the physical tree (what you call
driverfs), and the entries in the logical tree are symlinks. Such as:

/dev/scsi/host0 symlink to /dev/bus/pci0/slot1/function2

or something like that. Easy to implement, easy to understand, easy to
manage.

Regards,

Richard....
Permanent: rgo...@atnf.csiro.au
Current: rgo...@ras.ucalgary.ca

Richard Gooch

May 7, 2002, 2:50:09 PM
Patrick Mochel writes:
> Oh, and it's with a modern, clean filesystem, 1/5 the size of devfs.

The size argument is not an issue. I've already said that devfs will
shrink a lot once I move tree management from my own code to the VFS.
At that point devfs will mostly be:
- an API
- a way of supporting the devfsd protocol.

Regards,

Richard....
Permanent: rgo...@atnf.csiro.au
Current: rgo...@ras.ucalgary.ca

Linus Torvalds

May 7, 2002, 2:50:10 PM

On Tue, 7 May 2002, Richard Gooch wrote:
> > Which may certainly be acceptable, of course.
>
> I wasn't suggesting a magic readlink(2). I was suggesting a *real*
> one. Device nodes get stored in the physical tree (what you call
> driverfs), and the entries in the logical tree are symlinks.

NO.

This is one backwards compatibility thing that I'm _not_ removing.

We have tons of existing /dev trees, and I'm not making them into
symlinks.

Also, you obviously haven't thought it through AT ALL. Hint: partitions.

If you have /dev/hda1, that _cannot_ be a symlink to the physical tree,
because on a physical level that partition DOES NOT EXIST. It's purely a
virtual mapping.

Yet clearly there _is_ a mapping from /dev/hda1 onto the physical device
in question, and clearly it _is_ a meaningful operation to operate on the
physical device underlying /dev/hda1.

So if you want to have a sane interface, you need to have a way to look up
the physical device that underlies /dev/hda1.

Yet it clearly cannot be a symlink.

QED.

So stop mixing up physical devices and /dev. They should NOT be handled by
the same mechanism.

Linus
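
The partition argument can be illustrated with device numbers. A minimal sketch, assuming the traditional IDE numbering where each drive owns 64 consecutive minors (minor 0 is hda, 1 is hda1, 64 is hdb, and so on): several /dev nodes map onto one underlying disk, so the relation is a many-to-one lookup rather than something a single symlink can express. The function names are made up.

```python
IDE_MINORS_PER_DRIVE = 64  # assumed: conventional IDE layout, 64 minors per drive


def whole_disk(major, minor):
    """Return the (major, minor) of the whole-disk node behind a partition node."""
    return major, minor - (minor % IDE_MINORS_PER_DRIVE)


def partition_number(minor):
    """Return the partition index, with 0 meaning the whole disk."""
    return minor % IDE_MINORS_PER_DRIVE


if __name__ == "__main__":
    # (3, 1) is hda1 and (3, 66) is hdb2 under the assumed numbering.
    for dev in [(3, 0), (3, 1), (3, 2), (3, 66)]:
        print(dev, "->", whole_disk(*dev), "partition", partition_number(dev[1]))
```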

Patrick Mochel

May 7, 2002, 3:00:06 PM

On Tue, 7 May 2002, Richard Gooch wrote:

> Patrick Mochel writes:
> > Oh, and it's with a modern, clean filesystem, 1/5 the size of devfs.
>
> The size argument is not an issue. I've already said that devfs will
> shrink a lot once I move tree management from my own code to the VFS.

I agree 100%. However, I think that move will be very painful. I tried to
do it a couple of months ago, and there were so many interdependencies and
oddities that I gave up after about 6 hours.

> At that point devfs will mostly be:
> - an API
> - a way fo supporting the devfsd protocol.

I argue that you shouldn't need a separate daemon. We already have the
/sbin/hotplug interface. It's simple and sweet. We shouldn't need to rely
on an entirely separate daemon.

-pat

Thunder from the hill

May 7, 2002, 3:00:08 PM
Hi,

> > > /driverfs/root/pci0/00:1f.4/usb_bus/000/
> /driverfs/class/ide/
> /driverfs/root/pci0/07.2/
> /driverfs/class/ide/0/
> /driverfs/class/ide/0/2

Why not fix devfs for that? My root directory is messed up enough. We
have dev, proc, tmp, ...
We might have /dev/driver or such, which doesn't make the root directory
any fuller. (And also not to disturb the newbies any further. It's quite
hard to explain to a Windows user why he can't remove /proc and /dev, and
what this is supposed to be.)
This is just my opinion...

Regards,
Thunder
--
if (errno == ENOTAVAIL)
fprintf(stderr, "Error: Talking to Microsoft server!\n");

Greg KH

May 7, 2002, 3:10:07 PM
On Tue, May 07, 2002 at 11:29:10AM -0700, Patrick Mochel wrote:
>
> Which gives you a default name for the device. With /sbin/hotplug, simple
> userspace policy, and symlinks in /dev, you can emulate the current device
> hierarchy. So, you get a device naming solution that gives you only the
> device names for the devices you have.
>
> This approach also de-emphasizes the dependency on major and minor
> numbers. If device nodes are created in kernel space initially, userspace
> doesn't need to know what the major/minor is for a particular device. The
> symlink to the device node is all that's needed to operate on the device.
>
> Without the need to coordinate between kernel and userspace, at least some
> majors/minors can be dynamically allocated as the subsystems and devices
> are registered with the core. (These can then be exported via files in
> driverfs). (This is similar to the dynamic allocation of minor numbers in
> the USB subsystem that showed up recently...)

And is exactly why this showed up in the USB subsystem :)

> Oh, and it's with a modern, clean filesystem, 1/5 the size of devfs.

And it removes the dependency of devfsd and its interface, replacing it
with the existing /sbin/hotplug interface. This allows different people
to implement different naming schemes if they so desire, moving naming
policy out of the kernel into userspace, where it belongs.

Yes, there will probably be a "default" naming scheme, matching what we
have today, but the ability to replace it with another one is _so_ much
easier than having to try to tie into devfsd (like the devreg
implementation does: http://www-124.ibm.com/devreg/ )


greg k-h

Richard Gooch

May 7, 2002, 3:30:09 PM
Patrick Mochel writes:
>
> On Tue, 7 May 2002, Richard Gooch wrote:
>
> > Patrick Mochel writes:
> > > Oh, and it's with a modern, clean filesystem, 1/5 the size of devfs.
> >
> > The size argument is not an issue. I've already said that devfs will
> > shrink a lot once I move tree management from my own code to the VFS.
>
> I agree 100%. However, I think that move will be very painful. I
> tried to do it a couple of months ago, and there were so many
> interdependencies and oddities that I gave up after about 6 hours.

Oh, it's certainly more than 6 hours of work. But it *will* get done.

> > At that point devfs will mostly be:
> > - an API
> > - a way of supporting the devfsd protocol.
>
> I argue that you shouldn't need a separate daemon. We already have
> the /sbin/hotplug interface. It's simple and sweet. We shouldn't
> need to rely on an entirely separate daemon.

The devfsd protocol is more lightweight. Plus it doesn't require
fork(2)+execve(2) overheads. And more importantly, you can capture
lookup() events.

Regards,

Richard....
Permanent: rgo...@atnf.csiro.au
Current: rgo...@ras.ucalgary.ca

Patrick Mochel

May 7, 2002, 4:00:08 PM

On Tue, 7 May 2002, Thunder from the hill wrote:

> Hi,
>
> > > > /driverfs/root/pci0/00:1f.4/usb_bus/000/
> > /driverfs/class/ide/
> > /driverfs/root/pci0/07.2/
> > /driverfs/class/ide/0/
> > /driverfs/class/ide/0/2
>
> Why not fix devfs for that? My root directory is messed up enough. We
> have dev, proc, tmp, ...

For one, I am of the camp that believes devfs is unfixable.

For two, where driverfs is mounted is irrelevant: /driverfs, /sys,
/proc/bus are all valid places.

Besides, who cares what's in /? You have /home, which is all that really
matters, no?

> We might have /dev/driver or such, which doesn't make the root directory
> any fuller. (And also not to disturb the newbies any further. It's quite
> hard to explain to a Windows user why he can't remove /proc and /dev, and
> what this is supposed to be.)

So don't give them root access. Or, explain to them that they're magic,
like the pagefile.sys file. :)

-pat

Patrick Mochel

unread,
May 7, 2002, 4:10:05 PM5/7/02
to

> Oh, it's certainly more than 6 hours of work. But it *will* get done.

Even the mtrr driver was a good 8 hours to clean up, make readable and
more object-oriented. I wish you luck, as well as anyone that has to
attempt to decipher it.

> > > At that point devfs will mostly be:
> > > - an API
> > > - a way of supporting the devfsd protocol.
> >
> > I argue that you shouldn't need a separate daemon. We already have
> > the /sbin/hotplug interface. It's simple and sweet. We shouldn't
> > need to rely on an entirely separate daemon.
>
> The devfsd protocol is more lightweight. Plus it doesn't require
> fork(2)+execve(2) overheads. And more importantly, you can capture
> lookup() events.

These events are not performance critical, so the overhead is less
important. Besides, almost all systems have /sbin/hotplug, since it can be
anything - a shell script, a perl script, a tiny C executable.

The hotplug interface doesn't rely on any particular implementation. It
only relies on something on the other side implementing a particular
interface. The implementation can be replaced, as well as the format of
the policy, based on the constraints of the system or the whims of
the distro.

It also doesn't rely on a process running to capture events. What happens
if the devfsd process is killed?
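
As a rough illustration of how little the receiving side needs, here is a minimal C sketch of the core of an /sbin/hotplug-style handler. It assumes the kernel passes the subsystem name as the first argument and the event type in an ACTION environment variable; the function name and the message format are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format a one-line description of a hotplug event.
 * "subsystem" would come from argv[1], "action" from getenv("ACTION").
 * Returns 0 on success, -1 if the event is malformed or does not fit. */
int describe_hotplug_event(const char *subsystem, const char *action,
                           char *out, size_t outlen)
{
    if (subsystem == NULL || action == NULL || out == NULL || outlen == 0)
        return -1;

    /* Only the two event types discussed here are accepted. */
    if (strcmp(action, "add") != 0 && strcmp(action, "remove") != 0)
        return -1;

    int n = snprintf(out, outlen, "%s: device %s", subsystem,
                     strcmp(action, "add") == 0 ? "added" : "removed");
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

A real handler's main() would call this with argv[1] and getenv("ACTION") and then apply whatever policy the system wants. Because the program is started once per event and then exits, there is no long-running process to lose.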

-pat

Jan Harkes

unread,
May 7, 2002, 5:40:08 PM5/7/02
to
On Tue, May 07, 2002 at 05:26:10PM +0200, Martin Dalecki wrote:
> Jan Harkes wrote:
> >I'm still hoping a patch will show up that will allow me to regain
> >access to my compactflash cards and IBM microdrive disks. The code
> >currently doesn't rescan for new drives when a card has been inserted,
> >although it still seems to have all the necessary logic.
>
> Yes I'm fully aware of this, but the whole initialization
> is currently much in flux and I will return to this issue back
> if I think that things are in shape there. OK?

I thought so, you already indicated so around the time that it broke.
There is still a 2.4 kernel when I really need to get to the data.

Jan

Richard Gooch

unread,
May 7, 2002, 6:10:07 PM5/7/02
to
Patrick Mochel writes:
>
> On Tue, 7 May 2002, Thunder from the hill wrote:
>
> > Hi,
> >
> > > > > /driverfs/root/pci0/00:1f.4/usb_bus/000/
> > > /driverfs/class/ide/
> > > /driverfs/root/pci0/07.2/
> > > /driverfs/class/ide/0/
> > > /driverfs/class/ide/0/2
> >
> > > Why not fix devfs for that? My root directory is messed up enough. We
> > have dev, proc, tmp, ...
>
> For one, I am of the camp that believes devfs is unfixable.

But it's not actually broken, now that the locking is fixed.

Regards,

Richard....
Permanent: rgo...@atnf.csiro.au
Current: rgo...@ras.ucalgary.ca

Roman Zippel

unread,
May 7, 2002, 8:00:10 PM5/7/02
to
Hi,

On Tue, 7 May 2002, Linus Torvalds wrote:

> Also, you obviously haven't thought it through AT ALL. Hint: partitions.
>
> If you have /dev/hda1, that _cannot_ be a symlink to the physical tree,
> because on a physical level that partition DOES NOT EXIST. It's purely a
> virtual mapping.
>
> Yet clearly there _is_ a mapping from /dev/hda1 onto the physical device
> in question, and clearly it _is_ a meaningful operation to operate on the
> physical device underlying /dev/hda1.
>
> So if you want to have a sane interface, you need to have a way to look up
> the physical device that underlies /dev/hda1.
>
> Yet it clearly cannot be a symlink.
>
> QED.

Somehow I expect Al to step in with something like:

mount -t partfs /devfs/bus/... /dev/hda

:-)

bye, Roman

Guest section DW

unread,
May 7, 2002, 8:30:13 PM5/7/02
to
On Tue, May 07, 2002 at 05:36:03PM -0400, Jan Harkes wrote:
> On Tue, May 07, 2002 at 05:26:10PM +0200, Martin Dalecki wrote:
> > Jan Harkes wrote:
> > >I'm still hoping a patch will show up that will allow me to regain
> > >access to my compactflash cards and IBM microdrive disks. The code
> > >currently doesn't rescan for new drives when a card has been inserted,
> > >although it still seems to have all the necessary logic.
> >
> > Yes I'm fully aware of this, but the whole initialization
> > is currently much in flux and I will return to this issue back
> > if I think that things are in shape there. OK?
>
> I thought so, you already indicated so around the time that it broke.
> There is still a 2.4 kernel when I really need to get to the data.

I usually do

blockdev --rereadpt /dev/sde

or so. That still works for me with 2.5.13.
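
For readers who want to trigger the same rescan from a program, the following is a minimal C sketch of what `blockdev --rereadpt` amounts to: opening the device and issuing the BLKRRPART ioctl. The helper name is hypothetical, and the call normally needs root privileges on a real block device.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>   /* BLKRRPART */

/* Ask the kernel to re-read the partition table of a block device.
 * Returns 0 on success or a negative errno value on failure. */
int reread_partition_table(const char *device)
{
    int fd = open(device, O_RDONLY);
    if (fd < 0)
        return -errno;

    int ret = 0;
    if (ioctl(fd, BLKRRPART) < 0)
        ret = -errno;   /* save errno before close() can change it */

    close(fd);
    return ret;
}
```

Calling `reread_partition_table("/dev/sde")` would correspond to the command above; the negative return value tells the caller which errno the kernel reported.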

Andries

Jan Harkes

unread,
May 7, 2002, 11:10:04 PM5/7/02
to
On Wed, May 08, 2002 at 02:25:13AM +0200, Guest section DW wrote:
> On Tue, May 07, 2002 at 05:36:03PM -0400, Jan Harkes wrote:
> > On Tue, May 07, 2002 at 05:26:10PM +0200, Martin Dalecki wrote:
> > > Jan Harkes wrote:
> > > >I'm still hoping a patch will show up that will allow me to regain
> > > >access to my compactflash cards and IBM microdrive disks. The code
> > > >currently doesn't rescan for new drives when a card has been inserted,
> > > >although it still seems to have all the necessary logic.
> > >
> > > Yes I'm fully aware of this, but the whole initialization
> > > is currently much in flux and I will return to this issue back
> > > if I think that things are in shape there. OK?
> >
> > I thought so, you already indicated so around the time that it broke.
> > There is still a 2.4 kernel when I really need to get to the data.
>
> I usually do
>
> blockdev --rereadpt /dev/sde
>
> or so. That still works for me with 2.5.13.

For SCSI devices probably, but I get "/dev/hde: No such device" (ENODEV)
when a CF card is inserted and recognized.

(dmesg)
hde: SanDisk SDCFB-32, ATA DISK drive
ide2 at 0x100-0x107,0x10e on irq 3
ide_cs: hde: Vcc = 3.3, Vpp = 0.0

When the CF card is not inserted I get a subtly different error
"/dev/hde: No such device or address" (ENXIO).

It looks like the drive <-> driver association is only set up when the
ide-disk driver module is loaded and not when new hardware is found.
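
To make the distinction between the two errors concrete, here is a small C sketch that probes a device node and maps the resulting errno to the situations described above. The function names are hypothetical, and the explanations simply restate the observations in this message; they are not taken from the kernel source.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Try to open a device node read-only.
 * Returns 0 on success or the errno value from open(2) on failure. */
int probe_device_node(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return errno;
    close(fd);
    return 0;
}

/* Hypothetical mapping from errno to the cases observed above. */
const char *explain_probe_result(int err)
{
    switch (err) {
    case 0:      return "device opened";
    case ENODEV: return "node exists and hardware was found, but no driver is attached";
    case ENXIO:  return "node exists, but no hardware is present behind it";
    case ENOENT: return "the device node itself is missing";
    default:     return "other error";
    }
}
```

With the CF card inserted, `probe_device_node("/dev/hde")` would be expected to return ENODEV on the affected kernel, and ENXIO with the card removed.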

Jan

Anton Altaparmakov

unread,
May 7, 2002, 11:40:06 PM5/7/02
to
At 17:51 07/05/02, Dave Jones wrote:
>On Tue, May 07, 2002 at 03:29:28PM +0100, Anton Altaparmakov wrote:
> > [aia21@drop hda]$ ideinfo
> > bash: ideinfo: command not found
> > Obviously distros haven't caught up with this development. )-:
> > Care to give me a URL? A quick google for "ideinfo Linux download" didn't
> > bring up anything looking relevant.
>
>Can't find where I got it from, and it seems to have fallen off google.
>I put up the last version I had (which I hacked up a bit) at
>http://www.codemonkey.org.uk/cruft/ide-info-0.0.5-dj.tar.gz

Ok, will get that. Someone else emailed me a URL and I tried that earlier
on (ages ago, it seems). It said version 0.0.4 and it displayed a lot of crap
on a running 2.5.14 kernel. Certainly it bears no resemblance to what
/proc/ide/via has to say and it certainly bears no resemblance to
reality... )-: I hope...

> > >The parsing gunk we have for /proc/ide is fugly, and should have been
> > >done with sysctls from day one imo.
> >

> > I like text parsing.
>
>must.. resist.. /proc ascii/bin... holywar..
>(besides, sysctl interface gives you ascii in /proc/sys/)

It does indeed (if implemented). Agreed: if Martin were to change to sysctl
with a /proc interface, great. It would just mean /proc/ide becomes
/proc/sys/ide, nothing against that....

> > It is not performance critical and makes info human
> > readable... Whether existing text parsers are any good or not, I don't
> > care, write a better one if you don't like the existing one
>
>That's likely exactly the reason we ended up with the dungheap we have
>now. Rewriting the parser when we already have a usable sysctl interface
>seems to have no gain over the existing mess to me.

Probably... I agree sysctl is great. I use it in ntfs myself. (-: And I
think /proc/sys is very nice... And people who don't like it or who
don't compile in the /proc fs can use _sysctl...

Cheers,

Anton


--
"I've not lost my mind. It's backed up on tape somewhere." - Unknown
--
Anton Altaparmakov <aia21 at cantab.net> (replace at with @)
Linux NTFS Maintainer / IRC: #ntfs on irc.openprojects.net
WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/

Paul Mackerras

unread,
May 8, 2002, 2:50:06 AM5/8/02
to
Martin Dalecki writes:

> - Virtualize the udma_enable method as well to help ARM and PPC people. Please
> please if you would like to have some other methods virtualized in a similar
> way - just tell me or even better do it yourself at the end of ide-dma.c.
> I *don't mind* patches.
>
> - Fix the pmac code to adhere to the new API. It's supposed to work again.
> However this is blind coding... I give myself 80% chances for it to work ;-).

OK, now I am truly impressed. Not only does it compile cleanly, it
works first go!

I am using the tiny patch below, it sets the unmask flag so interrupts
will be unmasked by default (which is safe on powermacs).

Thanks,
Paul.

diff -urN linux-2.5/drivers/ide/ide-pmac.c pmac-2.5/drivers/ide/ide-pmac.c
--- linux-2.5/drivers/ide/ide-pmac.c Wed May 8 16:40:17 2002
+++ pmac-2.5/drivers/ide/ide-pmac.c Wed May 8 08:26:48 2002
@@ -343,6 +343,7 @@
ide_hwifs[ix].autodma = 1;
#endif
}
+ ide_hwifs[ix].unmask = 1;
}

#if 0

Juan Quintela

unread,
May 8, 2002, 3:50:08 AM5/8/02
to
>>>>> "linus" == Linus Torvalds <torv...@transmeta.com> writes:

Hi

linus> (Side note: I'm afraid that I don't think backwards compatibility weighs
linus> very heavily on an embedded setup - I'm more thinking about things like "a
linus> regular RedHat/SuSE/Debian/whatever install won't work any more".)

Here at Mandrake we have a patch for the install kernel to remove
/proc/ide, and I think that we got it from Red Hat. That means that at
least two distros prefer saving ~25kb in the boot kernels over the
reporting that it does :p

Later, Juan.

--
In theory, practice and theory are the same, but in practice they
are different -- Larry McVoy

Russell King

unread,
May 8, 2002, 4:20:09 AM5/8/02
to
On Tue, May 07, 2002 at 04:03:50PM -0600, Richard Gooch wrote:
> But it's not actually broken, now that the locking is fixed.

Really? What about the case of the missing BKL for device opens that
you haven't really commented on?

Seems like devfs _still_ has locking problems.

--
Russell King (r...@arm.linux.org.uk) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html

Martin Dalecki

unread,
May 8, 2002, 4:50:06 AM5/8/02
to
Padraig Brady wrote:
> Linus Torvalds wrote:
>
>> [ First off: any IDE-only thing that doesn't work for SCSI or other
>> disks
>> doesn't solve a generic problem, so the complaint that some generic
>> tools might use it is totally invalid. ]
>>
>> On Tue, 7 May 2002, Anton Altaparmakov wrote:
>>
>>> Linux's power is exactly that it can be used on anything from a
>>> wristwatch
>>> to a huge server and that it is flexible about everything. You are
>>> breaking
>>> this flexibility for no apparent reason. (I don't accept "I can't
>>> cope with
>>> this so I remove it." as a reason, sorry).
>>
>>
>>
>> Run the 57 patch, and complain if something doesn't work.
>>
>> Linux's power is that we FIX stuff. That we make it the best system
>> possible, and that we don't just whine and argue about things.
>>
>>
>>> As the new IDE maintainer so far we have only seen you removing one
>>> feature after the other in the name of cleanup, without adequate or even
>>> any at all(!) replacements,
>>
>>
>>
>> Who cares? Have you found _anything_ that Martin removed that was at all
>> worthwhile? I sure haven't.
>>
>> Guys, you have to realize that the IDE layer has eight YEARS of absolute
>> crap in it. Seriously. It's _never_ been cleaned up before. It has stuff
>> so distasteful that it's scary.
>>
>> Take it from me: it's a _lot_ easier to add cruft and crap on top of
>> clean
>> code. You can do it yourself if you want to. You don't need a maintainer
>> to add barnacles.
>>
>> All the information that /proc/ide gave you is basically available in
>> hdparm, and for your dear embedded system it apparently takes up less
>> space by being in user space. So what is the problem?
>
>
> Well my "dear" embedded system doesn't have libc :-(
> So 35664 saved in kernel (less on disk), requires 25212
> extra for hdparm + more for static linked uclibc (hope
> it works ;-)). As a side note if this happens hdparm would
> be a requirement for busybox IMHO, anyway getting back on topic...
>
> All the info I've ever needed is /proc/ide/hdx/capacity
> which I could get from /proc/partitions with a bit
> more effort, so I vote for removing /proc/ide.
>
> I think everyone realises Martin is doing great and much needed work
> on IDE (btw I'll have those flash support patches soon Martin ;-)),
> but I did think this change needed debate. In general I know it's a
> hard decision what to export in proc, especially if there are
> existing dependencies, a few already mentioned possibles in RH7.1:
>
> /sbin/mkinitrd
> /sbin/fdisk
> /sbin/sfdisk
> /sbin/sndconfig
> /usr/sbin/mouseconfig
> /usr/sbin/kudzu
> /usr/sbin/module_upgrade
> /usr/sbin/updfstab
> /usr/sbin/glidelink
> /usr/sbin/sndconfig
> /usr/lib/python1.5/site-packages/_kudzumodule.so
> /usr/bin/X11/Xconfigurator
>
> For example, could the same argument be made for an lspci-only
> interface to PCI info rather than /proc/bus/pci? The following
> references are made to /proc/bus/pci on my system:

Especially in light of the fact that we have a device tree filesystem, I
rather think that /proc/bus/pci is indeed obsolete.

Martin Dalecki

unread,
May 8, 2002, 5:10:07 AM5/8/02
to
Linus Torvalds wrote:
>
> On Tue, 7 May 2002, Alan Cox wrote:
>
>>/proc/ide has useful information in it that you can't get easily by
>>other means at the moment - which controller is driving the disks, what
>>devices are present etc.
>
>
> I'd love for somebody to add the devices to the real device tree, at which
> point this kind of information would be very much visible..
>
> Right now devicefs isn't even mounted by default, but it's the only
> _really_ generic way of showing things like this that we have. For people
> who haven't seen it before, do a
>
> mount -t driverfs /devfs /devfs
>
> and go look in there.. In particular, if you have a PCI system with a USB
> device tree (or _multiple_ such trees), notice how you can look at things
> like

>
> /driverfs/root/pci0/00:1f.4/usb_bus/000/
>
> and it wouldn't be impossible (or even necessarily very hard) to make an
> IDE controller export the "IDE device tree" the same way a USB controller
> now exports the "USB device tree".
>
> For things like hotplug etc, I think driverfs is eventually the only way
> to go, simply because it gives you the full (and unambiguous) path to
> _any_ device, and is completely bus-agnostic.
>
> But there is definitely a potential backwards-compatibility-issue.

Linus - there are no backward compatibility issues here.
Not a single application on my system messes with /proc/ide.
They showed you a list of programs which use /proc and not a list
of programs which use anything out of /proc/ide...
RedHat even disables all this chipset-specific reporting in their
public kernels. OK, kudzu is using it, but it does not *rely on it*.
Heck, kudzu ran every time I rebooted my system during
development and nothing ugly happened.

As for fdisk on my notebook, well, it runs just fine:

[root@kozaczek root]# fdisk /dev/hda

The number of cylinders for this disk is set to 2584.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
(e.g., DOS FDISK, OS/2 FDISK)

Command (m for help): p

Disk /dev/hda: 240 heads, 63 sectors, 2584 cylinders
Units = cylinders of 15120 * 512 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/hda1   *         1         7     52888+  83  Linux
/dev/hda2             8      2556  19270440    5  Extended
/dev/hda4          2557      2584    211680   a0  IBM Thinkpad hibernation
/dev/hda5             8      2166  16322008+  83  Linux
/dev/hda6          2167      2219    400648+  82  Linux swap
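
The numbers in this listing can be cross-checked with a little arithmetic: 240 heads times 63 sectors gives the 15120 sectors per cylinder that fdisk reports as its unit, and the Blocks column is the partition size in 1 KiB blocks. A minimal C sketch, with hypothetical helper names:

```c
#include <stdint.h>

/* Bytes in one fdisk "unit" (one cylinder) for a given CHS geometry. */
uint64_t cylinder_bytes(uint32_t heads, uint32_t sectors, uint32_t sector_size)
{
    return (uint64_t)heads * sectors * sector_size;
}

/* Size, in 1 KiB blocks, of a partition that covers the whole cylinders
 * first..last (inclusive). */
uint64_t partition_blocks(uint32_t first, uint32_t last,
                          uint32_t heads, uint32_t sectors,
                          uint32_t sector_size)
{
    uint64_t cylinders = (uint64_t)last - first + 1;
    return cylinders * cylinder_bytes(heads, sectors, sector_size) / 1024;
}
```

For /dev/hda2 (cylinders 8 to 2556) this yields 19270440 blocks and for /dev/hda4 (2557 to 2584) it yields 211680, matching the listing. The entries marked with "+" come out slightly smaller than whole cylinders, presumably because those partitions start one track into their first cylinder.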

None of the programmers who wrote fdisk, cdrecord or anything else
were stupid enough to use anything out of there, since using a
simple ioctl is easier anyway. I *did* check them.
(Admittedly I don't care about kudzu, but fdisk and friends I was
fully aware of.)

BTW, if one needs the size of the disk, we could
expose it as the file size of the device file in /dev, IMHO. Why not?

Martin Dalecki

unread,
May 8, 2002, 5:20:05 AM5/8/02
to
be...@kernel.crashing.org wrote:

>> /driverfs/root/pci0/00:1f.4/usb_bus/000/
>>
>>and it wouldn't be impossible (or even necessarily very hard) to make an
>>IDE controller export the "IDE device tree" the same way a USB controller
>>now exports the "USB device tree".
>>
>>For things like hotplug etc, I think driverfs is eventually the only way
>>to go, simply because it gives you the full (and unambiguous) path to
>>_any_ device, and is completely bus-agnostic.
>>
>>But there is definitely a potential backwards-compatibility-issue.
>
>
> One interesting thing here would be to have some optional link between
> the bus-oriented device tree and the function-oriented tree (ie. devfs
> or simply /dev). For example, an IDE node in driverfs could eventually
> hold symlinks to the entries it provides in /dev when using devfs (or
> just provide major/minor when not using devfs).
>
> What do you think ?
>
> One problem I've been faced with on ppc is to be able to match
> a linux device with what the firmware (Open Firmware) thinks that
> device is. The firmware view is bus-centered and it would be pretty
> easy to provide some additional entries in driverfs that give the
> OF fullpath of a given device. But then, the link between the actual
> driver in driverfs and the "device" as used by, for example, the
> filesystem isn't trivial.
>
> Ben.
>
>
>

This is the "first" IDE controller on my notebook:

./devices/root/pci0/00:07.1/01f0
./devices/root/pci0/00:07.1/01f0/0
./devices/root/pci0/00:07.1/01f0/0/power
./devices/root/pci0/00:07.1/01f0/0/name
./devices/root/pci0/00:07.1/01f0/0/status
./devices/root/pci0/00:07.1/01f0/power
./devices/root/pci0/00:07.1/01f0/name
./devices/root/pci0/00:07.1/01f0/status

Guys I have done it already!

For your convenience I will attach the ata prefix to the
currently used port number in the next patch round.

OK?

Martin Dalecki

unread,
May 8, 2002, 5:20:08 AM5/8/02
to
Jauder Ho wrote:
> Ben, what you are proposing is fairly similar to what Solaris does today.
> There is a /devices directory that contains the real path while /dev
> contains the legacy stuff. Seems to work fine and given the proper docs,
> you can decipher what the /devices path points to fairly easily. So I
> certainly wouldnt mind seeing this happen for Linux eventually.

Amen. We would only have to add a device special file
to some of the /devices stuff and /dev/ could be a symlink tree
pointing there...

I have *intentionally* named the standard mount point
of the devicefs /devices at the time I added the description
of how to mount it to driver-model.txt. The following words
are from *me*:

This can be done permanently by providing the following entry in
/etc/fstab (under the provision that the mount point does exist, of course):

none /devices driverfs defaults 0 0

Or by hand on the command line:

~: mount -t driverfs none /devices

>
> --Jauder


>
> On Tue, 7 May 2002 be...@kernel.crashing.org wrote:
>
>

>>> /driverfs/root/pci0/00:1f.4/usb_bus/000/
>>>
>>>and it wouldn't be impossible (or even necessarily very hard) to make an
>>>IDE controller export the "IDE device tree" the same way a USB controller
>>>now exports the "USB device tree".
>>>
>>>For things like hotplug etc, I think driverfs is eventually the only way
>>>to go, simply because it gives you the full (and unambiguous) path to
>>>_any_ device, and is completely bus-agnostic.
>>>
>>>But there is definitely a potential backwards-compatibility-issue.
>>
>>One interesting thing here would be to have some optional link between
>>the bus-oriented device tree and the function-oriented tree (ie. devfs
>>or simply /dev). For example, an IDE node in driverfs could eventually
>>hold symlinks to the entries it provides in /dev when using devfs (or
>>just provide major/minor when not using devfs).
>>
>>What do you think ?
>>
>>One problem I've been faced with on ppc is to be able to match
>>a linux device with what the firmware (Open Firmware) thinks that
>>device is. The firmware view is bus-centered and it would be pretty
>>easy to provide some additional entries in driverfs that give the
>>OF fullpath of a given device. But then, the link between the actual
>>driver in driverfs and the "device" as used by, for example, the
>>filesystem isn't trivial.
>>
>>Ben.
>>
>>
>>

>>
>>
>
>
>

--
- phone: +49 214 8656 283
- job: eVision-Ventures AG, LEV .de (MY OPINIONS ARE MY OWN!)
- langs: de_DE.ISO8859-1, en_US, pl_PL.ISO8859-2, last ressort: ru_RU.KOI8-R

Martin Dalecki

unread,
May 8, 2002, 5:30:11 AM5/8/02
to
Patrick Mochel wrote:

Just a side note...
Please, please name it /devices/. Some old boys like me (age 30)
can gain from similarities with some quite common "legacy" systems.
We don't have to "invent" for the sake of it.

And /devices/ is the way I have named it in the corresponding
documentation.

Martin Dalecki

unread,
May 8, 2002, 5:30:13 AM5/8/02
to
Richard Gooch wrote:

> Linus Torvalds writes:
>
>>
>>On Tue, 7 May 2002, Alan Cox wrote:
>>
>>>>>Fugly. What's wrong with readlink(2) as this "magic syscall"?
>>>>
>>>>Ehh - like the fact that it doesn't work on device files?
>>>
>>>I can't find anything in Posix/SuS that says it isnt allowed to however 8)
>>
>>We can certainly do it, it just doesn't buy us much of anything, since
>>none of the standard tools (ie "ls") will actually do the readlink() for
>>anything but a symlink.
>>
>>So at that point it's just another magic syscall, except we've overloaded
>>an old one.
>>
>>Which may certainly be acceptable, of course.
>
>
> I wasn't suggesting a magic readlink(2). I was suggesting a *real*
> one. Device nodes get stored in the physical tree (what you call
> driverfs), and the entries in the logical tree are symlinks. Such as:
>
> /dev/scsi/host0 symlink to /dev/bus/pci0/slot1/function2
>
> or something like that. Easy to implement, easy to understand, easy to
> manage.

Now you take the last step toward Solaris and realize why I was
always against your solution (no personal offence)
to the device management problem - they do it all in user space
by precisely the above symlink system....
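
For reference, following such a link from user space is a single readlink(2) call. A minimal sketch, assuming the logical entry is an ordinary symlink as in the /dev/scsi/host0 example quoted above; the helper name is hypothetical.

```c
#include <sys/types.h>
#include <unistd.h>

/* Read the target of a symlink into "target" as a NUL-terminated string.
 * Returns 0 on success, -1 on error (errno is set by readlink(2)). */
int read_device_link(const char *path, char *target, size_t len)
{
    if (len == 0)
        return -1;

    ssize_t n = readlink(path, target, len - 1);
    if (n < 0)
        return -1;

    target[n] = '\0';   /* readlink(2) does not terminate the string */
    return 0;
}
```

Called on a plain device node rather than a symlink, readlink(2) fails with EINVAL, which is the limitation raised earlier in the quoted exchange.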

Ian Molton

unread,
May 8, 2002, 5:40:05 AM5/8/02
to
On 08 May 2002 08:57:00 +0200
ka...@khms.westfalen.de (Kai Henningsen) wrote:

> /driverfs/root/pci0/00:1f.4/scsi_bus/003/pc_partition/2
>
> Sure, it's software, not hardware.

I agree. I think it'd be great to have something like the above.

Martin Dalecki

unread,
May 8, 2002, 6:00:12 AM5/8/02
to
Paul Mackerras wrote:

> Martin Dalecki writes:
>
>
>>- Virtualize the udma_enable method as well to help ARM and PPC people. Please
>> please if you would like to have some other methods virtualized in a similar
>> way - just tell me or even better do it yourself at the end of ide-dma.c.
>> I *don't mind* patches.
>>
>>- Fix the pmac code to adhere to the new API. It's supposed to work again.
>> However this is blind coding... I give myself 80% chances for it to work ;-).
>
>
> OK, now I am truly impressed. Not only does it compile cleanly, it
> works first go!

Thank you.

BTW> I would really love it if the cris architecture people could
"lend me" some small development system for their interesting CPU.
In return I could give them what's certainly worth "several weeks of
developer time". (If you hear me: this is a hint if you need an argument for
your management.)

Unfortunately this is somehow the weirdest ATA interface
around, which is due to the fact that the interface cell is directly mapped to
some CPU registers. As a CPU design I think it's a fine approach. Don't
get me wrong. You save yourself the whole silicon which is needed
for BM access arbitration and general handling and so on... Very nicely thought
out. But on the software side this is a bit weird, since you can't use
the generic I/O primitives of the arch in question.

This makes my cleanup of the portability layer a bit hard
to finish on the software side.

> I am using the tiny patch below, it sets the unmask flag so interrupts
> will be unmasked by default (which is safe on powermacs).

And on every other fscking PCI based system... (modulo the "problematic"
cmd640 and RZ1000). Should have been done a long time ago this way... I will
adjust the others as well.

Martin Dalecki

unread,
May 8, 2002, 6:10:08 AM5/8/02
to
Guest section DW wrote:

> On Tue, May 07, 2002 at 05:36:03PM -0400, Jan Harkes wrote:
>
>>On Tue, May 07, 2002 at 05:26:10PM +0200, Martin Dalecki wrote:
>>
>>>Jan Harkes wrote:
>>>
>>>>I'm still hoping a patch will show up that will allow me to regain
>>>>access to my compactflash cards and IBM microdrive disks. The code
>>>>currently doesn't rescan for new drives when a card has been inserted,
>>>>although it still seems to have all the necessary logic.
>>>
>>>Yes I'm fully aware of this, but the whole initialization
>>>is currently much in flux and I will return to this issue back
>>>if I think that things are in shape there. OK?
>>
>>I thought so, you already indicated so around the time that it broke.
>>There is still a 2.4 kernel when I really need to get to the data.
>
>
> I usually do
>
> blockdev --rereadpt /dev/sde
>
> or so. That still works for me with 2.5.13.


What you have to do by hand now is the rescanning for partition
information. What you do is trigger just that. And if I think
about it... and you know I'm evil... hmmm...
well, why not just let it be like that? It's functionally somehow the
responsibility of the /sbin/hotplug thing anyway...

Bjorn Wesen

unread,
May 8, 2002, 6:50:09 AM5/8/02
to
On Wed, 8 May 2002, Martin Dalecki wrote:
> BTW> I would really love it if the cris architecture people could
> "lend me" some small development system for their interesting CPU.

We'll consider it :) However,

> Unfortunately this is somehow the weirdest ATA interface
> around, which is due to the fact that the interface cell is directly mapped to
> some CPU registers. As a CPU design I think it's a fine approach. Don't
> get me wrong. You save yourself the whole silicon which is needed
> for BM access arbitration and general handling and so on... Very nicely thought
> out. But on the software side this is a bit weird, since you can't use
> the generic I/O primitives of the arch in question.

I don't see why all IDE-interfaces in the world have to be I/O-mapped just
because the first PC implementations used that. Sure it was an extended
ISA-bus but the ISA bus is long gone and we don't all run PC's anymore
either.

So the simple abstraction we need to hit IDE-bus registers is a macro or
inline, instead of a call of an I/O-primitive. It was too much work to
abstract this when I inserted the CRIS-arch IDE-driver in the first place
so I found a workaround but now seems like a better time..

Similarly, there is no reason at all why the CPU has to do _polling_ just
because the IDE _bus_ is using a PIO-mode. It probably does that on legacy
PC's but HW designed, hrm, more optimally can use DMA. Hence the hooks for
the ide_func_t.

So I'd figure the software side really would be _easier_ to implement with
those assumptions about how an IDE-interface is supposed to work gone.

> This makes my cleanup of the portability layer a bit hard
> to finish on the software side.

I understand that, so let's keep the discussion going and I'll check over
your current cleanup.

/Bjorn

Benjamin Herrenschmidt

unread,
May 8, 2002, 7:10:06 AM5/8/02
to
>I don't see why all IDE-interfaces in the world have to be I/O-mapped just
>because the first PC implementations used that. Sure it was an extended
>ISA-bus but the ISA bus is long gone and we don't all run PC's anymore
>either.
>
>So the simple abstraction we need to hit IDE-bus registers is a macro or
>inline, instead of a call of an I/O-primitive. It was too much work to
>abstract this when I inserted the CRIS-arch IDE-driver in the first place
>so I found a workaround but now seems like a better time..

No, not a macro. There are cases where you want different access methods
on the same machine. For example, pmacs can have the "mac-io" (ide-pmac)
controller, which is MMIO based, _and_ a PCI-based legacy IDE controller
using inx/outx-like I/O. (A typical example is the Blue&White G3, which has
both on the motherboard).

Ultimately, you want the hwif (or what it becomes in 2.5) to provide a set
of functions for accessing taskfile registers and doing the PIO data
stream reads/writes (that is, replacing inb/outb and insw/outsw).
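
One way to picture this is a per-interface table of access functions, selected when the controller is probed. The C sketch below is only an illustration under stated assumptions: every type and function name in it is hypothetical rather than taken from the IDE code, and both backends are simulated with plain arrays so that it runs in user space.

```c
#include <stdint.h>

#define ATA_REG_STATUS 7   /* status register offset within the taskfile */

/* Hypothetical per-interface access methods. */
struct ata_io_ops {
    uint8_t (*read_reg)(void *priv, unsigned int reg);
    void    (*write_reg)(void *priv, unsigned int reg, uint8_t val);
};

/* Hypothetical interface object: methods plus backend-specific state. */
struct ata_iface {
    const struct ata_io_ops *ops;
    void *priv;
};

/* Backend 1: memory-mapped registers (simulated by a byte array). */
static uint8_t mmio_read(void *priv, unsigned int reg)
{
    volatile uint8_t *base = priv;
    return base[reg];
}

static void mmio_write(void *priv, unsigned int reg, uint8_t val)
{
    volatile uint8_t *base = priv;
    base[reg] = val;
}

/* Backend 2: port I/O at a base port (port space simulated by an array). */
struct pio_state {
    uint8_t *port_space;
    uint16_t base;
};

static uint8_t pio_read(void *priv, unsigned int reg)
{
    struct pio_state *s = priv;
    return s->port_space[s->base + reg];
}

static void pio_write(void *priv, unsigned int reg, uint8_t val)
{
    struct pio_state *s = priv;
    s->port_space[s->base + reg] = val;
}

const struct ata_io_ops ata_mmio_ops = { mmio_read, mmio_write };
const struct ata_io_ops ata_pio_ops  = { pio_read, pio_write };

/* Generic code never needs to know which backend it is talking to. */
uint8_t ata_read_status(struct ata_iface *iface)
{
    return iface->ops->read_reg(iface->priv, ATA_REG_STATUS);
}

void ata_write_reg(struct ata_iface *iface, unsigned int reg, uint8_t val)
{
    iface->ops->write_reg(iface->priv, reg, val);
}
```

Because the choice is made per interface at run time rather than per architecture at compile time, a machine with both an MMIO controller and a legacy PCI controller can drive each through its own table, which a single macro cannot express.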

Benjamin Herrenschmidt

unread,
May 8, 2002, 7:20:06 AM5/8/02
to
(resent, I had the date screwed up previously, sorry about the
inconvenience).

>I don't see why all IDE-interfaces in the world have to be I/O-mapped just
>because the first PC implementations used that. Sure it was an extended
>ISA-bus but the ISA bus is long gone and we don't all run PC's anymore
>either.
>
>So the simple abstraction we need to hit IDE-bus registers is a macro or
>inline, instead of a call of an I/O-primitive. It was too much work to
>abstract this when I inserted the CRIS-arch IDE-driver in the first place
>so I found a workaround but now seems like a better time..

No, not a macro. There are cases where you want different access methods
on the same machine. For example, pmacs can have the "mac-io" (ide-pmac)
controller, which is MMIO based, _and_ a PCI-based legacy IDE controller
using inx/outx-like I/O. (A typical example is the Blue&White G3, which has
both on the motherboard).

Ultimately, you want the hwif (or what it becomes in 2.5) to provide a set
of functions for accessing taskfile registers and doing the PIO data
stream reads/writes (that is, replacing inb/outb and insw/outsw).

Martin Dalecki

unread,
May 8, 2002, 7:30:07 AM5/8/02
to
Bjorn Wesen wrote:

> On Wed, 8 May 2002, Martin Dalecki wrote:
>
>>BTW> I would really love it if the cris architecture people could
>>"lend me" some small development system for their interesting CPU.
>
>
> We'll consider it :) However,
>
>
>>Unfortunately this is somehow the weirdest ATA interface
>>around, which is due to the fact that the interface cell is directly mapped to
>>some CPU registers. As a CPU design I think it's a fine approach. Don't
>>get me wrong. You save yourself the whole silicon which is needed
>>for BM access arbitration and general handling and so on... Very nicely thought
>>out. But on the software side this is a bit weird, since you can't use
>>the generic I/O primitives of the arch in question.
>
>
> I don't see why all IDE-interfaces in the world have to be I/O-mapped just
> because the first PC implementations used that. Sure it was an extended
> ISA-bus but the ISA bus is long gone and we don't all run PC's anymore
> either.

Hey, I agree, and I appreciate the design decisions for the CRIS CPU
as good and surprisingly refreshing. Like, for example, the whole
concept of the compacted command set and so on. They are just *cute*...
It's about a year ago that I studied the publicly available documentation
on it.

> So the simple abstraction we need to hit IDE-bus registers is a macro or
> inline, instead of a call of an I/O-primitive. It was too much work to
> abstract this when I inserted the CRIS-arch IDE-driver in the first place
> so I found a workaround but now seems like a better time..

I don't think that it's always the proper approach for hardware
portability to do it on the "micro operation" level. That's good
for generics like inb outb. In the case of the ATA interface it's
better to do it on the "functional" level above... Just like you did
with ata_read() and ata_write() as they are called now. You can
see I picked it up, and when I sort out the transport method detection/setting
I will apply it to the other friends from the ata_read_xxx family as well.
And then we have the same approach in the udma_ family I just
introduced.

> Similarily, there is no reason at all why the CPU has to do _polling_ just
> because the IDE _bus_ is using a PIO-mode. It probably does that on legacy
> PC's but HW designed, hrm, more optimally can use DMA. Hence the hooks for
> the ide_func_t.

Well, right now I think if you look at the IDE 58 patch you will see
that ide_func_t is a 'bit ugly', simply because it is introducing
just another entity to the game. We don't need it.
struct ata_channel *is* the central entity for operations from
the host view. In my whole experience as a programmer it has always turned
out to be most sane to make the software design a homological mapping
of the generalized hardware design on this level of coding.
It's just natural: functions are there to serve a specific purpose.

> So I'd figure the software side really would be _easier_ to implement with
> those assumptions about how an IDE-interface is supposed to work gone.
>
>
>>This makes my cleanup of the portability layer a bit hard
>>to finish on the software side.
>
>
> I understand that, so lets keep the discussion going and I'll check over
> your current cleanup.

Well, please consider: iff I had access to the hardware it would possibly save
you a lot of reading through bad English ;-).

> /Bjorn

Regards.

Martin Dalecki

unread,
May 8, 2002, 7:30:11 AM5/8/02
to
Benjamin Herrenschmidt wrote:

> (resent, I had the date screwed up previously, sorry about the
> inconvenience).
>
>
>>I don't see why all IDE-interfaces in the world have to be I/O-mapped just
>>because the first PC implementations used that. Sure it was an extended
>>ISA-bus but the ISA bus is long gone and we don't all run PC's anymore
>>either.
>>
>>So the simple abstraction we need to hit IDE-bus registers is a macro or
>>inline, instead of a call of an I/O-primitive. It was too much work to
>>abstract this when I inserted the CRIS-arch IDE-driver in the first place
>>so I found a workaround but now seems like a better time..
>
>
> No, not a macro. There are cases where you want different access methods
> on the same machine. For example, pmacs can have the "mac-io" (ide-pmac)
> controller, which is MMIO based, _and_ a PCI-based legacy IDE controller
> using inx/outx like IOs. (A typical example is the Blue&White G3 who has
> both on the motherboard).
>
> Ultimately, you want the hwif (or what it becomes in 2.5) provide a set
> of functions for accessing taskfile registers and doing the PIO data
> stream read/writes (that is replace inb/outb and insw/outsw).

Terminology in 2.5:
We have a host chip set or, for short, a host chip. This implements the
ATA interface on the motherboard side.
The host chip provides two channels, a primary and a secondary
one. To a channel we can attach two devices; however, in code we use the term
drive instead, because the term device is already quite overloaded with
meaning. The devices are enumerated as units. That's it.
Far more natural than hwif, hwgrp and so on. IDE is the Integrated Drive
Electronics - the microcontroller stuff I don't care that much about.
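
The hierarchy described above (one host chip, two channels, two drives
per channel, drives addressed as units) can be pictured as nested C
structures. This is only a minimal sketch to make the terminology
concrete; every name with a term_ or TERM_ prefix is hypothetical and is
not taken from the 2.5 sources.

```c
#include <assert.h>
#include <stddef.h>

#define TERM_CHANNELS_PER_HOST  2   /* primary and secondary */
#define TERM_DRIVES_PER_CHANNEL 2   /* unit 0 and unit 1 */

/* Hypothetical types, for illustration only. */
struct term_drive {
    int present;                    /* nonzero if a drive answered on this unit */
};

struct term_channel {
    struct term_drive drives[TERM_DRIVES_PER_CHANNEL];
};

struct term_host {
    struct term_channel channels[TERM_CHANNELS_PER_HOST];
};

/* Look up a drive by (channel, unit); returns NULL when out of range. */
static struct term_drive *term_get_drive(struct term_host *host,
                                         int channel, int unit)
{
    assert(host != NULL);
    if (channel < 0 || channel >= TERM_CHANNELS_PER_HOST)
        return NULL;
    if (unit < 0 || unit >= TERM_DRIVES_PER_CHANNEL)
        return NULL;
    return &host->channels[channel].drives[unit];
}

/* Count the drives marked present on one host chip. */
static int term_count_drives(const struct term_host *host)
{
    int n = 0;

    assert(host != NULL);
    for (int c = 0; c < TERM_CHANNELS_PER_HOST; c++)
        for (int u = 0; u < TERM_DRIVES_PER_CHANNEL; u++)
            if (host->channels[c].drives[u].present)
                n++;
    return n;
}
```

In this picture a card with four connectors would simply show up as two
term_host instances rather than one host with four channels.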

Dave Jones

unread,
May 8, 2002, 8:00:09 AM5/8/02
to
On Wed, May 08, 2002 at 04:38:31AM +0100, Anton Altaparmakov wrote:
> >http://www.codemonkey.org.uk/cruft/ide-info-0.0.5-dj.tar.gz
> Ok, will get that. Someone else emailed me a url and I tried that earlier
> on (ages ago it seems) it said version 0.0.4

I don't think 0.0.5 actually hit the streets; I just named it that as
this one contained something or other I did (can't remember what exactly,
diff will tell you) that I intended to send to the author for 0.0.5.



> Certainly it bears no resemblance to what /proc/ide/via
> has to say and it certainly bears no resemblance to
> reality... )-:

Likely it needs an update for the newer VIA chipsets, as this code is
~2 years old. What it does do, however, is prove that this doesn't need
to be done in kernel space.

Dave.

--
| Dave Jones. http://www.codemonkey.org.uk
| SuSE Labs

Alan Cox

unread,
May 8, 2002, 8:00:12 AM5/8/02
to
> about it... and you know I'm evil... hmmm...
> well why just don't let it be like that. It's functionally somehow the
> responsibility of the /sbin/hotplug thing anyway...

How do you intend to order a sequence of I/O operations precisely against a
partition table change driven from user space? That's one I can't see a nice
answer for, and having a RAID controller that can do on-the-fly volume
resizing/creation/deletion, it's not just a matter of curiosity.

Martin Dalecki

unread,
May 8, 2002, 8:00:15 AM5/8/02
to
Alan Cox wrote:

>>about it... and you know I'm evil... hmmm...
>>well why just don't let it be like that. It's functionally somehow the
>>responsibility of the /sbin/hotplug thing anyway...
>
>
> How do you intend to order a sequence of I/O operations precisely against a
> partition table change driven from user space ? Thats one I can't see a nice
> answer for, and having a raid controller that can do on the fly volume
> resizing/creation/deletion its not just a matter of curiosity


Nahh Alan we are just talking about the ide-cs stuff. I'm not that "evil".

Alan Cox

unread,
May 8, 2002, 8:10:08 AM5/8/02
to
> RedHat even disables all this chip set specific reporting in theyr
> public kernels. OK kudzu is using it, but it does not *rely on it*.

The boot kernel has a lot of it disabled, not the main ones.

> Heck kudzu is running all the time I rebooted my system during
> developement and nothing ugly did happen.

I can't speak directly for the Kudzu maintainer but I can say that having
a sane way to obtain the list of ide devices (all of them not just non
pcmcia) and the device bindings/type has been a long standing request.

If 2.6 breaks a 2.4 installer and nothing else, I don't think it's a big
disaster, and the cleanup may well be justified.

Martin Dalecki

unread,
May 8, 2002, 8:20:04 AM5/8/02
to
Alan Cox wrote:

>>RedHat even disables all this chip set specific reporting in theyr
>>public kernels. OK kudzu is using it, but it does not *rely on it*.
>
>
> The boot kernel has a lot of it disabled not the main ones.
>
>
>>Heck kudzu is running all the time I rebooted my system during
>>developement and nothing ugly did happen.
>
>
> I can't speak directly for the Kudzu maintainer but I can say that having
> a sane way to obtain the list of ide devices (all of them not just non
> pcmcia) and the device bindings/type has been a long standing request.
>
> If 2.6 breaks a 2.4 installer and nothing else I don't think its a big
> disaster and the cleanup may well be justified


Well, personally I would just love it if there were a "go ahead and don't
care about compatibility" for the following:

Make hdX go away and use the SCSI device major/minor number stuff instead.

And then just making the ATA driver look as if it were some
incapable SCSI would actually cut tons of code from kudzu and
friends without the need for any adjustment there.

Martin Dalecki

unread,
May 8, 2002, 8:30:12 AM5/8/02
to
Alan Cox wrote:

>>Make hdX gone and use the scsi device major/minor number stuff instead.
>>And then just making the ATA driver looking like if it where some
>>incapable SCSI would actually reduce tons of code from kudzu and
>>friends without the need for any adjustment there.
>
>
> The SCSI layer is significant overhead even in 2.5. Right now for example
> it appears to be the primary bottleneck for the aacraid drivers. ATA6 is
> also more capable than SCSI in several areas regardless of the notional
> market positioning.
>
> Linus talked about having a /dev/disc/... which once you have 32bit dev_t
> makes complete sense. What you don't do however is throw IDE through the
> SCSI midlayer, you merely make the /dev/disc/ point call into the right
> drivers - be they raid, scsi or ide. That also lets the scsi emulation
> crap get ripped out of the megaraid and aacraid drivers which will up
> performance.
>
> Alan

Alan... you have taken me wrong. What I mean is just the following:
take away some minors from use by SCSI (or more probably a common repository)
and use the same ioctl numbers where possible. Perhaps implement
some ioctl here and there... no more!

Not the whole "we are just another SCSI device on the driver level".
That would not make sense indeed, especially since the SCSI mid-layer isn't
that pretty either...

Alan Cox

unread,
May 8, 2002, 8:30:10 AM5/8/02
to
> Make hdX gone and use the scsi device major/minor number stuff instead.
> And then just making the ATA driver looking like if it where some
> incapable SCSI would actually reduce tons of code from kudzu and
> friends without the need for any adjustment there.

The SCSI layer is significant overhead even in 2.5. Right now for example
it appears to be the primary bottleneck for the aacraid drivers. ATA6 is
also more capable than SCSI in several areas regardless of the notional
market positioning.

Linus talked about having a /dev/disc/... which once you have 32bit dev_t
makes complete sense. What you don't do however is throw IDE through the
SCSI midlayer, you merely make the /dev/disc/ point call into the right
drivers - be they raid, scsi or ide. That also lets the scsi emulation
crap get ripped out of the megaraid and aacraid drivers which will up
performance.

Alan

Denis Vlasenko

unread,
May 8, 2002, 10:00:12 AM5/8/02
to
On 7 May 2002 09:22, Martin Dalecki wrote:
> Mon May 6 13:29:44 CEST 2002 ide-clean-56

+ printk("%s: reset timed-out, status=0x%02x\n", ch->name, stat);

"timed out" (no dash)
--
vda

Richard Gooch

unread,
May 8, 2002, 12:10:06 PM5/8/02
to
Russell King writes:
> On Tue, May 07, 2002 at 04:03:50PM -0600, Richard Gooch wrote:
> > But it's not actually broken, now that the locking is fixed.
>
> Really? What about the case of the missing BKL for device opens that
> you haven't really commented on?

I did comment to you, privately, saying I was waiting to see what the
consensus was on the issue of whether to move the BKL or not. I'll be
sending a patch later this week to fix it.

> Seems like devfs _still_ has locking problems.

A pretty minor one, given the comment I was responding to: "devfs is
unfixable". I've noticed that even Al has gone quiet on the "devfs
races" issue, now that the new code is in place :-)

Regards,

Richard....
Permanent: rgo...@atnf.csiro.au
Current: rgo...@ras.ucalgary.ca

Linus Torvalds

unread,
May 8, 2002, 1:00:18 PM5/8/02
to

On 8 May 2002, Juan Quintela wrote:
> linus> (Side note: I'm afraid that don't think backwards compatibility weighs
> linus> very heavily on an embedded setup - I'm more thinking about things like "a
> linus> regular RedHat/SuSE/Debian/whatever install won't work any more".)
>
> here at Mandrake we have a patch for the install kernel to remove the
> /proc/ide, and I think that we got it from redhat, that means that at
> least two distros preffer to save ~25kb in the boot kernels than the
> reporting that they do :p

Well, that's a good sign in that it implies that things certainly work
fine without /proc/ide.

However, I think I phrased things badly: I'm not actually worried about
the RedHat or Mandrake "act of installation" itself - since that will
always use whatever kernel RH or Mandrake put on their CD's, and they can
always change their install scripts/programs to match the kernel they use.

I'm more worried about the issue of "I installed RH-x.x, and then I
upgraded the kernel, and now program xyz won't work any more", where "xyz"
is something perfectly reasonable and common.

For example, let's say that some strange version of "mount" _requires_
/proc/ide to work (don't ask me why), and that Mandrake happened to ship
that version in their 8.2 release, and if you use the new 2.5.15 kernel on
that installation, it simply won't work. THAT would be a problem where
some backwards compatibility crud is probably worth it.

But if /proc/ide removal breaks an embedded device (on which the kernel is
not normally upgraded by "normal" people that aren't willing to upgrade
other stuff at the same time), I won't worry too much. Or if the /proc/ide
changes mean that the actual installer has to be re-done, I won't worry.

And even breaking one or two applications might be quite acceptable: I
worry more about maintainability than _perfect_ backwards compatibility.

Linus

Russell King

unread,
May 8, 2002, 1:10:10 PM5/8/02
to
On Wed, May 08, 2002 at 10:07:44AM -0600, Richard Gooch wrote:
> Russell King writes:
> > Really? What about the case of the missing BKL for device opens that
> > you haven't really commented on?
>
> I did comment to you, privately, saying I was waiting to see what the
> consensus was on the issue of whether to move the BKL or not. I'll be
> sending a patch later this week to fix it.

Yes, and hey, we still have the problem a week later, even after the
discussion went dead.

> > Seems like devfs _still_ has locking problems.
>
> A pretty minor one, given the comment I was responding to: "devfs is
> unfixable". I've noticed that even Al has gone quiet on the "devfs
> races" issue, now that the new code is in place :-)

Nevertheless, your comment about "no locking problems" is inaccurate.
devfs is calling at least one part of the kernel without obeying the
existing locking rules. That's definitely a devfs bug.

Erik Andersen

unread,
May 8, 2002, 2:30:10 PM5/8/02
to
On Wed May 08, 2002 at 01:18:47PM +0100, Alan Cox wrote:
> I can't speak directly for the Kudzu maintainer but I can say that having
> a sane way to obtain the list of ide devices (all of them not just non
> pcmcia) and the device bindings/type has been a long standing request.

Can't one simply do something like:

    char device_string[20];
    int i, fd, type, major=0, minor=0;

    /* Probe /dev/hda through /dev/hdz */
    for (i=0; i<26; i++) {
        snprintf(device_string, sizeof(device_string), "/dev/hd%c", 'a'+i);
        if ((fd=open(device_string, O_RDONLY | O_NONBLOCK)) < 0) {
            continue;   /* could not open this node, try the next one */
        }
        /* Map the drive letter to its conventional major/minor pair */
        switch ('a'+i) {
        case 'a':
            major=3; minor=0;
            break;
        case 'b':
            major=3; minor=64;
            break;
        case 'c':
            major=22; minor=0;
            break;
        case 'd':
            major=22; minor=64;
            break;
        .....
        }

etc.... to detect the available ide devices without groveling
through /proc/ide?

-Erik

--
Erik B. Andersen http://codepoet-consulting.com/
--This message was written using 73% post-consumer electrons--

Greg KH

unread,
May 8, 2002, 2:30:11 PM5/8/02
to
On Wed, May 08, 2002 at 09:36:27AM +0200, Martin Dalecki wrote:
> >
> >For e.g. could the same arguments could be made for lspci only
> >interface to pci info rather than /proc/bus/pci? The following
> >references are made to /proc/bus/pci on my system:
>
> Especially in light of the fact that we have a device tree filesystem, I
> rather think that /proc/bus/pci is obsolete indeed.

Not quite yet. I considered moving the functionality of /proc/bus/pci
into driverfs, but couldn't find a good solid reason to do it (and it
would involve changing lspci and any other userspace programs that use
it today.)

Now reimplementing /proc/bus/pci as a stand alone filesystem mounted in
that position (like usbfs is) is another story. pcifs anyone? :)

thanks,

greg k-h

Anton Altaparmakov

unread,
May 8, 2002, 2:40:07 PM5/8/02
to
At 11:25 08/05/02, Martin Dalecki wrote:
>Terminology in 2.5:
>We have a host chip set or, for short, a host chip. This implements the
>ATA interface on the motherboard side.
>The host chip provides two channels, a primary and a secondary
>one. To a channel we can attach two devices; however, in code we use the term
>drive instead, because the term device is already quite overloaded with
>meaning. The devices are enumerated as units. That's it.
>Far more natural than hwif, hwgrp and so on. IDE is the Integrated Drive
>Electronics - the microcontroller stuff I don't care that much about.

</me ignorant>Um, what about the IDE PCI cards which have 4 channels on
them? Like these two:

Adaptec 2400 4Ch IDE Raid Controller
RocketRaid 404 4Ch ATA133 Raid Host Adaptor

Best regards,

Anton

ps Sorry about the outburst yesterday, I was tired and just flipped...


--
"I've not lost my mind. It's backed up on tape somewhere." - Unknown
--
Anton Altaparmakov <aia21 at cantab.net> (replace at with @)
Linux NTFS Maintainer / IRC: #ntfs on irc.openprojects.net
WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/

Andre Hedrick

unread,
May 8, 2002, 3:10:05 PM5/8/02
to
On Wed, 8 May 2002, Anton Altaparmakov wrote:

> </me ignorant>Um, what about the IDE PCI cards which have 4 channels on
> them? Like these two:
>
> Adaptec 2400 4Ch IDE Raid Controller
> RocketRaid 404 4Ch ATA133 Raid Host Adaptor

It is not an issue since they broadcast as single channel pairs per host.
Martin is winning the argument hands down.

Andre Hedrick
LAD Storage Consulting Group

Anton Altaparmakov

unread,
May 8, 2002, 3:10:07 PM5/8/02
to
At 19:55 08/05/02, Andre Hedrick wrote:
>On Wed, 8 May 2002, Anton Altaparmakov wrote:
>
> > </me ignorant>Um, what about the IDE PCI cards which have 4 channels on
> > them? Like these two:
> >
> > Adaptec 2400 4Ch IDE Raid Controller
> > RocketRaid 404 4Ch ATA133 Raid Host Adaptor
>
>It is not an issue since they broadcast as single channel pairs per host.
>Martin is winning the argument hands down.

Thanks, I was just wondering, not trying to argue...

Best regards,

Anton

Dave Jones

unread,
May 8, 2002, 3:10:06 PM5/8/02
to
On Wed, May 08, 2002 at 12:21:39PM -0600, Erik Andersen wrote:
> if ((fd=open(device_string, O_RDONLY | O_NONBLOCK)) < 0) {
...

> etc.... to detect the available ide devices without groveling
> through /proc/ide?

This goes splat with removable IDE devices like ZIP drives etc.
They fail to open() unless you put a disk in them.
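
The ambiguity can be made visible by looking at errno instead of treating
every failed open() as "no drive". The following is a rough sketch, not a
reliable detector: which errno a particular driver returns for an empty
removable drive is an assumption here (ENOMEDIUM is Linux-specific and a
driver may well return something else), and all probe_ and PROBE_ names
are made up for this example.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

enum probe_result {
    PROBE_OPENED,       /* open() succeeded */
    PROBE_NO_NODE,      /* no such entry under /dev */
    PROBE_NO_DEVICE,    /* node exists, nothing behind it */
    PROBE_NO_MEDIA,     /* drive may exist but holds no disk (assumed errno) */
    PROBE_UNKNOWN       /* anything else: cannot tell */
};

/* Translate an errno value from a failed open() into a probe result. */
static enum probe_result probe_classify(int err)
{
    switch (err) {
    case 0:
        return PROBE_OPENED;
    case ENOENT:
        return PROBE_NO_NODE;
    case ENXIO:
    case ENODEV:
        return PROBE_NO_DEVICE;
#ifdef ENOMEDIUM
    case ENOMEDIUM:
        return PROBE_NO_MEDIA;
#endif
    default:
        return PROBE_UNKNOWN;
    }
}

/* Try to open a device node and report what the attempt suggests. */
static enum probe_result probe_path(const char *path)
{
    int fd;

    assert(path != NULL);
    fd = open(path, O_RDONLY | O_NONBLOCK);
    if (fd < 0)
        return probe_classify(errno);
    close(fd);
    return PROBE_OPENED;
}
```

Even with this, PROBE_UNKNOWN and PROBE_NO_MEDIA still leave the caller
guessing whether a drive is really attached.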

--
| Dave Jones. http://www.codemonkey.org.uk
| SuSE Labs

Linus Torvalds

unread,
May 8, 2002, 3:10:08 PM5/8/02
to

On Wed, 8 May 2002, Martin Dalecki wrote:
>
> I don't think that it's always the proper aproach for hardware
> portability to do it on the "micro operation" level. That's good
> for generics like inb outb. In the case of the ATA interface it's
> better to do it on the "functional" level above...

Amen.

Hallelujah.

Listen to the man.

Please don't play games with "ide_outb()" etc, which cause 99% of the
architectures to have to make the 1:1 translation to just "outb()", and
which also makes it incredibly cumbersome to handle multiple _different_
controllers that just happen to use different schemes.

Instead, making the virtualization at a higher point means that you can
have _one_ set of common operations for traditional PCI/ATA controllers
(and that one set uses inx/outx/readx/writex), and then you have a few
others for the "strange" cases.

And done properly with per-controller (or drive - you may want to
virtualize at the drive level just because you could separate out
different kinds of drive accesses that way too) function pointers you can
then _mix_ access methods, without getting completely idiotic run-time
checks inside "ide_out()".
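
As a rough illustration of the per-controller function pointer idea, here
is a small user-space sketch. It is not kernel code: the register windows
are plain arrays standing in for hardware, the 16-byte register spacing of
the "MMIO" backend is an assumed layout, and every disp_ name is invented
for the example. A real backend would call inb()/outb() or
readb()/writeb() where the arrays are touched.

```c
#include <assert.h>
#include <stdint.h>

#define DISP_NREGS 8

struct disp_channel;

/* Hypothetical per-channel access methods. */
struct disp_ops {
    void    (*reg_write)(struct disp_channel *ch, unsigned reg, uint8_t val);
    uint8_t (*reg_read)(struct disp_channel *ch, unsigned reg);
};

struct disp_channel {
    const struct disp_ops *ops;
    uint8_t window[DISP_NREGS * 16];    /* fake register window */
};

/* "Legacy" backend: registers sit at consecutive addresses. */
static void legacy_write(struct disp_channel *ch, unsigned reg, uint8_t val)
{
    assert(reg < DISP_NREGS);
    ch->window[reg] = val;
}

static uint8_t legacy_read(struct disp_channel *ch, unsigned reg)
{
    assert(reg < DISP_NREGS);
    return ch->window[reg];
}

/* "MMIO" backend: registers are spaced 16 bytes apart (assumed). */
static void mmio_write(struct disp_channel *ch, unsigned reg, uint8_t val)
{
    assert(reg < DISP_NREGS);
    ch->window[reg * 16] = val;
}

static uint8_t mmio_read(struct disp_channel *ch, unsigned reg)
{
    assert(reg < DISP_NREGS);
    return ch->window[reg * 16];
}

static const struct disp_ops legacy_ops = { legacy_write, legacy_read };
static const struct disp_ops mmio_ops   = { mmio_write, mmio_read };

/* Generic code: no "is this MMIO?" test, only an indirect call. */
static uint8_t disp_write_then_read(struct disp_channel *ch,
                                    unsigned reg, uint8_t val)
{
    ch->ops->reg_write(ch, reg, val);
    return ch->ops->reg_read(ch, reg);
}
```

Two channels in the same machine can then point at legacy_ops and
mmio_ops respectively, and the generic code never needs to know which is
which.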

Linus

Andre Hedrick

unread,
May 8, 2002, 3:20:08 PM5/8/02
to

Martin,

You have a practical model that is coming together nicely (compliments
from the old man). There is a change in the industry to use MMIO-based ATA
HBAs. We are currently in a transitional state of affairs, so the problem
Benjamin addresses, which everyone has overlooked, is going to bite hard and
soon. There will even be HBAs whose channels will be split between IOMIO
and MMIO, so being able to select between access calls is urgent.

Cheers,


Andre Hedrick
LAD Storage Consulting Group

On Wed, 8 May 2002, Anton Altaparmakov wrote:

Alan Cox

unread,
May 8, 2002, 3:30:14 PM5/8/02
to
> int i, type, major=0, minor=0;
> for(i=0; i<26; i++) {
> snprintf(device_string, sizeof(device_string), "/dev/hd%c", 'a'+i);
> if ((fd=open(device_string, O_RDONLY | O_NONBLOCK)) < 0) {
> continue;
> }

If it opened, is it there? Suppose it's an IDE floppy and no media is
present. Maybe it's hiding in ide-scsi instead. It ends up being detective
work. The /device setup makes it explicit and clean.

Benjamin Herrenschmidt

unread,
May 8, 2002, 3:50:06 PM5/8/02
to
>And done properly with per-controller (or drive - you may want to
>virtualize at the drive level just because you could separate out
>different kinds of drive accesses that way too) function pointers you can
>then _mix_ access methods, without getting completely idiotic run-time
>checks inside "ide_out()".

Which basically ends up as having function pointers in the
ata_channel (or ata_drive, but I doubt that would be really
necessary): a set of 4 access functions, taskfile_in/out for
access to taskfile registers (8 bits), and data_in/out for
streaming data in/out of the data reg (16 bits).

That would cleanly solve my problem of mixing MMIO and PIO
controllers in the same machine, that would solve the crazy
byteswapping needed by some controllers for PIO at least,
etc...

I would even suggest not caring about the taskfile register
address at all (that is kill the array of port addresses) but
just pass the taskfile_in/out functions the register number
(cyl_hi, cyl_lo, select, ....) as a nice symbolic constant,
and let the channel specific implementation figure it out.
I haven't checked if you already killed all of the request/release
region crap done by the common IDE code; that matter is completely
internal to the host controller driver, etc...
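
A sketch of what such an interface might look like, with the four access
functions taking symbolic register numbers and a channel whose data path
swaps bytes. This is illustrative user-space C under stated assumptions:
the tf_ and TF_ names, the fake register array and the fake FIFO are all
invented here and are not the actual 2.5 structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Symbolic taskfile register numbers (hypothetical names). */
enum tf_reg {
    TF_DATA, TF_ERROR, TF_NSECTOR, TF_SECTOR,
    TF_CYL_LO, TF_CYL_HI, TF_SELECT, TF_STATUS,
    TF_NREGS
};

#define TF_FIFO_WORDS 256

struct tf_channel;

/* The four per-channel access functions. */
struct tf_ops {
    uint8_t (*taskfile_in)(struct tf_channel *ch, enum tf_reg reg);
    void    (*taskfile_out)(struct tf_channel *ch, enum tf_reg reg, uint8_t val);
    void    (*data_in)(struct tf_channel *ch, uint16_t *buf, size_t words);
    void    (*data_out)(struct tf_channel *ch, const uint16_t *buf, size_t words);
};

struct tf_channel {
    const struct tf_ops *ops;
    uint8_t  regs[TF_NREGS];        /* fake taskfile registers */
    uint16_t fifo[TF_FIFO_WORDS];   /* fake data register contents */
    size_t   fifo_len;
};

/* The channel decides how a register number maps onto its hardware. */
static uint8_t fake_taskfile_in(struct tf_channel *ch, enum tf_reg reg)
{
    assert((unsigned)reg < TF_NREGS);
    return ch->regs[reg];
}

static void fake_taskfile_out(struct tf_channel *ch, enum tf_reg reg, uint8_t val)
{
    assert((unsigned)reg < TF_NREGS);
    ch->regs[reg] = val;
}

/* Swap the two bytes of a 16-bit word. */
static uint16_t tf_swab16(uint16_t w)
{
    return (uint16_t)((w << 8) | (w >> 8));
}

/* Data path for a controller that needs byte swapping on PIO. */
static void swapped_data_out(struct tf_channel *ch, const uint16_t *buf, size_t words)
{
    assert(words <= TF_FIFO_WORDS);
    for (size_t i = 0; i < words; i++)
        ch->fifo[i] = tf_swab16(buf[i]);
    ch->fifo_len = words;
}

static void swapped_data_in(struct tf_channel *ch, uint16_t *buf, size_t words)
{
    assert(words <= ch->fifo_len);
    for (size_t i = 0; i < words; i++)
        buf[i] = tf_swab16(ch->fifo[i]);
}

static const struct tf_ops swapped_ops = {
    .taskfile_in  = fake_taskfile_in,
    .taskfile_out = fake_taskfile_out,
    .data_in      = swapped_data_in,
    .data_out     = swapped_data_out,
};
```

Callers only ever name a register such as TF_CYL_HI; port addresses,
register spacing and byte order stay private to the channel
implementation.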

Now, Andre may tell us we need one more set for "slow IO"
versions for some HW. I don't know the details for these, so
I'll let the old man speak up here.

Ben.
