Rationale for RLIMIT

Matthias Andree

Jan 23, 2006, 6:00:39 AM

Greetings,

debugging an application problem that used to mlockall(...FUTURE) and
failed with a subsequent mmap(), I came across the manual page for
setrlimit (see below for the relevant excerpt). I have several questions
concerning the rationale:

1. What is the reason we're having special treatment
for the super-user here?

2. Why is it the opposite of what 2.6.8.1 and earlier did?

3. Why is this inconsistent with all other RLIMIT_*?
None of them cares whether a process is privileged or not.

4. Is the default hard limit of 32 kB initialized by the kernel or
by some script in SUSE 10.0? If it's the kernel: why is the limit so
low, and why isn't just the soft limit set?

"[...]
RLIMIT_MEMLOCK
The maximum number of bytes of memory that may be locked into
RAM. In effect this limit is rounded down to the nearest multiple
of the system page size. This limit affects mlock(2) and
mlockall(2) and the mmap(2) MAP_LOCKED operation. Since Linux
2.6.9 it also affects the shmctl(2) SHM_LOCK operation, where it
sets a maximum on the total bytes in shared memory segments (see
shmget(2)) that may be locked by the real user ID of the calling
process. The shmctl(2) SHM_LOCK locks are accounted for separately
from the per-process memory locks established by
mlock(2), mlockall(2), and mmap(2) MAP_LOCKED; a process can
lock bytes up to this limit in each of these two categories. In
Linux kernels before 2.6.9, this limit controlled the amount of
memory that could be locked by a privileged process. Since
Linux 2.6.9, no limits are placed on the amount of memory that a
privileged process may lock, and this limit instead governs the
amount of memory that an unprivileged process may lock. [...]"
(getrlimit(2), man-pages-2.07)

--
Matthias Andree
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Arjan van de Ven

Jan 23, 2006, 6:10:39 AM

> 1. What is the reason we're having special treatment
> for the super-user here?

it's quite common to allow root (or more specifically, the right capability)
to override rlimits. Many such security checks behave that way so it's
only "just" to treat this one like that as well.

> 2. Why is it the opposite of what 2.6.8.1 and earlier did?

the earlier behavior didn't really make sense, and gave cause to
multimedia apps running as root only to be able to mlock etc etc. Now
this can be dynamically controlled instead.

> 4. Is the default hard limit of 32 kB initialized by the kernel or

the kernel has a relatively low default. The reason is simple: allow too
much mlock and the user can DoS the machine too easily. The kernel default
should be safe, the admin / distro can very easily override anyway.

You may ask: why is it not zero?
It is very useful for many things to have a "small" mlock area. gpg, ssh
and basically anything that works with keys and passwords. Small
relative to the other resources such a process takes (eg kernel stacks
etc).

Matthias Andree

Jan 23, 2006, 12:00:27 PM

On Mon, 23 Jan 2006, Arjan van de Ven wrote:

> > 1. What is the reason we're having special treatment
> > for the super-user here?
>
> it's quite common to allow root (or more specific, the right capability)
> to override rlimits. Many such security check behave that way so it's
> only "just" to treat this one like that as well.

Why is RLIMIT_MEMLOCK special enough to warrant special treatment like
this? The right capability should be able to override with setrlimit(2)
anyways, right?

> > 2. Why is it the opposite of what 2.6.8.1 and earlier did?
>
> the earlier behavior didn't really make sense, and gave cause to
> multimedia apps running as root only to be able to mlock etc etc. Now
> this can be dynamically controlled instead.

Quoting the manpage: "In Linux kernels before 2.6.9, this limit
controlled the amount of memory that could be locked by a privileged
process."

This is nonsense, and it appears as though 2.6.8 and earlier didn't
apply the limit to unprivileged processes. Should the behavior stay as
inconsistent as it's now, I'd suggest to reword this to "...before
2.6.9, this limit controlled the amount of memory that could be locked
by /any/ process." or something even better if someone can think of
such. (manpages maintainer Cc'd)

> > 4. Is the default hard limit of 32 kB initialized by the kernel or
>
> the kernel has a relatively low default. The reason is simple: allow too
> much mlock and the user can DoS the machine too easy. The kernel default
> should be safe, the admin / distro can very easily override anyway.

This doesn't appear to happen for SUSE 10.0, which causes trouble with
some of the "multimedia apps" BTW... apparently the limit was lowered at
the same time as the root restrictions were relaxed.

Such changes in behavior aren't adequate for 2.6.X, there are way too
many applications that can't be bothered to check the patchlevel of the
kernel, and it's totally unintuitive to users, too. Aside from the fact
that most distros have settled on one kernel.

> You may ask: why is it not zero?

No, I'm not doing that. I rather wonder why it's so low, or whom a certain
percentage such as RAM >> 5 (that's 3.125 %) would hurt. Allowing
unlimited memory allocation while at the same time allowing only 32 kB
of mlock()ed memory seems disproportionate to me.

--
Matthias Andree

Arjan van de Ven

Jan 23, 2006, 12:10:49 PM

> > > 4. Is the default hard limit of 32 kB initialized by the kernel or
> >
> > the kernel has a relatively low default. The reason is simple: allow too
> > much mlock and the user can DoS the machine too easy. The kernel default
> > should be safe, the admin / distro can very easily override anyway.
>
> This doesn't appear to happen for SUSE 10.0, which causes trouble with
> some of the "multimedia apps" BTW... apparently the limit was lowered at
> the same time as the root restrictions were relaxed.

yes the behavior is like this

           root                non-root
before     about half of ram   nothing
after      all of ram          by default small, increasable

> Such changes in behavior aren't adequate for 2.6.X, there are way too
> many applications that can't be bothered to check the patchlevel of the
> kernel, and it's totally unintuitive to users, too.

there is NO fundamental change here other than a *general* relaxing.
This is important to note: Apps that could mlock before STILL can mlock.
Only apps that would depend on mlock failing with a security check, and
only those who do small portions, break now because suddenly the mlock
succeeds. Big deal... those would have broken when run as root already

> No, I'm not doing that. I rather wonder why it's so low, or whom a certain
> percentage such as RAM >> 5 (that's 3.125 %) would hurt. A

because it's generally a PER PROCESS limit, so fork 60 times and kaboom
things explode. (You can argue you can forkbomb anyway, but that's
where the process count rlimit comes in)

> Allowing
> unlimited memory allocation while at the same time allowing only 32 kB
> of mlock()ed memory seems disproportionate to me.

it's not. Normal memory is swappable. And thus a far less rare commodity
than precious pinned down memory.

What application do you have in mind that broke by this relaxing of
rules?

Matthias Andree

Jan 23, 2006, 1:11:29 PM

On Mon, 23 Jan 2006, Arjan van de Ven wrote:

> yes the behavior is like this
>
> root non-root
> before about half of ram nothing
> after all of ram by default small, increasable

> [...]

> What application do you have in mind that broke by this relaxing of
> rules?

This is not something I'd like to disclose here yet.

It is an application that calls mlockall(MCL_CURRENT|MCL_FUTURE) and
apparently copes with mlockall() returning EPERM (or doesn't even try
it) but can apparently NOT cope with valign() tripping over mmap() ==
-1/EAGAIN.

The relevant people are Bcc:d.

--
Matthias Andree

Arjan van de Ven

Jan 23, 2006, 1:21:04 PM

On Mon, 2006-01-23 at 19:01 +0100, Matthias Andree wrote:
> On Mon, 23 Jan 2006, Arjan van de Ven wrote:
>
> > yes the behavior is like this
> >
> > root non-root
> > before about half of ram nothing
> > after all of ram by default small, increasable
> > [...]
> > What application do you have in mind that broke by this relaxing of
> > rules?
>
> This is not something I'd like to disclose here yet.
>
> It is an application that calls mlockall(MCL_CURRENT|MCL_FUTURE) and
> apparently copes with mlockall() returning EPERM

hmm... curious that mlockall() succeeds with only a 32kb rlimit....

Matthias Andree

Jan 23, 2006, 2:00:54 PM

On Mon, 23 Jan 2006, Arjan van de Ven wrote:

> hmm... curious that mlockall() succeeds with only a 32kb rlimit....

It's quite obvious with the seteuid() shuffling behind the scenes of the
app, for the mlockall() runs with euid==0, and the later mmap() with euid!=0.

Clearly the application should do both with the same privilege or raise
the RLIMIT_MEMLOCK while running with privileges.

The question that's open is one for the libc guys: malloc(), valloc()
and others seem to use mmap() on some occasions (for some allocation
sizes) - at least malloc/malloc.c comments as of 2.3.4 suggest so -, and
if this isn't orthogonal to mlockall() and set[e]uid() calls, the glibc
is pretty deeply in trouble if the code calls mlockall(MCL_FUTURE) and
then drops privileges.

The function in question appears to be valloc() with glibc 2.3.5.

In this light, mlockall(MCL_FUTURE) is pretty useless, since there is no
way to undo MCL_FUTURE without unlocking all pages at the same time.
Particularly so for setuid apps...

I'm asking the Bcc'd gentleman to reconsider mlockall() and perhaps use
explicit mlock() instead.

--
Matthias Andree

Arjan van de Ven

Jan 23, 2006, 2:10:24 PM

On Mon, 2006-01-23 at 19:55 +0100, Matthias Andree wrote:
> On Mon, 23 Jan 2006, Arjan van de Ven wrote:
>
> > hmm... curious that mlockall() succeeds with only a 32kb rlimit....
>
> It's quite obvious with the seteuid() shuffling behind the scenes of the
> app, for the mlockall() runs with euid==0, and the later mmap() with euid!=0.

hmm how on earth was that supposed to work at all????

Joerg Schilling

Jan 23, 2006, 2:40:24 PM

Matthias Andree <matthia...@gmx.de> wrote:

> On Mon, 23 Jan 2006, Arjan van de Ven wrote:
>
> > hmm... curious that mlockall() succeeds with only a 32kb rlimit....
>
> It's quite obvious with the seteuid() shuffling behind the scenes of the
> app, for the mlockall() runs with euid==0, and the later mmap() with euid!=0.
>
> Clearly the application should do both with the same privilege or raise
> the RLIMIT_MEMLOCK while running with privileges.
>
> The question that's open is one for the libc guys: malloc(), valloc()
> and others seem to use mmap() on some occasions (for some allocation
> sizes) - at least malloc/malloc.c comments as of 2.3.4 suggest so -, and
> if this isn't orthogonal to mlockall() and set[e]uid() calls, the glibc
> is pretty deeply in trouble if the code calls mlockall(MLC_FUTURE) and
> then drops privileges.

If the behavior described by Matthias is true for current Linux kernels,
then there is a clear bug that needs fixing.

If the Linux kernel is not willing to accept the contract by
mlockall(MCL_FUTURE), then it should not accept the call at all.

In our case, the kernel did accept the call to mlockall(MCL_FUTURE), but later
ignores this contract. This bug should be fixed.

Jörg

--
EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
j...@cs.tu-berlin.de (uni)
schi...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
URL: http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily

Lee Revell

Jan 23, 2006, 3:00:11 PM

On Mon, 2006-01-23 at 19:55 +0100, Matthias Andree wrote:

> I'm asking the Bcc'd gentleman to reconsider mlockall() and perhaps
> use explicit mlock() instead.

Probably good advice, I have found mlockall() to be especially
problematic with multithreaded programs and NPTL, as glibc eats
RLIMIT_STACK of unswappable memory for each thread stack which defaults
to 8MB here - you go OOM really quick like this. Most people don't seem
to realize the need to set a sane value with pthread_attr_setstack().

(Even when not mlock'ed, insanely huge thread stack defaults seem to
account for a lot of the visible bloat on the desktop - decreasing
RLIMIT_STACK to 512KB reduces the footprint of Gnome 2.12 by 100+ MB.)

Lee

Matthias Andree

Jan 23, 2006, 3:40:10 PM

Joerg Schilling schrieb am 2006-01-23:

> Matthias Andree <matthia...@gmx.de> wrote:
>
> > On Mon, 23 Jan 2006, Arjan van de Ven wrote:
> >
> > > hmm... curious that mlockall() succeeds with only a 32kb rlimit....
> >
> > It's quite obvious with the seteuid() shuffling behind the scenes of the
> > app, for the mlockall() runs with euid==0, and the later mmap() with euid!=0.
> >
> > Clearly the application should do both with the same privilege or raise
> > the RLIMIT_MEMLOCK while running with privileges.
> >
> > The question that's open is one for the libc guys: malloc(), valloc()
> > and others seem to use mmap() on some occasions (for some allocation
> > sizes) - at least malloc/malloc.c comments as of 2.3.4 suggest so -, and
> > if this isn't orthogonal to mlockall() and set[e]uid() calls, the glibc
> > is pretty deeply in trouble if the code calls mlockall(MLC_FUTURE) and
> > then drops privileges.
>
> If the behavior described by Matthias is true for current Linuc kernels,
> then there is a clean bug that needs fixing.

Jörg elided my lines that said valloc() was the function in question.

Jörg, if we're talking about valloc(), this hasn't much to do with the
kernel, but is a library issue.

There is _no_ documentation that says valloc() or memalign() or
posix_memalign() is required to use mmap(). It works on some systems and
for some allocation sizes as a side effect of the valloc()
implementation.

And because this requirement is not specified in the relevant standards,
it is wrong to assume valloc() returns locked pages. You cannot rely on
mmap() returning locked pages after mlockall() either, because you might
be exceeding resource limits.

> If the Linux kernel is not willing to accept the contract by
> mlockall(MLC_FUTURE), then it should now accept the call at all.

If the application wants locked pages, it either needs to call mmap()
explicitly, or use mlock() on the valloc()ed region. Even then,
allocation or mlock may fail due to resource constraints. I checked
FreeBSD 6-STABLE i386, Solaris 8 FCS SPARC and SUSE Linux 10.0 i386 on
this.

> In our case, the kernel did accept the call to mlockall(MLC_FUTURE), but later
> ignores this contract. This bug should be fixed.

The complete story is, condensed, and with return values, for a
setuid-root application:

geteuid() == 0;
mlockall(MCL_CURRENT|MCL_FUTURE) == (success);
seteuid(500) == (success);
valloc(64512 + pagesize) == NULL (failure);

Jörg, correct me if the valloc() figure is wrong.

valloc() called mmap() internally, tried to grab 1 MB, and failed with
EAGAIN - as we were able to see from the strace.

SuSE Linux 10.0, kernel 2.6.13-15.7-default #1 Tue Nov 29 14:32:29 UTC 2005
on i686 athlon i386 GNU/Linux

--
Matthias Andree

Lee Revell

Jan 23, 2006, 3:40:20 PM

On Mon, 2006-01-23 at 20:38 +0100, Joerg Schilling wrote:
> Matthias Andree <matthia...@gmx.de> wrote:
>
> > On Mon, 23 Jan 2006, Arjan van de Ven wrote:
> >
> > > hmm... curious that mlockall() succeeds with only a 32kb rlimit....
> >
> > It's quite obvious with the seteuid() shuffling behind the scenes of the
> > app, for the mlockall() runs with euid==0, and the later mmap() with euid!=0.
> >
> > Clearly the application should do both with the same privilege or raise
> > the RLIMIT_MEMLOCK while running with privileges.
> >
> > The question that's open is one for the libc guys: malloc(), valloc()
> > and others seem to use mmap() on some occasions (for some allocation
> > sizes) - at least malloc/malloc.c comments as of 2.3.4 suggest so -, and
> > if this isn't orthogonal to mlockall() and set[e]uid() calls, the glibc
> > is pretty deeply in trouble if the code calls mlockall(MLC_FUTURE) and
> > then drops privileges.
>
> If the behavior described by Matthias is true for current Linuc kernels,
> then there is a clean bug that needs fixing.
>
> If the Linux kernel is not willing to accept the contract by
> mlockall(MLC_FUTURE), then it should now accept the call at all.
>
> In our case, the kernel did accept the call to mlockall(MLC_FUTURE), but later
> ignores this contract. This bug should be fixed.

Joerg,

You will be happy to know that in future Linux distros, cdrecord will
not require setuid to mlock() and get SCHED_FIFO - both are now
controlled by rlimits, so if the distro ships with a sane PAM/group
configuration, all you will need to do is add cdrecord users to the
"realtime" or "cdrecord" or "audio" group.

This will take a while to make it into distros as it requires changes to
PAM and glibc in addition to the kernel.

Lee

Matthias Andree

Jan 23, 2006, 4:30:08 PM

On Mon, 23 Jan 2006, Lee Revell wrote:

> You will be happy to know that in future Linux distros, cdrecord will
> not require setuid to mlock() and get SCHED_FIFO - both are now
> controlled by rlimits, so if the distro ships with a sane PAM/group
> configuration, all you will need to do is add cdrecord users to the
> "realtime" or "cdrecord" or "audio" group.

Sounds really good. Can you give a pointer as to the detailed rlimit
requirements?

Anyways, this seems like a very good point in time to pick up the old
discussion of ide-scsi, ide-cd and thereabouts, because your
announcement met Jörg's criterion that Linux had to make a step forward
in his direction before he'd try to negotiate again.

I'm more of a user who is annoyed by this war of warning messages
(ide-scsi claiming it's unsuitable for CD writing, cdrecord calling
/dev/hd* badly designed, and all that), and I'd appreciate if people
could just

1. compile a list of their requirements,

2. find out the current state of affairs,

3. match the lists found in 1 and 2

4. ONLY AFTER THAT negotiate who is going to change what to make things
work better for us end users.

Of course, I think it's sensible to expect that Linux should adhere to
standards (POSIX) as far as possible, and if any precedent
implementations that are standards-conformant are found, I'd suggest
that Linux adheres to their interpretation, too, to reduce the clutter
and make applications more easily ported to Linux. We'll all benefit.

LIST 1 # REQUIREMENTS

R1 I'll just say we all want cdrecord, dvd recording applications and
similar to work without setuid root flags or sudo or other excessive
privilege escalation. (This needs to be split up into I/O access
privileges, device enumeration, buffer allocation, real-time
requirements such as locking buffers into memory, scheduling and so on.)

LIST 2 # CURRENT STATE

S1 Jörg is unhappy with /dev/hd* because he says that it is inferior to
the sg-access via ide-scsi. (I believe the original issues were
DMA-based, and I don't know the details.) I hope Jörg will fill in the
operations that ide-cd (/dev/hd*) lacks. (Jörg, please don't talk about
layer violations here).

S2 Jörg is concerned about the SCSI command filter being too
restrictive. I'm not sure if it still applies to 2.6.16-rc and what the
exact commands in question were. I'll let Jörg complete this list.

S3 Device enumeration/probing is a sore spot. Unprivileged "cdrecord
dev=ATA: -scandisk" doesn't work, and recent discussions on the cdwrite@
list didn't make any progress. My observation is that cdrecord stops
probing /dev/hd* devices as soon as one yields EPERM, on the assumption
"if I cannot access /dev/hda, I will not have sufficient privilege to
write a CD anyways". I find this wrong, Jörg finds it correct and argues
"if you can access /dev/hdc as unprivileged user, that's a security
problem".

These topics I brought up are my recollections from memory, without
archive research, that I deem worth developing into either requirements
or "state-of-the-art" assertions of the "we're already there" kind.

Please, everybody, ONLY list what you would like to do, why, why it
doesn't work. Please DO NOT TELL THE OTHER SIDE HOW they are supposed to
do it, unless it's worded as a polite and patient question. We've been
there, and it didn't work.

I hope this is getting a more fruitful discussion than last time.

--
Matthias Andree

Lee Revell

Jan 23, 2006, 4:30:08 PM

On Mon, 2006-01-23 at 22:21 +0100, Matthias Andree wrote:
> Sounds really good. Can you give a pointer as to the detailed rlimit
> requirements?

I don't want to touch the rest of the thread, but the best info on the
above can be found in the linux-audio-user list archives. It's still a
little unclear exactly which packages are required, but IIRC PAM 0.80
supports it already. I believe this requires glibc changes eventually,
but programs like PAM and bash that deal with rlimits can work around it
if glibc is not aware of the new rlimit.

Personally I still use the old realtime LSM, until this is all worked
out.

Lee

Joerg Schilling

Jan 23, 2006, 4:30:10 PM

Matthias Andree <matthia...@gmx.de> wrote:

> > If the behavior described by Matthias is true for current Linuc kernels,
> > then there is a clean bug that needs fixing.
>
> Jörg elided my lines that said valloc() was the function in question.
>
> Jörg, if we're talking about valloc(), this hasn't much to do with the
> kernel, but is a library issue.

From my understanding, the problem is that Linux first grants the
mlockall(MCL_FUTURE) call and later ignores this contract.

The fact that valloc() works in a way that is not comprehensible
seems to be another issue. Libscg calls valloc(size) where size is less than
64 KB. From the strace output from Matthias, it looks like valloc() first calls
brk() to extend the size of the data segment (probably to approach the next
pagesize-aligned border) and later calls mmap() to get 1 MB of memory.
Well, first it seems that valloc() tries to get too much memory, but this
is another story.

Inside the kernel handler for this call, the permission to lock the new
memory _again_ checks for permission and this is wrong as the request
for locking all future pages of the process already has been granted.

This looks similar to when I open() a file that may only be opened as root
and later switch my uid to some other id. If read() were implemented the
same way as Linux implements the locking, each read() call would again check
whether the current uid would have permission to get access to the fd from a
filename. This is obviously wrong. The _process_ has been granted the right
to mlock all future pages and this is something that needs to be honored until
the process dies.

> There is _no_ documentation that says valloc() or memalign() or
> posix_memalign() is required to use mmap(). It works on some systems and
> for some allocation sizes as a side effect of the valloc()
> implementation.

The problem seems to be independent of how valloc() is implemented.

> And because this requirement is not specified in the relevant standards,
> it is wrong to assume valloc() returns locked pages. You cannot rely on
> mmap() returning locked pages after mlockall() either, because you might
> be exceeding resource limits.

If there were such resource limits, then they would need to be honored
regardless of the privileges of the process.

> > If the Linux kernel is not willing to accept the contract by
> > mlockall(MLC_FUTURE), then it should now accept the call at all.
>
> If the application wants locked pages, it either needs to call mmap()
> explicitly, or use mlock() on the valloc()ed region. Even then,
> allocation or mlock may fail due to resource constraints. I checked
> FreeBSD 6-STABLE i386, Solaris 8 FCS SPARC and SUSE Linux 10.0 i386 on
> this.

What did you check?

Solaris does not check for any privileges when calling mmap().

Solaris implements mlockall() via memcntl which contains the only
place where a check for secpolicy_lock_memory(CRED()) takes place.

> > In our case, the kernel did accept the call to mlockall(MLC_FUTURE), but later
> > ignores this contract. This bug should be fixed.
>
> The complete story is, condensed, and with return values, for a
> setuid-root application:
>
> geteuid() == 0;
> mlockall(MLC_CURRENT|MLC_FUTURE) == (success);
> seteuid(500) == (success);
> valloc(64512 + pagesize) == NULL (failure);
>
> Jörg, correct me if the valloc() figure is wrong.
>
> valloc() called mmap() internally, tried to grab 1 MB, and failed with
> EAGAIN - as we were able to see from the strace.

This is correct.

Returning EAGAIN seems to be a result of misunderstanding the POSIX
standard. The POSIX standard means real hardware resources when talking about

[EAGAIN]
[ML] The mapping could not be locked in memory, if required by
mlockall(), due to a lack of resources.

If Linux wants to add a new RLIMIT_MEMLOCK resource, it would need to
honor this resource independently of the user id in order to avoid being
contradictory.

Jörg


Lee Revell

Jan 23, 2006, 4:40:13 PM

On Mon, 2006-01-23 at 22:21 +0100, Matthias Andree wrote:

> On Mon, 23 Jan 2006, Lee Revell wrote:
>
> > You will be happy to know that in future Linux distros, cdrecord will
> > not require setuid to mlock() and get SCHED_FIFO - both are now
> > controlled by rlimits, so if the distro ships with a sane PAM/group
> > configuration, all you will need to do is add cdrecord users to the
> > "realtime" or "cdrecord" or "audio" group.
>
> Sounds really good. Can you give a pointer as to the detailed rlimit
> requirements?

One thing I believe is still unresolved is that despite the new rlimits,
sched_get_priority_max(SCHED_FIFO) always returns 99 rather than
RLIMIT_RTPRIO.

Lee

Joerg Schilling

Jan 23, 2006, 4:40:16 PM

Lee Revell <rlre...@joe-job.com> wrote:

> > In our case, the kernel did accept the call to mlockall(MLC_FUTURE), but later
> > ignores this contract. This bug should be fixed.
>
> Joerg,
>
> You will be happy to know that in future Linux distros, cdrecord will
> not require setuid to mlock() and get SCHED_FIFO - both are now
> controlled by rlimits, so if the distro ships with a sane PAM/group
> configuration, all you will need to do is add cdrecord users to the
> "realtime" or "cdrecord" or "audio" group.
>
> This will take a while to make it into distros as it requires changes to
> PAM and glibc in addition to the kernel.

Well, on Solaris running cdrecord root-less has been possible for 2 years.

What you do is to add a line

joerg::::profiles=CD RW

to /etc/user_attr

and a line:

CD RW:solaris:cmd:::/opt/schily/bin/cdrecord: privs=file_dac_read,sys_devices,proc_lock_memory,proc_priocntl,net_privaddr

to /etc/security/exec_attr

or just a line

All:solaris:cmd:::/opt/schily/bin/cdrecord: privs=file_dac_read,sys_devices,proc_lock_memory,proc_priocntl,net_privaddr

to /etc/security/exec_attr

The command is then executed via /usr/bin/pfexec and gets the listed
fine-grained privileges in addition to the basic privileges.

We plan to break sys_devices into more fine grained privs that
include several levels of SCSI rights in the near future.

If Linux manages to do something similar, I would be happy.
It is obvious that this is something that could only be used if there
is not only kernel code to support fine-grained privs but also
a user-space infrastructure that allows seamless integration.

Jörg


Joerg Schilling

Jan 23, 2006, 4:50:14 PM

Lee Revell <rlre...@joe-job.com> wrote:

> On Mon, 2006-01-23 at 22:21 +0100, Matthias Andree wrote:
> > Sounds really good. Can you give a pointer as to the detailed rlimit
> > requirements?
>
> I don't want to touch the rest of the thread, but the best info on the
> above can be found in the linux-audio-user list archives. It's still a
> little unclear exactly which packages are required, but IIRC PAM 0.80
> supports it already. I believe this requires glibc changes eventually,
> but programs like PAM and bash that deal with rlimits can work around it
> if glibc is not aware of the new rlimit.

Could you explain this more in depth?

What you describe looks like you propose to add a line:

joerg::::defaultpriv=file_dac_read,sys_devices,proc_lock_memory,proc_priocntl,net_privaddr

to /etc/user_attr which would be honored by PAM during login.

This is not what I like to see.

What I like to see is that only specific programs like cdrecord
would get the permissions to do more than joe user.

Jörg


Joerg Schilling

Jan 23, 2006, 5:10:07 PM

Matthias Andree <matthia...@gmx.de> wrote:

> S2 Jörg is concerned about the SCSI command filter being too
> restrictive. I'm not sure if it still applies to 2.6.16-rc and what the
> exact commands in question were. I'll let Jörg complete this list.

I am tired today and I need to do other work, so let me partly reply:

Iff there is a user space infrastructure for fine grained privileges,
there is absolutely no problem with a planned and well known restriction.

On Solaris, you (currently) use a profile enabled shell (pfsh, pfksh or pfcsh)
that calls getexecuser() in order to find whether there is a specific treatment
needed. If this specific treatment is needed, then the shell calls
execve(/usr/bin/pfexec cmd <args>)
else it calls execve(cmd <args>)

I recently voted to require all shells to be profile enabled by default.

With the future plans for extending fine grained privs on Solaris, sending
SCSI commands will become more than one priv.

I proposed to have a low priv right to send commands like inquiry and test unit
ready. These commands may e.g. be sent without interfering with a concurrent
CD/DVD write operation.

The next priv could be the permission for sending simple SCSI commands that
allow reading from the device.

The next priv could be the permission for sending simple SCSI Commands that
allow writing.

The final priv would allow even vendor specific commands: this is what cdrecord
needs.

Jörg


Lee Revell

Jan 23, 2006, 5:10:10 PM

On Mon, 2006-01-23 at 22:45 +0100, Joerg Schilling wrote:
> Lee Revell <rlre...@joe-job.com> wrote:
>
> > On Mon, 2006-01-23 at 22:21 +0100, Matthias Andree wrote:
> > > Sounds really good. Can you give a pointer as to the detailed rlimit
> > > requirements?
> >
> > I don't want to touch the rest of the thread, but the best info on the
> > above can be found in the linux-audio-user list archives. It's still a
> > little unclear exactly which packages are required, but IIRC PAM 0.80
> > supports it already. I believe this requires glibc changes eventually,
> > but programs like PAM and bash that deal with rlimits can work around it
> > if glibc is not aware of the new rlimit.
>
> Could you explain this more in depth?
>
> What you describe looks like you propose to add a line:
>
> joerg::::defaultpriv=file_dac_read,sys_devices,proc_lock_memory,proc_priocntl,net_privaddr
>
> to /etc/user_attr which would be honored by PAM during login.
>
> This is not what I like to see.
>
> What I like to see is that only specific programs like cdrecord
> would get the permissions to do more than joe user.

It's not that fine grained, it works at a user/group level.

You would add a line like:

@cdrecord hard rtprio 80

to /etc/security/limits.conf and add users to the cdrecord group.

Lee

Matthias Andree

Jan 23, 2006, 5:10:12 PM

Joerg Schilling wrote on 2006-01-23:

> Matthias Andree <matthia...@gmx.de> wrote:
>
> > > If the behavior described by Matthias is true for current Linux kernels,
> > > then there is a clean bug that needs fixing.
> >
> > Jörg elided my lines that said valloc() was the function in question.
> >
> > Jörg, if we're talking about valloc(), this hasn't much to do with the
> > kernel, but is a library issue.
>
> From my understanding, the problem is that Linux first grants the
> mlockall(MCL_FUTURE) call and later ignores this contract.

...

> Inside the kernel handler for this call, the permission to lock the new
> memory _again_ checks for permission and this is wrong as the request
> for locking all future pages of the process already has been granted.

I *do* think that the kernel refused our mmap() request on grounds of
the RLIMIT_MEMLOCK (32 kB) and not any other reason, because running the
same allocation code as root succeeds, and Linux 2.6.13 is documented to
ignore RLIMIT_MEMLOCK for the super-user.

And I do believe Linux is entirely on IEEE Std 1003.1-2001 grounds here.

> > There is _no_ documentation that says valloc() or memalign() or
> > posix_memalign() is required to use mmap(). It works on some systems and
> > for some allocation sizes as a side effect of the valloc()
> > implementation.
>
> The problem seems to be independent of how valloc() is implemented.

As far as the kernel is concerned, yes.

As far as your application is concerned, valloc() does not provide
"mapped" or "locked" pages, but "allocated".

> > And because this requirement is not specified in the relevant standards,
> > it is wrong to assume valloc() returns locked pages. You cannot rely on
> > mmap() returning locked pages after mlockall() either, because you might
> > be exceeding resource limits.
>
> If there were such resource limits, then they would need to be honored
> regardless of the privileges of the process.

That's a different story.

> > > If the Linux kernel is not willing to accept the contract by
> > > mlockall(MCL_FUTURE), then it should not accept the call at all.
> >
> > If the application wants locked pages, it either needs to call mmap()
> > explicitly, or use mlock() on the valloc()ed region. Even then,
> > allocation or mlock may fail due to resource constraints. I checked
> > FreeBSD 6-STABLE i386, Solaris 8 FCS SPARC and SUSE Linux 10.0 i386 on
> > this.
>
> What did you check?

The mlockall() documentation. Any OS allows later mappings to fail if
they cannot be locked, and this is what happens.

The only troublesome spot that remains is valloc() using mmap()
internally, which inherits the mlockall()/mmap() failure modes and
causes bogus "out of memory" returns by valloc().

1. valloc is not required to lock pages
2. yet it can fail if it cannot lock pages

This is a problem from the application's POV, albeit one that is in
glibc's memory allocator.

mlockall() does NOT make promises HOW MUCH memory may be allocated in
the future, and that is the problem at hand. Linux allows us 32 kB (as
unprivileged user even, we don't get that with Solaris or FreeBSD!), but
we want 63 kB and Linux says "Sorry, you can't have that. EAGAIN"
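
A minimal sketch of how an application could compare a request against this limit before it allocates, assuming a system that provides RLIMIT_MEMLOCK through getrlimit(). The function name `fits_memlock_limit()` is made up for this illustration, and the check ignores memory the process has already locked, so it is only a first approximation.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>

/* Compare a requested lock size against the soft RLIMIT_MEMLOCK.
 * Stores the soft limit in *soft and returns 1 if `want` bytes fit,
 * 0 if they exceed it, or -1 if the limit could not be read.
 * Memory the process has already locked is not accounted for here. */
int fits_memlock_limit(size_t want, rlim_t *soft)
{
    struct rlimit rl;

    assert(soft != NULL);
    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
        return -1;
    *soft = rl.rlim_cur;
    if (rl.rlim_cur == RLIM_INFINITY)
        return 1;
    return (rlim_t)want <= rl.rlim_cur;
}
```

With a soft limit of 32 kB, a call with 63 * 1024 returns 0, which matches the refused mapping described above.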

> Returning EAGAIN seems to be a result of misunderstanding the POSIX
> standard. The POSIX standard means real hardware resources when talking about

Well... mlockall() allows for, "other implementation-defined limit[s]",
so POSIX is not supportive of your argument here.

> [EAGAIN]
> [ML] The mapping could not be locked in memory, if required by
> mlockall(), due to a lack of resources.
>
> If Linux likes to add a new RLIMIT_MEMLOCK resource, it would be needed to
> honor this resource independent from the user id in order to prevent being
> contradictory.

This is irrelevant to cdrecord, because it does not trip over this
contradiction.

If I were the cdrecord maintainer, I'd forget about mlockall()
altogether because it's just too broad and doesn't allow something like
"no more auto locking" without unlocking all locked pages (see also Lee
Revell's earlier post), lock the FIFO, command data buffers and
everything explicitly through mlock(), set the scheduler, open the
device and then call setuid() to get rid of the saved set-user-id as
well. This may be narrow-minded, but given mlock() is present in the BSD
world (FreeBSD, NetBSD), in the SysV world (Solaris) and Linux, there's
reason to support it, as these constitute a large user base.
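
As a rough sketch of the explicit approach, assuming posix_memalign() and mlock() are available: the buffer is allocated first and locked afterwards, so a refused lock is reported separately and does not turn into a failed allocation. The helper names `alloc_locked()` and `free_locked()` are invented for this example and are not taken from cdrecord.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Allocate `len` bytes on a page boundary and try to lock them.
 * *locked is set to 1 if mlock() succeeded and to 0 if it was refused
 * (for example because of missing privilege or a resource limit).
 * Returns NULL only if the allocation itself failed. */
void *alloc_locked(size_t len, int *locked)
{
    long page = sysconf(_SC_PAGESIZE);
    void *buf = NULL;

    assert(len > 0 && locked != NULL);
    if (page <= 0 || posix_memalign(&buf, (size_t)page, len) != 0)
        return NULL;
    *locked = (mlock(buf, len) == 0);
    return buf;
}

/* Unlock (if it was locked) and free a buffer from alloc_locked(). */
void free_locked(void *buf, size_t len, int locked)
{
    if (locked)
        munlock(buf, len);
    free(buf);
}
```

The caller can then decide whether an unlocked FIFO is acceptable or whether it should stop with a clear message, instead of seeing a bogus "out of memory".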

If anything then still fails (command filter), I'd ask the kernel guys
how the restriction can be lifted so that cdrecord can work without ANY
root privileges, in the most portable way.

--
Matthias Andree

Theodore Ts'o

Jan 23, 2006, 8:10:14 PM

On Mon, Jan 23, 2006 at 07:55:49PM +0100, Matthias Andree wrote:
> The question that's open is one for the libc guys: malloc(), valloc()
> and others seem to use mmap() on some occasions (for some allocation
> sizes) - at least malloc/malloc.c comments as of 2.3.4 suggest so -, and
> if this isn't orthogonal to mlockall() and set[e]uid() calls, the glibc
> is pretty deeply in trouble if the code calls mlockall(MCL_FUTURE) and
> then drops privileges.

Maybe mlockall(MCL_FUTURE) when run with privileges should
automatically adjust the RLIMIT_MEMLOCK resource limit?

- Ted

Arjan van de Ven

Jan 24, 2006, 4:00:21 AM

> Jörg elided my lines that said valloc() was the function in question.
>
> Jörg, if we're talking about valloc(), this hasn't much to do with the
> kernel, but is a library issue.
>
> There is _no_ documentation that says valloc() or memalign() or
> posix_memalign() is required to use mmap(). It works on some systems and
> for some allocation sizes as a side effect of the valloc()
> implementation.

it doesn't matter. Regardless of the method, the memory has to be locked
due to the FUTURE requirement.

> And because this requirement is not specified in the relevant standards,
> it is wrong to assume valloc() returns locked pages.

is it? I sort of doubt that (but I'm not a standards expert); I'd
expect that "lock all in the future" applies to all memory, not just
mmap'd memory.

> You cannot rely on
> mmap() returning locked pages after mlockall() either, because you might
> be exceeding resource limits.

this is true and fully correct

the situation is messy; I can see some value in the hack Ted proposed to
just bump the rlimit automatically at an mlockall-done-by-root.. but to
be fair it's a hack :(

Arjan van de Ven

Jan 24, 2006, 4:20:06 AM

On Tue, 2006-01-24 at 10:08 +0100, Joerg Schilling wrote:
> > the situation is messy; I can see some value in the hack Ted proposed to
> > just bump the rlimit automatically at an mlockall-done-by-root.. but to
> > be fair it's a hack :(
>
> As all other rlimits are honored even if you are root, it looks not orthogonal
> to disregard an existing RLIMIT_MEMLOCK rlimit if you are root.

that's another solution; give root a higher rlimit by default for this.
It's also a bit messy, but a not-unreasonable default behavior.

Joerg Schilling

Jan 24, 2006, 4:20:09 AM

Arjan van de Ven <ar...@infradead.org> wrote:

> > And because this requirement is not specified in the relevant standards,
> > it is wrong to assume valloc() returns locked pages.
>
> is it? I sort of doubt that (but I'm not a standards expert, but I'd
> expect that "lock all in the future" applies to all memory, not just
> mmap'd memory

I concur:

Locking pages into core is a property/duty of the VM subsystem.
If you have an orthogonal VM subsystem, you cannot later tell how a page was
mapped into the user's address space. Even more: you may map a file to a
location in the data segment of the process (that has been retrieved via
malloc()/brk()) and replace the related mapping with a mapped file.

On Solaris, there is no difference.

>
> > You cannot rely on
> > mmap() returning locked pages after mlockall() either, because you might
> > be exceeding resource limits.
>
> this is true and fully correct
>
>
>
> the situation is messy; I can see some value in the hack Ted proposed to
> just bump the rlimit automatically at an mlockall-done-by-root.. but to
> be fair it's a hack :(

As all other rlimits are honored even if you are root, it looks not orthogonal
to disregard an existing RLIMIT_MEMLOCK rlimit if you are root.

Jörg


Joerg Schilling

Jan 24, 2006, 4:30:07 AM

Arjan van de Ven <ar...@infradead.org> wrote:

> On Tue, 2006-01-24 at 10:08 +0100, Joerg Schilling wrote:
> > > the situation is messy; I can see some value in the hack Ted proposed to
> > > just bump the rlimit automatically at an mlockall-done-by-root.. but to
> > > be fair it's a hack :(
> >
> > As all other rlimits are honored even if you are root, it looks not orthogonal
> > to disregard an existing RLIMIT_MEMLOCK rlimit if you are root.
>
> that's another solution; give root a higher rlimit by default for this.
> It's also a bit messy, but a not-unreasonable default behavior.

This would only make sense in case that you bump up the limit for processes
that are suid root and do not lower it in case someone calls seteuid().

Jörg


Matthias Andree

Jan 24, 2006, 6:01:03 AM

Joerg Schilling wrote on 2006-01-24:

> Arjan van de Ven <ar...@infradead.org> wrote:
>
> > > And because this requirement is not specified in the relevant standards,
> > > it is wrong to assume valloc() returns locked pages.
> >
> > is it? I sort of doubt that (but I'm not a standards expert, but I'd
> > expect that "lock all in the future" applies to all memory, not just
> > mmap'd memory
>
> I concur:
>
> Locking pages into core is a property/duty of the VM subsystem.

But where is this laid down in the standard? There must be some part
that defines this, else we cannot rely on it. The wording for malloc()
and mmap() or mlock() is different. One talks about address space and
mapping, whereas malloc() talks about "storage".

Only I haven't got time to look for it now. Just that Solaris happens to
do it doesn't make it a standard.

--
Matthias Andree

Matthias Andree

Jan 24, 2006, 6:10:16 AM

On Mon, 23 Jan 2006, Theodore Ts'o wrote:

> On Mon, Jan 23, 2006 at 07:55:49PM +0100, Matthias Andree wrote:
> > The question that's open is one for the libc guys: malloc(), valloc()
> > and others seem to use mmap() on some occasions (for some allocation
> > sizes) - at least malloc/malloc.c comments as of 2.3.4 suggest so -, and
> > if this isn't orthogonal to mlockall() and set[e]uid() calls, the glibc
> > is pretty deeply in trouble if the code calls mlockall(MCL_FUTURE) and
> > then drops privileges.
>
> Maybe mlockall(MCL_FUTURE) when run with privileges should
> automatically adjust the RLIMIT_MEMLOCK resource limit?

Adding special cases to no end.
Is this really sensible?

How about leaving RLIMIT_MEMLOCK alone (and at RLIM_INFINITY) for root
processes altogether? At least that wouldn't add a new special case but
just change the existing one to remove an inconsistency, and the effect
will be the same, only that it is inherited across seteuid().

I doubt that the kernel is the right place to implement policies that
belong into user space. As long as the kernel is meant to be universal,
any default will collide with an application's requirement sooner or
later.

--
Matthias Andree

Joerg Schilling

Jan 24, 2006, 9:00:22 AM

Matthias Andree <matthia...@gmx.de> wrote:

> Of course, I think it's sensible to expect that Linux should adhere to
> standards (POSIX) as far as possible, and if any precedent
> implementations that are standards-conformant are found, I'd suggest
> that Linux adheres to their interpretation, too, to reduce the clutter
> and make applications more easily ported to Linux. We'll all benefit.

With respect to SCSI transport, it would also make sense to look at the
implementations of various other platforms.

> LIST 1 # REQUIREMENTS
>
> R1 I'll just say we all want cdrecord, dvd recording applications and
> similar to work without setuid root flags or sudo or other excessive
> privilege escalation. (This needs to be split up into I/O access
> privileges, device enumeration, buffer allocation, real-time
> requirements such as locking buffers into memory, scheduling and so on.)

With fine grained privileges and a nice inherent user level framework, this
kind of problem should not appear inside cdrecord at all.

> LIST 2 # CURRENT STATE
>
> S1 Jörg is unhappy with /dev/hd* because he says that it is inferior to
> the sg-access via ide-scsi. (I believe the original issues were
> DMA-based, and I don't know the details.) I hope Jörg will fill in the
> operations that ide-cd (/dev/hd*) lacks. (Jörg, please don't talk about
> layer violations here).

One original issue was that ide-scsi did cause a kernel panic in case it
was used on top of PCMCIA based ATA.

The other issue is that ide-scsi does not do DMA in case the DMA size is not
a multiple of 512, while it is needed for any size % 4 == 0,
or at least size % 8 == 0.
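
To make the gap concrete, here is a small sketch of the two conditions as plain predicates. The function names are invented for this illustration, and the 2352-byte figure used in the note below (one raw CD audio sector) is an example chosen here, not a value from the original mail.

```c
#include <assert.h>
#include <stddef.h>

/* Restriction described above: DMA only for multiples of 512 bytes. */
int dma_if_multiple_of_512(size_t len)
{
    return len % 512 == 0;
}

/* Granularity asked for above: DMA for any multiple of 4 bytes. */
int dma_if_multiple_of_4(size_t len)
{
    return len % 4 == 0;
}
```

A 2048-byte data sector passes both checks, while a 2352-byte transfer passes only the second one (2352 % 512 is 304, 2352 % 4 is 0), so it would miss DMA under the 512-byte rule.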

> S2 Jörg is concerned about the SCSI command filter being too
> restrictive. I'm not sure if it still applies to 2.6.16-rc and what the
> exact commands in question were. I'll let Jörg complete this list.

If this change had been announced early enough and if there was a workaround,
there would be no problem. The problem was that someone had a bad dream and
incompatibly changed the Linux kernel overnight while cdrecord was in code
freeze. Later I was called inflexible because I followed the well known
quality assurance rules that are in effect shortly before a new stable/final
release is published.

> S3 Device enumeration/probing is a sore spot. Unprivileged "cdrecord
> dev=ATA: -scandisk" doesn't work, and recent discussions on the cdwrite@
> list didn't make any progress. My observation is that cdrecord stops
> probing /dev/hd* devices as soon as one yields EPERM, on the assumption
> "if I cannot access /dev/hda, I will not have sufficient privilege to
> write a CD anyways". I find this wrong, Jörg finds it correct and argues
> "if you can access /dev/hdc as unprivileged user, that's a security
> problem".

These are two problems:

- users of cdrecord like to run cdrecord -scanbus in order to find all
SCSI devices. This no longer works since the non-orthogonal /dev/hd*
SCSI transport has been added.

As Linux already implements a Generic SCSI transport interface
(/dev/sg*) people would assume to be able to talk to _all_ SCSI devices
using this interface. To allow this, there is a need for a
SCSI HBA driver that sends SCSI commands via an ATA interface.

- some people seem to set the permissions of some of the /dev/hd*
nodes to unsafe values and then complain that the other /dev/hd*
nodes cannot be opened.

Jörg


Jan Engelhardt

Jan 24, 2006, 12:50:40 PM

I'm joining in,

>
>1. compile a list of their requirements,

Have as little code duplicated as possible (e.g. ATAPI and SCSI may share some - after
all, ATAPI is (to me) some sort of SCSI tunneled in ATA.)

Make it, in accordance with the above, possible to have as few kernel
modules loaded as possible and therefore reducing footprint - if I had not
to load sd_mod for usb_storage fun, I would get an itch to load a 78564
byte scsi_mod module just to be able to use ATAPI. (MINOR one, though.)

Want to write CDs and DVDs "as usual" (see below).

De-forest the SCSI subsystem for privilege checking (see below).

>2. find out the current state of affairs,

I am currently able to properly write all sorts of CD-R/RW and DVD±R/RW,
DVD-DL with no problems using
cdrecord -dev=/dev/hdb
it _currently_ works, no matter how ugly or not this is from either Jörg's
or any other developer's POV - therefore it's fine from the end-user's POV.

I can write DVDs at 8x speed (approx 10816 KB/sec) - which looks like DMA
is working through the current mechanism, although I can't confirm it.

There have been reports that cdrecord does not work when setuid, but only
when you are "truly root". Not sure where this comes from,
(current->euid==0&&current->uid!=0 maybe?) scsi layer somewhere?

I'm fine (=I agree) with the general possibility of having it setuid,
though.

>3. match the lists found in 1 and 2
>
>4. ONLY AFTER THAT negotiate who is going to change what to make things
> work better for us end users.

>S3 Device enumeration/probing is a sore spot. Unprivileged "cdrecord
>dev=ATA: -scandisk" doesn't work, and recent discussions on the cdwrite@
>list didn't make any progress. My observation is that cdrecord stops
>probing /dev/hd* devices as soon as one yields EPERM, on the assumption
>"if I cannot access /dev/hda, I will not have sufficient privilege to
>write a CD anyways". I find this wrong, Jörg finds it correct and argues
>"if you can access /dev/hdc as unprivileged user, that's a security
>problem".

If you can access a _harddisk_ as a normal user, you _do have_ a security
problem. If you can access a cdrom as normal user, well, the opinions
differ here. I think you _should not either_, because it might happen that
you just left your presentation cd in a cdrom device in a public box. You
would certainly not want to have everyone read that out.

SUSE currently does it in A Nice Way: setfacl'ing the devices to include
read access for currently logged-in users. (Well, if someone logs on tty1
after you, you're screwed anyway - he could have just ejected the cd when
he's physically at the box.)

Yes, the device numbering is not optimal. (I already hear someone saying
'have udev make some sweety symlink in /dev'.)
But in case of /dev/hd*, we are pretty sure of what device is connected
where. In case of sd*, it's AFAICS not - the next device plugged in gets
the next free sd slot.

Jan Engelhardt
--
| Software Engineer and Linux/Unix Network Administrator
| Alphagate Systems, http://alphagate.hopto.org/
| jengelh's site, http://jengelh.hopto.org/

Matthias Andree

Jan 24, 2006, 1:20:41 PM

Jan Engelhardt wrote on 2006-01-24:

> >2. find out the current state of affairs,
>
> I am currently able to properly write all sorts of CD-R/RW and DVD±R/RW,
> DVD-DL with no problems using
> cdrecord -dev=/dev/hdb
> it _currently_ works, no matter how ugly or not this is from either Jörg's
> or any other developer's POV - therefore it's fine from the end-user's POV.

cdrecord simply assumes that if you don't have access to /dev/hda,
scanning the other devices is pointless, on the assumption it were a
security risk. How this fits into user profiles that might allow access
to /dev/hdc, is unclear to me.
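
For comparison, a sketch of a probe loop that skips a node it cannot open and carries on with the next one, rather than stopping at the first failure. This is not cdrecord's code; `count_accessible()` is a hypothetical helper, and the use of O_NONBLOCK reflects the usual Linux convention for opening a drive that may be empty.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Try every path in turn and count those that can be opened read-only.
 * A node that fails to open (EACCES, EPERM, ENOENT, ...) is skipped and
 * probing continues with the next node instead of stopping. */
size_t count_accessible(const char *const paths[], size_t n)
{
    size_t found = 0;

    assert(paths != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        /* O_NONBLOCK: do not wait for, or insist on, a loaded medium */
        int fd = open(paths[i], O_RDONLY | O_NONBLOCK);
        if (fd < 0)
            continue;   /* not usable by this user; try the next one */
        close(fd);
        found++;
    }
    return found;
}
```

Called with /dev/hda and /dev/hdc on a system where only /dev/hdc is accessible to the user, this would report one device, whereas a loop that gives up at the first EPERM would report none.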

> I can write DVDs at 8x speed (approx 10816 KB/sec) - which looks like DMA
> is working through the current mechanism, although I can't confirm it.

/dev/hd* and ATA: support DMA, newer cdrecord versions actually check
the DMA speed before starting write operations without burnproof.

> There have been reports that cdrecord does not work when setuid, but only
> when you are "truly root". Not sure where this comes from,
> (current->euid==0&&current->uid!=0 maybe?) scsi layer somewhere?

Locking pages in memory so they aren't swapped out (a requirement for
real-time applications) -- that's the original reason for my
RLIMIT_MEMLOCK question that preceded this thread.

> If you can access a _harddisk_ as a normal user, you _do have_ a security
> problem. If you can access a cdrom as normal user, well, the opinions
> differ here. I think you _should not either_, because it might happen that
> you just left your presentation cd in a cdrom device in a public box. You
> would certainly not want to have everyone read that out.

That's less of a problem than sending vendor-specific commands - one
might be "update firmware", which would allow the user to destroy the
drive.

> SUSE currently does it in A Nice Way: setfacl'ing the devices to include
> read access for currently logged-in users. (Well, if someone logs on tty1
> after you, you're screwed anyway - he could have just ejected the cd when
> he's physically at the box.)

There are some things to complicate matters. SUSE patch subfs into the
kernel and ship the needed user-space, think of this as quick
automounter. It releases the drive and unmounts the medium when the last
file is closed. In older SUSE releases, tty? logins didn't trigger
such access controls, only "desktop" logins through kdm or gdm.

> Yes, the device numbering is not optimal. (I already hear someone saying
> 'have udev make some sweety symlink in /dev'.)
> But in case of /dev/hd*, we are pretty sure of what device is connected
> where. In case of sd*, it's AFAICS not - the next device plugged in gets
> the next free sd slot.

What matters is sg, and perhaps sr.

Jan Engelhardt

Jan 24, 2006, 4:00:15 PM

>> SUSE currently does it in A Nice Way: setfacl'ing the devices to include
>> read access for currently logged-in users. (Well, if someone logs on tty1
>> after you, you're screwed anyway - he could have just ejected the cd when
>> he's physically at the box.)
>
>There are some things to complicate matters. SUSE patch subfs into the
>kernel and ship the needed user-space, think of this as quick
>automounter. It releases the drive and unmounts the medium when the last
>file is closed. In older SUSE releases, tty? logins didn't trigger
>such access controls, only "desktop" logins through kdm or gdm.

I think this is independent of subfs. This is, afaicg, a resmgrd thing. And
since I do not use [a-z]dm, but tty login + startx, well, you can
guess.

>> Yes, the device numbering is not optimal. (I already hear someone saying
>> 'have udev make some sweety symlink in /dev'.)
>> But in case of /dev/hd*, we are pretty sure of what device is connected
>> where. In case of sd*, it's AFAICS not - the next device plugged in gets
>> the next free sd slot.
>
>What matters is sg, and perhaps sr.

Where is the difference between SG_IO-on-hdx and sg0?

Jan Engelhardt
--

Theodore Ts'o

Jan 24, 2006, 5:00:26 PM

On Tue, Jan 24, 2006 at 10:15:40AM +0100, Arjan van de Ven wrote:
> On Tue, 2006-01-24 at 10:08 +0100, Joerg Schilling wrote:
> > > the situation is messy; I can see some value in the hack Ted proposed to
> > > just bump the rlimit automatically at an mlockall-done-by-root.. but to
> > > be fair it's a hack :(
> >
> > As all other rlimits are honored even if you are root, it looks not orthogonal
> > to disregard an existing RLIMIT_MEMLOCK rlimit if you are root.
>
> that's another solution; give root a higher rlimit by default for this.
> It's also a bit messy, but a not-unreasonable default behavior.

I thought in the case we were talking about, the problem is that we
have a setuid program which calls mlockall() but then later drops its
privileges. So when it tries to allocate memory, RLIMIT_MEMLOCK
applies again, and so all future memory allocations would fail.

What I proposed is a hack, and strictly speaking not necessary
according to the POSIX standards, but the problem is that a portable
program can't be expected to know that Linux has a RLIMIT_MEMLOCK
resource limit, such that a program which calls mlockall() and then
drops privileges will work under Solaris and fail under Linux. Hence
why I proposed a hack where mlockall() would adjust RLIMIT_MEMLOCK.
Yes, no question it's a hack and a special case; the question is
whether the cure or the disease is worse.

- Ted

Matthias Andree

Jan 24, 2006, 6:30:15 PM

Matthias Andree wrote on 2006-01-25:

> What if the limit were RLIM_INFINITY for root processes instead of
> hacking mlockall() and the resource checks?

OK, reading Edgar's hint, the answer is "It's a bad idea."

--
Matthias Andree

Matthias Andree

Jan 24, 2006, 6:30:19 PM

Theodore Ts'o wrote on 2006-01-24:

> I thought in the case we were talking about, the problem is that we
> have a setuid program which calls mlockall() but then later drops its
> privileges. So when it tries to allocate memories, RLIMIT_MEMLOCK
> applies again, and so all future memory allocations would fail.

That's the coarse view. In fact, the application does not call setuid()
at this time, but only seteuid(), so it can regain privileges later, and
will in fact do that.

The application in question does this:

(root here)
1 mlockall()
2 seteuid(500); /* park privileges for a moment */
3 valloc(63 kB); /* fails since 2.6.9's tight MEMLOCK limit */

The first patch I suggested for the application exchanged steps #2 and
#3 and works, but is not acceptable to Jörg. We haven't talked about the
reasons.
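
The reordering can be sketched as follows. This illustrates the idea only and is not the patch itself: the function name, the status codes, the use of posix_memalign() in place of valloc(), and the MCL_CURRENT | MCL_FUTURE flags are assumptions made for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

enum setup_status {
    SETUP_OK,        /* buffer allocated, privileges parked */
    SETUP_NO_LOCK,   /* mlockall() was refused */
    SETUP_NO_MEMORY, /* the allocation failed */
    SETUP_NO_DROP    /* seteuid() failed */
};

/* Steps 1, 3, 2: lock, allocate while still privileged, then park
 * privileges. On SETUP_OK, *buf holds a page-aligned buffer. */
enum setup_status setup_buffer(uid_t park_uid, size_t len, void **buf)
{
    long page = sysconf(_SC_PAGESIZE);

    assert(buf != NULL && len > 0);
    *buf = NULL;

    /* step 1: ask for current and future pages to be locked */
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
        return SETUP_NO_LOCK;

    /* step 3, moved up: allocate before the limit applies to us */
    if (page <= 0 || posix_memalign(buf, (size_t)page, len) != 0) {
        *buf = NULL;
        return SETUP_NO_MEMORY;
    }

    /* step 2: park privileges only after the buffer exists */
    if (seteuid(park_uid) != 0)
        return SETUP_NO_DROP;

    return SETUP_OK;
}
```

Any allocation made after the seteuid() call is still subject to the limit, so this only helps for buffers whose size is known up front.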

The idea behind my patch was this: if it wants the memory locked (which
is a privileged operation on many systems anyways), then why not
allocate as root? Would this hurt portability to any other system? I
don't think so. Is such a rationale unreasonable in itself? Not either.

Further patch suggestions went back and forth on raising the limit
and to what value.
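
A minimal sketch of what raising the limit from inside the application could look like, assuming it runs while the process is still privileged. The name `ensure_memlock_limit()` is hypothetical and the target value is left to the caller, since that value is exactly what was under discussion.

```c
#include <assert.h>
#include <sys/resource.h>

/* Make sure the soft RLIMIT_MEMLOCK is at least `want` bytes.
 * Raising the soft limit up to the hard limit needs no privilege;
 * raising the hard limit does, so setrlimit() fails for an
 * unprivileged caller in that case.
 * Returns 0 on success, -1 on failure (errno set by the system call). */
int ensure_memlock_limit(rlim_t want)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
        return -1;
    if (rl.rlim_cur == RLIM_INFINITY || rl.rlim_cur >= want)
        return 0;                   /* already large enough */

    rl.rlim_cur = want;
    if (rl.rlim_max != RLIM_INFINITY && rl.rlim_max < want)
        rl.rlim_max = want;         /* works for privileged callers only */
    return setrlimit(RLIMIT_MEMLOCK, &rl);
}
```

Resource limits are inherited by child processes and kept across exec, so a limit raised this way stays in effect for anything the program starts later.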

The other problem is that glibc 2.3.5 is part of the story, but
off-topic here, because glibc is the link between valloc() (application
side) and the mmap() (kernel side).

> What I proposed is a hack, [and] strictly speaking not necessary
> according to the POSIX standards, but the problem is that a portable
> program can't be expected to know that Linux has a RLIMIT_MEMLOCK
> resource limit, such that a program which calls mlockall() and then
> drops privileges will work under Solaris and fail under Linux. Hence
> why I proposed a hack where mlockall() would adjust RLIMIT_MEMLOCK.
> Yes, no question it's a hack and a special case; the question is
> whether cure or the disease is worse.

Is the KERNEL the right place to implement policy such as setting
locked-page limits to 32 kB?

What if the limit were RLIM_INFINITY for root processes instead of
hacking mlockall() and the resource checks?

--
Matthias Andree

Edgar Toernig

Jan 24, 2006, 6:30:21 PM

Theodore Ts'o wrote:
>
> ... proposed a hack where mlockall() would adjust RLIMIT_MEMLOCK.
> Yes, no question it's a hack and a special case; the question is
> whether cure or the disease is worse.

What about exec? The memory locks are removed on exec but with that
hack the raised limit would stay. Looks like a security bug.

Ciao, ET.

Albert Cahalan

Jan 24, 2006, 10:00:15 PM

Joerg Schilling writes:
> Matthias Andree <matthia...@gmx.de> wrote:

>> S3 Device enumeration/probing is a sore spot. Unprivileged
>> "cdrecord dev=ATA: -scandisk" doesn't work, and recent
>> discussions on the cdwrite@ list didn't make any progress.
>> My observation is that cdrecord stops probing /dev/hd* devices
>> as soon as one yields EPERM, on the assumption "if I cannot
>> access /dev/hda, I will not have sufficient privilege to
>> write a CD anyways". I find this wrong, Joerg finds it correct
>> and argues "if you can access /dev/hdc as unprivileged user,
>> that's a security problem".
>
> These are two problems:
>
> - users of cdrecord like to run cdrecord -scanbus in order
> to find all SCSI devices. This no longer works since the
> non-orthogonal /dev/hd* SCSI transport has been added.
>
> As Linux already implements a Generic SCSI transport
> interface (/dev/sg*) people would assume to be able to
> talk to _all_ SCSI devices using this interface.
> To allow this, there is a need for a SCSI HBA driver
> that sends SCSI commands via an ATA interface.
>
> - some people seem to set the permissions of some of the
> /dev/hd* nodes to unsafe values and then complain that
> the other /dev/hd* nodes cannot be opened.

**sigh**

Matthias Andree said "(Joerg, please don't talk about layer
violations here)", yet you do.

We Linux users will forever patch your software to work the
way every Linux app is supposed to work. (well, assuming
nobody succumbs to a well-caffeinated urge to fork the code)

Really, "users of cdrecord like to run cdrecord -scanbus"???
They LIKE running a command to generate phony SCSI addresses?
That's news to me.

To better protect users from terrible accidents, Linux should
avoid assigning a /dev/sg* device for anything with a regular
device file. This, along with elimination of the obsolete
ide-scsi crud, would make things a lot more safe and sane.

BTW, before Joerg mentions portability, I'd like to remind
everyone that all modern OSes support the use of normal device
names for SCSI. The most awkward is FreeBSD, where you have
to do a syscall or two to translate the name to Joerg's very
non-hotplug non-iSCSI way of thinking. Windows, MacOS X, and
even Solaris all manage to handle device names just fine. In
numerous cases, not just Linux, cdrecord is inventing crap out
of thin air to satisfy a pre-hotplug worldview.

Albert Cahalan

Jan 24, 2006, 10:30:08 PM

Jan Engelhardt writes:

> Where is the difference between SG_IO-on-hdx and sg0?

It's like the /dev/ttyS* and /dev/cua* situation, where
we also ended up with multiple device files. This is bad.

SG_IO-on-hdx is modern. It properly associates everything
with one device, which you may name as desired.

sg0 is useful for devices that are not disk, tape, or CD.
A decade ago, it was also the proper way to send raw SCSI
commands to other devices. For nasty compatibility reasons,
Linux still assigns /dev/sg* devices for disk, tape, and CD.
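
Seen from an application, the request itself looks the same in both cases; only the device node passed to open() differs. The sketch below prepares and sends a standard 6-byte INQUIRY through the SG_IO ioctl from <scsi/sg.h>. The helper names `prepare_inquiry()` and `send_inquiry()` and the 5 second timeout are choices made for this example, and whether the kernel accepts the command on a given node depends on permissions and on the command filter.

```c
#include <assert.h>
#include <string.h>
#include <sys/ioctl.h>
#include <scsi/sg.h>

/* Fill an SG_IO request for a standard 6-byte INQUIRY. */
void prepare_inquiry(sg_io_hdr_t *hdr, unsigned char cdb[6],
                     unsigned char *data, unsigned char data_len,
                     unsigned char *sense, unsigned char sense_len)
{
    assert(hdr != NULL && cdb != NULL);
    memset(hdr, 0, sizeof(*hdr));
    memset(cdb, 0, 6);
    cdb[0] = 0x12;                  /* INQUIRY opcode */
    cdb[4] = data_len;              /* allocation length */

    hdr->interface_id = 'S';        /* required marker for SG_IO */
    hdr->dxfer_direction = SG_DXFER_FROM_DEV;
    hdr->cmdp = cdb;
    hdr->cmd_len = 6;
    hdr->dxferp = data;
    hdr->dxfer_len = data_len;
    hdr->sbp = sense;
    hdr->mx_sb_len = sense_len;
    hdr->timeout = 5000;            /* milliseconds */
}

/* Send the INQUIRY to an already opened device node. The call is the
 * same whether fd refers to /dev/sg0 or to a node such as /dev/hdc.
 * Returns the ioctl() result: 0 if the request was issued, -1 on error. */
int send_inquiry(int fd, unsigned char *data, unsigned char data_len)
{
    sg_io_hdr_t hdr;
    unsigned char cdb[6];
    unsigned char sense[32];

    prepare_inquiry(&hdr, cdb, data, data_len, sense, sizeof sense);
    return ioctl(fd, SG_IO, &hdr);
}
```

A caller would open the node of its choice, pass the descriptor to send_inquiry() with a 36-byte buffer, and read the vendor and product strings from the returned data.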

Joerg Schilling

Jan 25, 2006, 9:10:10 AM

Jan Engelhardt <jen...@linux01.gwdg.de> wrote:

> >1. compile a list of their requirements,
>
> Have as few code duplicated (e.g. ATAPI and SCSI may share some - after
> all, ATAPI is (to me) some sort of SCSI tunneled in ATA.)

Thank you! This is a vote _pro_ a unified SCSI generic implementation for all
types of devices. The current implementation unnecessarily duplicates a lot
of code.

> Make it, in accordance with the above, possible to have as few kernel
> modules loaded as possible and therefore reducing footprint - if I had not
> to load sd_mod for usb_storage fun, I would get an itch to load a 78564
> byte scsi_mod module just to be able to use ATAPI. (MINOR one, though.)

On Solaris, the SCSI glue code (between hostadaptor drivers and target drivers) is
really small:

/usr/ccs/bin/size /kernel/misc/scsi
28482 + 27042 + 2036 = 57560

And if you check the amount of completely unneeded code Linux currently has
just to implement e.g. SG_IO in /dev/hd*, it could even _save_ space in the
kernel when converting to a clean SCSI based design.

> Want to write CDs and DVDs "as usual" (see below).

Be careful: libscg is a _generic_ SCSI transport library.
Closing the eyes for anything but CD writing is not the right way.

> De-forest the SCSI subsystem for privilege checking (see below).

Sorry, I see nothing related below.

> >2. find out the current state of affairs,
>
> I am currently able to properly write all sorts of CD-R/RW and DVD±R/RW,
> DVD-DL with no problems using
> cdrecord -dev=/dev/hdb

Maybe I should enforce the official libscg device syntax in order to prevent
this from working in the future.

Anyway: the fact that it may work is no proof for correctness.

> I can write DVDs at 8x speed (approx 10816 KB/sec) - which looks like DMA
> is working through the current mechanism, although I can't confirm it.

In case you don't know the story:

Linus Torvalds once claimed that introducing SG_IO support for
/dev/hd* would be accompanied by cleaning up DMA support in the kernel.

At that moment it turned out that it did not help at all as /dev/hd*
did not give DMA. Later this bug was fixed, but I am still waiting
to see the proposed DMA fix for ide-scsi.

> There have been reports that cdrecord does not work when setuid, but only
> when you are "truly root". Not sure where this comes from,
> (current->euid==0&&current->uid!=0 maybe?) scsi layer somewhere?

Depends on what you talk about.

For about a year there has been a workaround for the incompatible interface change
introduced with Linux-2.6.8.1.

On a recent RedHat system, cdrecord works installed suid root.

On a system running a kernel.org based Linux it has been reported to fail
because it does not get a SCSI transfer buffer.

> >write a CD anyways". I find this wrong, Jörg finds it correct and argues
> >"if you can access /dev/hdc as unprivileged user, that's a security
> >problem".
>
> If you can access a _harddisk_ as a normal user, you _do have_ a security
> problem. If you can access a cdrom as normal user, well, the opinions
> differ here. I think you _should not either_, because it might happen that
> you just left your presentation cd in a cdrom device in a public box. You
> would certainly not want to have everyone read that out.

Do you want everybody to be able to read or format a floppy disk?
Ignoring the usual security rules sometimes _seems_ to make life easier but usually
does not. Just look at the jungle Microsoft is in because it
started to allow insane things for the sake of "user convenience".

> SUSE currently does it in A Nice Way: setfacl'ing the devices to include
> read access for currently logged-in users. (Well, if someone logs on tty1
> after you, you're screwed anyway - he could have just ejected the cd when
> he's physically at the box.)

It may make sense to do something like this for the user logged into the
console. In general it is a security problem.

Jörg

--
EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
j...@cs.tu-berlin.de (uni)
schi...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
URL: http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily

Jens Axboe

Jan 25, 2006, 9:30:22 AM

On Wed, Jan 25 2006, Joerg Schilling wrote:
> Jan Engelhardt <jen...@linux01.gwdg.de> wrote:
>
> > >1. compile a list of their requirements,
> >
> > Have as few code duplicated (e.g. ATAPI and SCSI may share some - after
> > all, ATAPI is (to me) some sort of SCSI tunneled in ATA.)
>
> Thank you! This is a vote _pro_ a unified SCSI generic implementation for all
> types of devices. The current implementation unnecessarily duplicates a lot
> of code.

The block layer SG_IO is just that, it's completely transport agnostic.
There's not a lot of duplicated code. In the future, perhaps sg will
disappear and be replaced by bsg which is just the full block layer
implementation of that (SG_IO can currently be considered a subset of
that support).

> > Make it, in accordance with the above, possible to have as few kernel
> > modules loaded as possible and therefore reducing footprint - if I had not
> > to load sd_mod for usb_storage fun, I would get an itch to load a 78564
> > byte scsi_mod module just to be able to use ATAPI. (MINOR one, though.)
>
> On Solaris, the SCSI glue code (between hostadaptor drivers and target drivers) is
> really small:
>
> /usr/ccs/bin/size /kernel/misc/scsi
> 28482 + 27042 + 2036 = 57560
>
> And if you check the amount of completely unneeded code Linux currently has
> just to implement e.g. SG_IO in /dev/hd*, it could even _save_ space in the
> kernel when converting to a clean SCSI based design.

Please point me at that huge amount of code. Hint: there is none.

Deja vu, anyone?

--
Jens Axboe

Jan Engelhardt

Jan 25, 2006, 9:40:15 AM

>> Where is the difference between SG_IO-on-hdx and sg0?
>
>It's like the /dev/ttyS* and /dev/cua* situation, where
>we also ended up with multiple device files. This is bad.
>
>SG_IO-on-hdx is modern. It properly associates everything
>with one device, which you may name as desired.

Let's analyze a case:
if /dev/sg0 would always be associated with /dev/hda,
/dev/sg1 always with /dev/hdb, no matter if there was actually a
hda/sg0 device present in the system - would that simplify
the problem?

Jan Engelhardt
--

Jan Engelhardt

Jan 25, 2006, 9:50:13 AM

>> And if you check the amount of completely unneeded code Linux currently has
>> just to implement e.g. SG_IO in /dev/hd*, it could even _save_ space in the
>> kernel when converting to a clean SCSI based design.
>
>Please point me at that huge amount of code. Hint: there is none.

I'm getting a grin:

15:46 takeshi:../drivers/ide > find . -type f -print0 | xargs -0 grep SG_IO
(no results)

Looks like it's already non-redundant :)

Jan Engelhardt
--

Jens Axboe

Jan 25, 2006, 9:50:14 AM

On Wed, Jan 25 2006, Jan Engelhardt wrote:
>
> >> Where is the difference between SG_IO-on-hdx and sg0?
> >
> >It's like the /dev/ttyS* and /dev/cua* situation, where
> >we also ended up with multiple device files. This is bad.
> >
> >SG_IO-on-hdx is modern. It properly associates everything
> >with one device, which you may name as desired.
>
> Let's analyze a case:
> if /dev/sg0 would always be associated with /dev/hda,
> /dev/sg1 always with /dev/hdb, no matter if there was actually a
> hda/sg0 device present in the system - would that simplify
> the problem?

Forget /dev/sg0, it's meaningless and confusing to try and bind two
unrelated names to each other. You want to talk to /dev/hda, use
/dev/hda. Don't try and create a pseudo mapping between the two. That's
also where cdrecord gets it wrong on Linux - you don't need -scanbus. If
users think they do, it's either because Joerg brain washed them or
because they have been used to that bad interface since years ago when
it was unfortunately needed.

--
Jens Axboe

Jens Axboe

Jan 25, 2006, 10:00:20 AM

On Wed, Jan 25 2006, Jens Axboe wrote:

> On Wed, Jan 25 2006, Jan Engelhardt wrote:
> >
> > >> And if you check the amount of completely unneeded code Linux currently has
> > >> just to implement e.g. SG_IO in /dev/hd*, it could even _save_ space in the
> > >> kernel when converting to a clean SCSI based design.
> > >
> > >Please point me at that huge amount of code. Hint: there is none.
> >
> > I'm getting a grin:
> >
> > 15:46 takeshi:../drivers/ide > find . -type f -print0 | xargs -0 grep SG_IO
> > (no results)
> >
> > Looks like it's already non-redundant :)
>
> SG_IO turns requests into REQ_BLOCK_PC (or blk_pc_request()) types, so
> you should probably check for that as well. But it's truly a minuscule
> amount of code, and if I got off my ass and folded cdrom_newpc_intr()
> and cdrom_pc_intr() into one (that was the intention), it would be even
> less.

BTW, I should point out that the fact that references to REQ_BLOCK_PC
and blk_pc_request() exist doesn't indicate duplicated code. Each low
level driver or layer (like SCSI, not the SCSI low level drivers) needs
to transform REQ_BLOCK_PC requests into its own command types.

--
Jens Axboe

Jens Axboe

Jan 25, 2006, 10:00:20 AM

On Wed, Jan 25 2006, Jan Engelhardt wrote:
>
> >> And if you check the amount of completely unneeded code Linux currently has
> >> just to implement e.g. SG_IO in /dev/hd*, it could even _save_ space in the
> >> kernel when converting to a clean SCSI based design.
> >
> >Please point me at that huge amount of code. Hint: there is none.
>
> I'm getting a grin:
>
> 15:46 takeshi:../drivers/ide > find . -type f -print0 | xargs -0 grep SG_IO
> (no results)
>
> Looks like it's already non-redundant :)

SG_IO turns requests into REQ_BLOCK_PC (or blk_pc_request()) types, so
you should probably check for that as well. But it's truly a minuscule
amount of code, and if I got off my ass and folded cdrom_newpc_intr()
and cdrom_pc_intr() into one (that was the intention), it would be even
less.

It just looks like Joerg needs to do his homework, before spreading
false information on lkml. Then again, all his "arguments" are the same
as last time (and the time before, and before, and so on).

--
Jens Axboe

Jan Engelhardt

Jan 25, 2006, 10:20:08 AM

>
>- you don't need -scanbus. If
>users think they do, it's either because Joerg brain washed them or
>because they have been used to that bad interface since years ago when
>it was unfortunately needed.

Now you're unfair.
-scanbus does a nice output of what cdwriters (and other capable devices)
are present. For me, that lists the cd writer and a CF slot from the
multitype usb flash reader.

There's one kind of not-so-advanced linux newbie that just goes to walmart,
buys a computer and whacks a linux system on it for fun, and they still don't
know if their cdrom is at /dev/hdb or /dev/hdc. Looking through dmesg is
usually a nightmare for them, and apart from the fact that -scanbus lists scsi
host,id,lun instead of /dev/hd* (don't comment on this kthx), it is
convenient for this sort of user to find out what's available.

So, and what about that compactflash reader? It is subject to dynamic
usb->scsi device association (depending on when you connect it, it may
either become sda, or sdb, or sdc, etc.), and -scanbus yet again provides
some way (albeit not useful, because it lists scsi,id,lun rather than
/dev/sd* - don't comment either) to see where it actually is.

Jan Engelhardt
--

Joerg Schilling

Jan 25, 2006, 10:20:14 AM

Matthias Andree <matthia...@gmx.de> wrote:

> cdrecord simply assumes that if you don't have access to /dev/hda,
> scanning the other devices is pointless, on the assumption it were a
> security risk. How this fits into user profiles that might allow access
> to /dev/hdc, is unclear to me.

Wrong: cdrecord assumes nothing. It is the SCSI Generic transport library libscg.

Note that libscg does not offer access to a block layer device like /dev/hd*
but rather to the transport layer _below_ /dev/hd*. If you ignore this fact you
will have trouble understanding the rules.

> > If you can access a _harddisk_ as a normal user, you _do have_ a security
> > problem. If you can access a cdrom as normal user, well, the opinions
> > differ here. I think you _should not either_, because it might happen that
> > you just left your presentation cd in a cdrom device in a public box. You
> > would certainly not want to have everyone read that out.
>
> That's less of a problem than sending vendor-specific commands - one
> might be "update firmware", which would allow the user to destroy the
> drive.

I am not sure whether you understood the problem here. Cdrtools need to deal
with a lot of vendor specific commands.

Jörg

--
EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
j...@cs.tu-berlin.de (uni)
schi...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
URL: http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily

Joerg Schilling

Jan 25, 2006, 10:30:09 AM

Jan Engelhardt <jen...@linux01.gwdg.de> wrote:

> Where is the difference between SG_IO-on-hdx and sg0?

- Accessing _all_ SCSI devices from a unique name space.

- Using a driver that is located at the right layering level
(just above the transport) but not at the block level where
SCSI transport does not belong.

- Cutting down kernel size by avoiding multiple implementations
of code for the same purpose.

There are of course more....

Jörg

--
EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
j...@cs.tu-berlin.de (uni)
schi...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
URL: http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily

Jens Axboe

Jan 25, 2006, 10:40:09 AM

On Wed, Jan 25 2006, Jan Engelhardt wrote:
>
> >
> >- you don't need -scanbus. If
> >users think they do, it's either because Joerg brain washed them or
> >because they have been used to that bad interface since years ago when
> >it was unfortunately needed.
>
> Now you're unfair.
> -scanbus does a nice output of what cdwriters (and other capable devices)
> are present. For me, that lists the cd writer and a CF slot from the
> multitype usb flash reader.
>
> There's one kind of not-so-advanced linux newbies that just go to walmart,
> buy a computer and whack a linux system on it for fun, and they still don't
> know if their cdrom is at /dev/hdb or /dev/hdc. Looking for dmesg is
> usually a nightmare for them, and apart that -scanbus lists scsi
> host,id,lun instead of /dev/hd* (don't comment on this kthx), it is
> convenient for this sort of users to find out what's available.
>
> So, and what about that compactflash reader? It is subject to dynamic
> usb->scsi device association (depending on when you connect it, it may
> either become sda, or sdb, or sdc, etc.), and -scanbus yet again provides
> some way (albeit not useful, because it lists scsi,id,lun rather than
> /dev/sd* - don't comment either) to see where it actually is.

You just want the device naming to reflect that. The user should not
need to use /dev/hda, but /dev/cdrecorder or whatever. A real user would
likely be using k3b or something graphical though, and just click on his
Hitachi/Plextor/whatever burner. Perhaps some fancy udev rules could
help do this dynamically even.

If you are using cdrecord on the command line, you are by definition an
advanced user and know how to find out where that writer is.

--
Jens Axboe

Joerg Schilling

Jan 25, 2006, 10:40:13 AM

"Theodore Ts'o" <ty...@mit.edu> wrote:

> I thought in the case we were talking about, the problem is that we
> have a setuid program which calls mlockall() but then later drops its
> privileges. So when it tries to allocate memories, RLIMIT_MEMLOCK
> applies again, and so all future memory allocations would fail.
>
> What I proposed is a hack, but strictly speaking not necessary
> according to the POSIX standards, but the problem is that a portable
> program can't be expected to know that Linux has a RLIMIT_MEMLOCK
> resource limit, such that a program which calls mlockall() and then
> drops privileges will work under Solaris and fail under Linux. Hence
> why I proposed a hack where mlockall() would adjust RLIMIT_MEMLOCK.
> Yes, no question it's a hack and a special case; the question is
> whether cure or the disease is worse.

Maybe, I should give some hints...

RLIMIT_MEMLOCK first appeared in BSD-4.4 around 1994.
The implementation has been incomplete since then and is partially disabled (size check
for mmap() in the kernel) on FreeBSD, as it was in 1994 on BSD-4.4.

FreeBSD currently uses a default value of RLIM_INFINITY for users.

I could add this piece of code to the euid == 0 part of cdrecord:

LOCAL void
raise_memlock()
{
#ifdef RLIMIT_MEMLOCK
    struct rlimit rlim;

    rlim.rlim_cur = rlim.rlim_max = RLIM_INFINITY;

    if (setrlimit(RLIMIT_MEMLOCK, &rlim) < 0)
        errmsg("Warning: Cannot raise RLIMIT_MEMLOCK limits.");
#endif /* RLIMIT_NOFILE */
}

Jörg

--
EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
j...@cs.tu-berlin.de (uni)
schi...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
URL: http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily

Joerg Schilling

Jan 25, 2006, 10:40:16 AM

Edgar Toernig <fro...@gmx.de> wrote:

> Theodore Ts'o wrote:
> >
> > ... proposed a hack where mlockall() would adjust RLIMIT_MEMLOCK.
> > Yes, no question it's a hack and a special case; the question is
> > whether cure or the disease is worse.
>
> What about exec? The memory locks are removed on exec but with that
> hack the raised limit would stay. Looks like a security bug.

The RLIMIT_MEMLOCK feature itself may be a security bug, implemented the way it
currently is.

For me it would make sense to be able to lock everything in core and then
be able to tell the system that at most 1MB of additional memory may be locked.

In this case, there should be no general failure but the possibility to
verify that the value is sufficient for usual cases.

Jörg

--
EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
j...@cs.tu-berlin.de (uni)
schi...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
URL: http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily

Matthias Andree

Jan 25, 2006, 11:10:05 AM

Joerg Schilling wrote:

> RLIMIT_MEMLOCK first appeared in BSD-4.4 around 1994.
> The implementation has been incomplete since then and is partially disabled (size check
> for mmap() in the kernel) on FreeBSD, as it was in 1994 on BSD-4.4.
>
> FreeBSD currently uses a default value of RLIM_INFINITY for users.

And while it does that (or rather, does not distinguish between root and
unprivileged users), mlock() and mlockall() are privileged operations on
FreeBSD.

> I could add this piece of code to the euid == 0 part of cdrecord:
>
> LOCAL void
> raise_memlock()
> {
> #ifdef RLIMIT_MEMLOCK
> struct rlimit rlim;
>
> rlim.rlim_cur = rlim.rlim_max = RLIM_INFINITY;
>
> if (setrlimit(RLIMIT_MEMLOCK, &rlim) < 0)
> errmsg("Warning: Cannot raise RLIMIT_MEMLOCK limits.");
> #endif /* RLIMIT_NOFILE */
> }

Except that your new #endif comment is wrong, that is exactly what I
suggested and what I've tried and found working.
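
The order of operations matters in the setuid case described in this thread:
the limit has to be raised while the process is still privileged, before
mlockall() and before privileges are dropped. A minimal sketch, assuming
Linux semantics for RLIMIT_MEMLOCK; raise_memlock_limit and
lock_then_drop_privileges are hypothetical names, and the fallback to the
hard limit is an addition that is not part of the snippet quoted above.

```c
#include <assert.h>
#include <errno.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/*
 * Try to lift RLIMIT_MEMLOCK entirely; if that is refused with EPERM,
 * at least raise the soft limit to the hard limit, which needs no privilege.
 * Returns 0 on success, -1 if neither attempt worked.
 */
int raise_memlock_limit(void)
{
#ifdef RLIMIT_MEMLOCK
    struct rlimit rlim;

    rlim.rlim_cur = rlim.rlim_max = RLIM_INFINITY;
    if (setrlimit(RLIMIT_MEMLOCK, &rlim) == 0)
        return 0;
    if (errno != EPERM)
        return -1;

    /* Unprivileged fallback: the soft limit may go up to the hard limit. */
    if (getrlimit(RLIMIT_MEMLOCK, &rlim) < 0)
        return -1;
    rlim.rlim_cur = rlim.rlim_max;
    return setrlimit(RLIMIT_MEMLOCK, &rlim) < 0 ? -1 : 0;
#else
    return 0;   /* platform has no such limit, nothing to raise */
#endif /* RLIMIT_MEMLOCK */
}

/*
 * Raise the limit first, lock memory second, drop privileges last.
 * Returns 0 on success, -1 if any step failed.
 */
int lock_then_drop_privileges(void)
{
    if (raise_memlock_limit() < 0)
        return -1;
    if (mlockall(MCL_CURRENT | MCL_FUTURE) < 0)
        return -1;
    if (setuid(getuid()) < 0)            /* give up the setuid identity */
        return -1;
    assert(geteuid() == getuid());       /* privileges really are gone */
    return 0;
}
```

Because the raised limit is set before setuid(), allocations made after the
privilege drop are still covered by it, which is the failure mode described
at the start of this thread.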

Joerg Schilling

Jan 25, 2006, 11:40:17 AM

Albert Cahalan <acah...@gmail.com> wrote:

> We Linux users will forever patch your software to work the

Looks like you are not a native English speaker. "We" is incorrect here, as you
only speak for yourself.

> BTW, before Joerg mentions portability, I'd like to remind
> everyone that all modern OSes support the use of normal device
> names for SCSI. The most awkward is FreeBSD, where you have
> to do a syscall or two to translate the name to Joerg's very
> non-hotplug non-iSCSI way of thinking. Windows, MacOS X, and
> even Solaris all manage to handle device names just fine. In
> numerous cases, not just Linux, cdrecord is inventing crap out
> of thin air to satisfy a pre-hotplug worldview.

Looks like you are badly informed, so I encourage you to get yourself informed
properly before sending your next posting....

libscg includes 22 different SCSI low level transport implementations.

- Only 5 of them allow a /dev/hd* device name related access.

- 11 of them use file descriptors as handles for sending SCSI
commands but do not have a name <-> fs relation and thus
_need_ a SCSI device naming scheme as libscg offers.
This is because there is no 1:1 relation between SCSI addressing
and a fd retrieved from a /dev/* entry.

- 6 of them do not even allow getting file descriptors as handles for
sending SCSI commands. These platforms of course need the SCSI device
naming scheme that libscg offers.

Conclusion:

17 Platforms _need_ the addressing scheme libscg offers

5 Platforms _may_ use a different access method too.

NOTE: Amongst the 6 platforms that do not even allow getting a file descriptor
there is a modern OS like MacOS X.

BTW: the wording of your posting did give you a negative score.
If you continue the same way, it may be that your next posting will
remain unanswered even though it may be wrong and need a correction like this
one.

Jörg

--
EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
j...@cs.tu-berlin.de (uni)
schi...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
URL: http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily
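
The two naming schemes argued about in this thread can coexist in one
parser. The sketch below accepts either a bus,target,lun triple or an
absolute device path; the type and function names are hypothetical, and this
is not how libscg itself parses its dev= argument.

```c
#include <assert.h>
#include <stdio.h>

enum dev_kind { DEV_INVALID = 0, DEV_BTL, DEV_PATH };

struct dev_spec {
    enum dev_kind kind;
    int bus, target, lun;   /* meaningful when kind == DEV_BTL */
    const char *path;       /* meaningful when kind == DEV_PATH */
};

/* Classify a user-supplied device string without touching any device. */
struct dev_spec parse_dev_spec(const char *s)
{
    struct dev_spec d = { DEV_INVALID, -1, -1, -1, NULL };
    int consumed = 0;

    assert(s != NULL);
    if (s[0] == '/') {
        /* Anything that looks like an absolute path is taken as a node name. */
        d.kind = DEV_PATH;
        d.path = s;
    } else if (sscanf(s, "%d,%d,%d%n",
                      &d.bus, &d.target, &d.lun, &consumed) == 3
               && s[consumed] == '\0'
               && d.bus >= 0 && d.target >= 0 && d.lun >= 0) {
        /* Exactly three non-negative integers and nothing after them. */
        d.kind = DEV_BTL;
    }
    return d;
}
```

With this, "1,0,0" is classified as a triple and "/dev/hdc" as a path, so a
front end could accept both and choose the transport afterwards.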

Kyle Moffett

Jan 25, 2006, 12:00:21 PM

On Jan 25, 2006, at 11:31, Joerg Schilling wrote:
> Albert Cahalan <acah...@gmail.com> wrote:
>> We Linux users will forever patch your software to work the
>
> Looks like you are not a native English speaker. "We" is incorrect
> here, as you only speak for yourself.

I agree completely with his statements, therefore he speaks for at
least two people and "we" is proper usage. I suspect given the posts
on this list the last time this flamewar came up that there are more
as well, but 2 is enough.

> libscg includes...

Irrelevant to the discussion at hand, we are talking only about linux
and what should be done on linux.

> - Only 5 of them allow a /dev/hd* device name related access.

No, you have this wrong:

- One of them (i.e. Linux) requires a /dev/[hs]d* device-name related
access

- Only 4 others allow /dev/hd*

However, the latter is _completely_ _irrelevant_ to the discussion, as
we are talking about Linux *only*.

> [irrelevant discussion of other platforms]

> 17 Platforms _need_ the addressing scheme libscg offers
> 5 Platforms _may_ use a different access method too.

Wrong again:
17 platforms need libscg's addressing
4 platforms offer /dev/* access
1 platform (Linux) _requires_ /dev/* access

You are perfectly free to adjust your compatibility layer accordingly.

> BTW: the wording of your posting [...]

Personal attacks are offtopic, irrelevant, and rude. Please refrain
from doing so. If you don't plan to respond to somebody's email,
just don't, no reason to shout about it to a world who doesn't care.

Cheers,
Kyle Moffett

--
Premature optimization is the root of all evil in programming
-- C.A.R. Hoare

Matthias Andree

Jan 25, 2006, 12:10:11 PM

Kyle Moffett wrote:
> On Jan 25, 2006, at 11:31, Joerg Schilling wrote:
>> Albert Cahalan <acah...@gmail.com> wrote:
>>> We Linux users will forever patch your software to work the
>>
>> Looks like you are not a native English speaker. "We" is incorrect
>> here, as you only speak for yourself.
>
> I agree completely with his statements, therefore he speaks for at
> least two people and "we" is proper usage. I suspect given the posts
> on this list the last time this flamewar came up that there are more as
> well, but 2 is enough.
>
>> libscg includes...
>
> Irrelevant to the discussion at hand, we are talking only about linux
> and what should be done on linux.

Well, cdrecord relies on libscg, so in effect most of the portability code
that is affected is in libscg; some of the real-time code however is
specific to cdrecord.

>> - Only 5 of them allow a /dev/hd* device name related access.
>
> No, you have this wrong:
>
> - One of them (i.e. Linux) requires a /dev/[hs]d* device-name related
> access

/dev/sd* for CD writing? I think you're off track here. AFAICS cdrecord uses
/dev/sg* to access the writer.

> - Only 4 others allow /dev/hd*
>
> However, the latter is _completely_ _irrelevant_ to the discussion, as
> we are talking about Linux *only*.

This, and if the code can then be used on other platforms, then there is
little point in calling the Linux /dev/hd* device "badly designed", unless
there were problems with it that prevented cdrecord (or libscg, for pxupdate
or something like that) from working properly.

So I'll repeat my question: is there anything that SG_IO to /dev/hd* (via
ide-cd) cannot do that it can do via /dev/sg*? Device enumeration doesn't count.

The numbers we get from ide-scsi for ATAPI writers are skewed anyhow, I'm
getting 1,0,0 for a SATA hard disk, 2,0,0 for secondary master
DVD-RAM/±R[W], 3,0,0 for secondary slave CD-RW... I wonder why these could
be desirable, and if they are really as static as they pretend to be. I
doubt that, their numbers depend on the order of driver loading.

Joerg Schilling

Jan 25, 2006, 12:10:15 PM

Jan Engelhardt <jen...@linux01.gwdg.de> wrote:

>
> >> And if you check the amount of completely unneeded code Linux currently has
> >> just to implement e.g. SG_IO in /dev/hd*, it could even _save_ space in the
> >> kernel when converting to a clean SCSI based design.
> >
> >Please point me at that huge amount of code. Hint: there is none.
>
> I'm getting a grin:
>
> 15:46 takeshi:../drivers/ide > find . -type f -print0 | xargs -0 grep SG_IO
> (no results)
>
> Looks like it's already non-redundant :)

Everything in drivers/block/scsi_ioctl.c is duplicate code, and I am sure I
would find more if I took some time....

Jörg

--
EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
j...@cs.tu-berlin.de (uni)
schi...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
URL: http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily

Joerg Schilling

Jan 25, 2006, 12:10:15 PM

Jens Axboe <ax...@suse.de> wrote:

> It just looks like Joerg needs to do his homework, before spreading
> false information on lkml. Then again, all his "arguments" are the same
> as last time (and the time before, and before, and so on).

Before spreading your false claims, please do your homework.

We had a mostly fruitful discussion before you appeared.
Please either try to contribute useful ideas or stay out.

Jörg

--
EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
j...@cs.tu-berlin.de (uni)
schi...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
URL: http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily

Joerg Schilling

Jan 25, 2006, 12:10:20 PM

Jens Axboe <ax...@suse.de> wrote:

> You just want the device naming to reflect that. The user should not
> need to use /dev/hda, but /dev/cdrecorder or whatever. A real user would
> likely be using k3b or something graphical though, and just click on his
> Hitachi/Plextor/whatever burner. Perhaps some fancy udev rules could
> help do this dynamically even.

Guess why cdrecord -scanbus is needed.

It serves the need of GUI programs for cdrecord and allows them to retrieve
and list possible drives of interest in a platform-independent way.

Jörg

--
EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
j...@cs.tu-berlin.de (uni)
schi...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
URL: http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily

Joerg Schilling

Jan 25, 2006, 12:20:10 PM

Kyle Moffett <mrmac...@mac.com> wrote:

> > [irrelevant discussion of other platforms]

Incorrect, sorry. Do you really want to make Linux incompatible with the rest of the
world?

> > 17 Platforms _need_ the addressing scheme libscg offers
> > 5 Platforms _may_ use a different access method too.
>
> Wrong again:
> 17 platforms need libscg's addressing
> 4 platforms offer /dev/* access
> 1 platform (Linux) _requires_ /dev/* access

Your last line is wrong

> You are perfectly free to adjust your compatibility layer accordingly.

The Linux kernel folks unfortunately hide information artificially for the
/dev/hd* interface, making exactly this compatibility impossible.

> Personal attacks are offtopic, irrelevant, and rude. Please refrain
> from doing so. If you don't plan to respond to somebody's email,
> just don't, no reason to shout about it to a world who doesn't care.

If you are against personal attacks, why didn't you intercede for the
postings from Jens Axboe and Albert Cahalan?

I am against personal attacks, and this is the first time it took
more than a day before LKML people started with personal attacks against me.
So in principle this is some sort of progress compared to former times.
If you would like to continue this discussion, I would like you to stay reasonable
and help keep the discussion based on technical arguments.

Jörg

--
EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
j...@cs.tu-berlin.de (uni)
schi...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
URL: http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily

Jens Axboe

Jan 25, 2006, 12:20:11 PM

On Wed, Jan 25 2006, Joerg Schilling wrote:

> Jan Engelhardt <jen...@linux01.gwdg.de> wrote:
>
> >
> > >> And if you check the amount of completely unneeded code Linux currently has
> > >> just to implement e.g. SG_IO in /dev/hd*, it could even _save_ space in the
> > >> kernel when converting to a clean SCSI based design.
> > >
> > >Please point me at that huge amount of code. Hint: there is none.
> >
> > I'm getting a grin:
> >
> > 15:46 takeshi:../drivers/ide > find . -type f -print0 | xargs -0 grep SG_IO
> > (no results)
> >
> > Looks like it's already non-redundant :)
>
> everything in drivers/block/scsi_ioctl.c is duplicate code and I am sure I
> would find more if I take some time....

axboe@nelson:[.]r/src/linux-2.6-block.git $ size block/scsi_ioctl.o
text data bss dec hex filename
2844 256 0 3100 c1c block/scsi_ioctl.o

And it's not everything that's duplicated, basically only the ioctl
parsing is. So either admit that there isn't a lot of duplicated code,
or "take some time" and point me at it. Otherwise refrain from making
obviously false statements in the future.

--
Jens Axboe

Matthias Andree

Jan 25, 2006, 12:20:14 PM

Joerg Schilling wrote:
> Jens Axboe <ax...@suse.de> wrote:
>
>> It just looks like Joerg needs to do his homework, before spreading
>> false information on lkml. Then again, all his "arguments" are the same
>> as last time (and the time before, and before, and so on).
>
> Before spreading your false claims, please do your homework.

I think we'd better call the whole discussion off.

In personal conversation with Jörg, I fell prey to the illusion he might
have grown up last week-end, and Lee's promising post was the idea to start
the whole thing and see if both sides get closer together, but it seems Jörg
is unwilling to stick to a civilized discussion.

Sorry for starting this noise.

Matthias Andree

Jan 25, 2006, 12:20:15 PM

Joerg Schilling wrote:
> Jens Axboe <ax...@suse.de> wrote:
>
>> You just want the device naming to reflect that. The user should not
>> need to use /dev/hda, but /dev/cdrecorder or whatever. A real user would
>> likely be using k3b or something graphical though, and just click on his
>> Hitachi/Plextor/whatever burner. Perhaps some fancy udev rules could
>> help do this dynamically even.
>
> Guess why cdrecord -scanbus is needed.
>
> It serves the need of GUI programs for cdrecord and allows them to retrieve
> and list possible drives of interest in a platform-independent way.

There are bugs in the implementation that prevent -scanbus from working
properly, and they are not Linux bugs. Once -scanbus really scans all
devices and skips those it cannot access (rather than quitting), you might
have a point.
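
The behaviour asked for here, skipping nodes that cannot be opened instead
of aborting the whole scan, might look like this. A sketch only:
scan_candidates is a hypothetical helper, and a real scan would go on to
send an INQUIRY to each node that opened.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Probe each candidate device node. usable[i] is set to 1 when the node
 * could be opened and 0 otherwise; the scan never stops at a failure.
 * Returns the number of usable nodes.
 */
size_t scan_candidates(const char *const *paths, size_t n, int *usable)
{
    size_t i, found = 0;

    assert(paths != NULL && usable != NULL);
    for (i = 0; i < n; i++) {
        int fd = open(paths[i], O_RDONLY | O_NONBLOCK);

        usable[i] = (fd >= 0);
        if (fd < 0)
            continue;       /* EACCES, ENOENT, ...: skip and keep scanning */
        close(fd);
        found++;
    }
    return found;
}
```

The point of the design is that lacking permission on one node (say
/dev/hda) says nothing about the next one, so each node is judged on its
own.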

gru...@teleline.es

Jan 25, 2006, 12:20:15 PM

On Wed, 25 Jan 2006 16:13:46 +0100 (MET),
Jan Engelhardt <jen...@linux01.gwdg.de> wrote:

> There's one kind of not-so-advanced linux newbies that just go to walmart,
> buy a computer and whack a linux system on it for fun, and they still don't
> know if their cdrom is at /dev/hdb or /dev/hdc. Looking for dmesg is
> usually a nightmare for them, and apart that -scanbus lists scsi
> host,id,lun instead of /dev/hd* (don't comment on this kthx), it is
> convenient for this sort of users to find out what's available.

Wait - Looking at dmesg is a nightmare for newbies, but cdrecord -scanbus
is not?

Users should be shown the available devices in a pretty GUI, and for that to
be possible, the kernel needs to provide a unified way to show userspace
the available devices and notify it when they are added/removed - which
happens to be sysfs + udev etc.

libscg seems to want to replace the operating system for some tasks
in the name of cross-platform compatibility. Sorry, but libscg is
not the center of the world. It's fine that cdrecord does what
it does for the apps on all those platforms where -scanbus and friends
make sense, but linux just has SG_IO. libscg wanting to offer access
to the "transport layer below /dev/hd*" looks like a layering
violation in operating systems like linux, but it is fine that cdrecord
has it because it _is_ necessary in other operating systems which do
things differently.

Using the native features of a platform is a Good Thing when writing
cross-platform software, e.g. glib provides a "threading emulation" where
threads are not available, but it uses the native pthreads if they are
available. libscg wants to do everything everywhere, and that would make
sense if SG_IO weren't able to do what cdrecord needs, but AFAIK from the
multiple flamewars I've seen, SG_IO does everything that cdrecord
needs. I've not had a problem with SG_IO in years...

gru...@teleline.es

Jan 25, 2006, 12:30:09 PM

On Wed, 25 Jan 2006 18:03:18 +0100,
Joerg Schilling <schi...@fokus.fraunhofer.de> wrote:

> Guess why cdrecord -scanbus is needed.
>
> It serves the need of GUI programs for cdrecord and allows them to retrieve
> and list possible drives of interest in a platform independent way.

But this is not necessary at all, since the linux platform already provides ways to
retrieve and list possible drives....

Joerg Schilling

Jan 25, 2006, 12:30:18 PM

Matthias Andree <matthia...@gmx.de> wrote:

> I think we'd better call the whole discussion off.

We could continue as long as people like Jens Axboe stay reasonable.

Jörg

--
EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
j...@cs.tu-berlin.de (uni)
schi...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
URL: http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily

Jens Axboe

Jan 25, 2006, 12:30:22 PM

On Wed, Jan 25 2006, Matthias Andree wrote:
> Joerg Schilling wrote:
> > Jens Axboe <ax...@suse.de> wrote:
> >
> >> It just looks like Joerg needs to do his homework, before spreading
> >> false information on lkml. Then again, all his "arguments" are the same
> >> as last time (and the time before, and before, and so on).
> >
> > Before spreading your false claims, please do your homework.
>
> I think we'd better call the whole discussion off.
>
> In personal conversation with Jörg, I fell prey to the illusion he might
> have grown up last week-end, and Lee's promising post was the idea to start
> the whole thing and see if both sides get closer together, but it seems Jörg
> is unwilling to stick to a civilized discussion.
>
> Sorry for starting this noise.

Agreed, it's the same thing that happens each and every time he posts
here.

--
Jens Axboe

Joerg Schilling

Jan 25, 2006, 12:30:23 PM

Matthias Andree <matthia...@gmx.de> wrote:

> > Irrelevant to the discussion at hand, we are talking only about linux
> > and what should be done on linux.
>
> Well, cdrecord relies on libscg, so in effect most of the portability code
> that is affected is in libscg; some of the real-time code however is
> specific to cdrecord.

This is correct, as (looking at other programs from cdrtools) cdrecord is the
only program that needs realtime scheduling.

> So I'll repeat my question: is there anything that SG_IO to /dev/hd* (via
> ide-cd) cannot do that it can do via /dev/sg*? Device enumeration doesn't count.

But device enumeration is the central point when implementing -scanbus.

Note that all OSes that I am aware of internally use a device enumeration scheme
that is close to what libscg uses. This is even true for Linux. If Linux did not
hide this information for /dev/hd* based fd's, I could implement an abstraction
layer.....

Jörg

--
EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
j...@cs.tu-berlin.de (uni)
schi...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
URL: http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily

Jens Axboe

Jan 25, 2006, 12:30:25 PM

On Wed, Jan 25 2006, Joerg Schilling wrote:
> Jens Axboe <ax...@suse.de> wrote:
>
> > It just looks like Joerg needs to do his homework, before spreading
> > false information on lkml. Then again, all his "arguments" are the same
> > as last time (and the time before, and before, and so on).
>
> Before spreading your false claims, please do your homework.

Sorry Joerg, _you_ really are the one that has to do your homework as
you aptly demonstrated.

> We previously had mostly fruitful discussion before you did appear.
> Please either try to contribute useful ideas or stay out.

You are spreading blatantly false claims about the code and repeating the
same old suggestions on what _you_ think Linux design should be like. So
I had to correct it, which I did.

--
Jens Axboe

Jens Axboe

Jan 25, 2006, 12:30:27 PM

On Wed, Jan 25 2006, Joerg Schilling wrote:
> Matthias Andree <matthia...@gmx.de> wrote:
>
> > I think we'd better call the whole discussion off.
>
> We could continue as long as people like Jens Axboe stay reasonable.

I would have no problems if you weren't spreading your misguided
information disguised as real info. I've had thousands of useful
conversations on lkml in the past many years, but I fail to remember
just one with you involved (whether I participated or not).

I'll refrain from writing further mails in this thread, unless you
actually "find some time" to back up your claims with real data.

--
Jens Axboe

Matthias Andree

Jan 25, 2006, 12:40:07 PM

Joerg Schilling wrote:

>> So I'll repeat my question: is there anything that SG_IO to /dev/hd* (via
>> ide-cd) cannot do that it can do via /dev/sg*? Device enumeration doesn't count.
>
> But device enumeration is the central point when implementing -scanbus.

Again: Is there anything *besides* (<German>: außer) device enumeration that
does not work with the current /dev/hd* SG_IO interface?

Jens Axboe

Jan 25, 2006, 12:40:10 PM

On Wed, Jan 25 2006, gru...@teleline.es wrote:
> On Wed, 25 Jan 2006 18:03:18 +0100,
> Joerg Schilling <schi...@fokus.fraunhofer.de> wrote:
>
> > Guess why cdrecord -scanbus is needed.
> >
> > It serves the need of GUI programs for cdrecord and allows them to retrieve
> > and list possible drives of interest in a platform independent way.
>
> But this is not necessary at all, since the linux platform already
> provides ways to retrieve and list possible drives....

In fact it would be a _lot_ easier to just scan sysfs and do an inquiry
on potentially useful devices.
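
The "inquiry" half of that suggestion can be sketched as follows. This is an illustration under assumptions, not code from cdrecord or the kernel: it sends a standard SCSI INQUIRY through the SG_IO ioctl on a device node the caller already knows, then reads the peripheral device type from byte 0 of the reply (type 5 is a CD/DVD device). The names `sg_inquiry`, `inquiry_device_type`, `INQ_LEN` and `PDT_CDROM` are made up for this example.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <scsi/sg.h>

#define INQ_OPCODE 0x12   /* SCSI INQUIRY */
#define INQ_LEN    36     /* length of standard INQUIRY data */
#define PDT_CDROM  0x05   /* peripheral device type: CD/DVD */

/* The peripheral device type is the low 5 bits of INQUIRY byte 0. */
static int inquiry_device_type(const unsigned char *inq)
{
    assert(inq != NULL);
    return inq[0] & 0x1f;
}

/*
 * Send a standard INQUIRY via SG_IO to the node at `path`.
 * Returns 0 and fills inq[INQ_LEN] on success, -1 otherwise.
 */
static int sg_inquiry(const char *path, unsigned char inq[INQ_LEN])
{
    unsigned char cdb[6] = { INQ_OPCODE, 0, 0, 0, INQ_LEN, 0 };
    unsigned char sense[32];
    sg_io_hdr_t hdr;
    int fd, rc;

    fd = open(path, O_RDONLY | O_NONBLOCK);
    if (fd < 0)
        return -1;

    memset(&hdr, 0, sizeof(hdr));
    hdr.interface_id = 'S';
    hdr.cmd_len = sizeof(cdb);
    hdr.cmdp = cdb;
    hdr.dxfer_direction = SG_DXFER_FROM_DEV;
    hdr.dxfer_len = INQ_LEN;
    hdr.dxferp = inq;
    hdr.mx_sb_len = sizeof(sense);
    hdr.sbp = sense;
    hdr.timeout = 5000;   /* milliseconds */

    memset(inq, 0, INQ_LEN);
    rc = ioctl(fd, SG_IO, &hdr);
    close(fd);
    if (rc < 0 || (hdr.info & SG_INFO_OK_MASK) != SG_INFO_OK)
        return -1;
    return 0;
}
```

A caller would run `sg_inquiry()` on each candidate node and keep those for which `inquiry_device_type()` returns `PDT_CDROM`; how the candidate list is obtained (sysfs, udev, HAL) is left open here.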

--
Jens Axboe

Kyle Moffett

Jan 25, 2006, 1:10:09 PM

On Jan 25, 2006, at 12:14:15, Joerg Schilling wrote:
> Incorrect, sorry. Do you really make Linux incompatible to the rest
> of the world?

Why should we care about compatibility with those interfaces? Half
our networking stack includes interfaces (like IPTables) that aren't
compatible with _BSD_ from which parts of it were derived, let alone
with Windows or Solaris.

>> 1 platform (Linux) _requires_ /dev/* access
> Your last line is wrong

No, it is correct. We require /dev/* access. The fact that we
included /dev/sg* devices for /dev/[sh]d* was a mistake, and should
be fixed, but those are still /dev/* access.

>> You are perfectly free to adjust your compatibility layer
>> accordingly.
> The Linux Kernel folks unfortunately artificially hide information
> for the /dev/hd* interface making exactly this compatibility
> impossible.

We provide enough information for everybody else to be happy,
including the dvd+rw-tools package. What else do you need and why?

>> Personal attacks are offtopic, irrelevant, and rude. Please
>> refrain from doing so. If you don't plan to respond to somebody's
>> email, just don't, no reason to shout about it to a world who
>> doesn't care.
>
> If you are against personal attacks, why didn't you intercede for
> the postings from Jens Axboe and Albert Cahalan?

Because I didn't see them.

> I am against personal attacks and this is the first time that it
> took more than a day before LKML people started with personal
> attacks against me.

I would encourage you to ignore all personal attacks. The people
making them are doing so frequently because either (A) they feel they
have been attacked and are retaliating or (B) they don't have a valid
technical point to make. In either case the signal-to-noise ratio is
better if you ignore the attack and don't respond in turn, which will
frequently cause the offending party to cease their attacks as well.

One other note: Please do not tell Linux kernel developers that you
know what is best for the Linux kernel. If you have a specific bug
or a proposed patch it will be thoroughly considered, but vague
declarations of problems are insufficient.

Cheers,
Kyle Moffett

Jens Axboe

Jan 25, 2006, 1:20:22 PM

On Wed, Jan 25 2006, Joerg Schilling wrote:
> > So I'll repeat my question: is there anything that SG_IO to /dev/hd* (via
> > ide-cd) cannot do that it can do via /dev/sg*? Device enumeration doesn't count.
>
> But device enumeration is the central point when implementing
> -scanbus.

And that's why I state it's useless on Linux.

> Note that all OSes that I am aware of internally use a device
> enumeration scheme that is close to what libscg uses. This is even
> true for Linux. If Linux did not hide this information for /dev/hd*
> based fd's, I could implement an abstraction layer.....

Not true, Linux has nothing of the sort internally for eg ATAPI devices.
I don't know why you think that, but it's simply not true at all.

--
Jens Axboe

Matthias Andree

Jan 25, 2006, 1:30:16 PM

Jens Axboe wrote:

> In fact it would be a _lot_ easier to just scan sysfs and do an inquiry
> on potentially useful devices.

Hm. sysfs, procfs, udev, hotplug, netlink (for IPv6) - this all looks rather
complicated and non-portable. I understand that applications that can just
open every device and send SCSI INQUIRY might want to do that on Linux, too.

Jens Axboe

Jan 25, 2006, 1:30:16 PM

On Wed, Jan 25 2006, Matthias Andree wrote:
> Jens Axboe wrote:
>
> > In fact it would be a _lot_ easier to just scan sysfs and do an inquiry
> > on potentially useful devices.
>
> Hm. sysfs, procfs, udev, hotplug, netlink (for IPv6) - this all looks rather
> complicated and non-portable. I understand that applications that can just
> open every device and send SCSI INQUIRY might want to do that on Linux, too.

Certainly, I'm just suggesting a better way to do it on Linux.

--
Jens Axboe

Tomasz Torcz

Jan 25, 2006, 2:10:16 PM

On Wed, Jan 25, 2006 at 06:03:18PM +0100, Joerg Schilling wrote:
> Jens Axboe <ax...@suse.de> wrote:
>
> > You just want the device naming to reflect that. The user should not
> > need to use /dev/hda, but /dev/cdrecorder or whatever. A real user would
> > likely be using k3b or something graphical though, and just click on his
> > Hitachi/Plextor/whatever burner. Perhaps some fancy udev rules could
> > help do this dynamically even.
>
> Guess why cdrecord -scanbus is needed.
>
> It serves the need of GUI programs for cdrecord and allows them to retrieve
> and list possible drives of interest in a platform independent way.

GUI programs tend to retrieve this kind of info from HAL
(http://freedesktop.org/wiki/Software_2fhal)

--
Tomasz Torcz "Funeral in the morning, IDE hacking
zdzichu@irc.-nie.spam-.pl in the afternoon and evening." - Alan Cox

Olivier Galibert

Jan 25, 2006, 2:10:17 PM

On Wed, Jan 25, 2006 at 06:31:27PM +0100, Jens Axboe wrote:
> In fact it would be a _lot_ easier to just scan sysfs and do an inquiry
> on potentially useful devices.

Serious question, what and how? If I scan /sys/block for example for
potential candidates, that won't give me the devices or tell me the
name udev decided to use for it in /dev.

And I'm not sure how to know if something is cdrom-ish and SG_IO able
from sysfs. Should I filter on driver name? But then, I don't know
which names are acceptable (*cdrom* ?)...

Or maybe I should go through the fad-of-the-day, hal/dbus?

OG.
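
One partial answer to the "is it cdrom-ish" part of the question, as a sketch under stated assumptions: for devices handled by the SCSI layer, sysfs exposes the SCSI peripheral device type in a `device/type` attribute below the block device directory, and type 5 means CD/DVD. This does not cover the ide-cd /dev/hd* case, and it does not solve the mapping from a sysfs name to the node name udev chose. The helper name `sysfs_scsi_type` and the `root` parameter are illustrative only.

```c
#include <assert.h>
#include <stdio.h>

#define SCSI_TYPE_CDROM 5   /* SCSI peripheral device type for CD/DVD */

/*
 * Read <root>/<name>/device/type, e.g. root "/sys/block", name "sr0".
 * Returns the SCSI peripheral device type, or -1 if the attribute is
 * missing or unreadable (as it is for non-SCSI block devices).
 */
static int sysfs_scsi_type(const char *root, const char *name)
{
    char path[256];
    FILE *f;
    int n, type = -1;

    assert(root != NULL && name != NULL);
    n = snprintf(path, sizeof(path), "%s/%s/device/type", root, name);
    if (n < 0 || n >= (int)sizeof(path))
        return -1;          /* path did not fit into the buffer */

    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &type) != 1)
        type = -1;
    fclose(f);
    return type;
}
```

A scanner could call `sysfs_scsi_type("/sys/block", entry)` for each directory entry and keep those that return `SCSI_TYPE_CDROM`.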

Lee Revell

Jan 25, 2006, 3:20:15 PM

On Wed, 2006-01-25 at 17:57 +0100, Joerg Schilling wrote:
> Jan Engelhardt <jen...@linux01.gwdg.de> wrote:
>
> >
> > >> And if you check the amount of completely unneeded code Linux currently has
> > >> just to implement e.g. SG_IO in /dev/hd*, it could even _save_ space in the
> > >> kernel when converting to a clean SCSI based design.
> > >
> > >Please point me at that huge amount of code. Hint: there is none.
> >
> > I'm getting a grin:
> >
> > 15:46 takeshi:../drivers/ide > find . -type f -print0 | xargs -0 grep SG_IO
> > (no results)
> >
> > Looks like it's already non-redundant :)
>
> everything in drivers/block/scsi_ioctl.c is duplicate code and I am sure I
> would find more if I take some time....

PLEASE don't cc: me on this asinine thread anymore.

Argh, I KNEW this would end with the same exact flame war.

Lee

jerome lacoste

Jan 25, 2006, 5:10:06 PM

On 1/25/06, Jens Axboe <ax...@suse.de> wrote:
> On Wed, Jan 25 2006, Jan Engelhardt wrote:
> >
> > >
> > >- you don't need -scanbus. If
> > >users think they do, it's either because Joerg brain washed them or
> > >because they have been used to that bad interface since years ago when
> > >it was unfortunately needed.
> >
> > Now you're unfair.
> > -scanbus does a nice output of what cdwriters (and other capable devices)
> > are present. For me, that lists the cd writer and a CF slot from the
> > multitype usb flash reader.

> >
> > There's one kind of not-so-advanced linux newbies that just go to walmart,
> > buy a computer and whack a linux system on it for fun, and they still don't
> > know if their cdrom is at /dev/hdb or /dev/hdc. Looking for dmesg is
> > usually a nightmare for them, and apart that -scanbus lists scsi
> > host,id,lun instead of /dev/hd* (don't comment on this kthx), it is
> > convenient for this sort of users to find out what's available.
> >

> > So, and what about that compactflash reader? It is subject to dynamic
> > usb->scsi device association (depending on when you connect it, it may
> > either become sda, or sdb, or sdc, etc.), and -scanbus yet again provides
> > some way (albeit not useful, because it lists scsi,id,lun rather than
> > /dev/sd* - don't comment either) to see where it actually is.

>
> You just want the device naming to reflect that. The user should not
> need to use /dev/hda, but /dev/cdrecorder or whatever. A real user would
> likely be using k3b or something graphical though, and just click on his
> Hitachi/Plextor/whatever burner. Perhaps some fancy udev rules could
> help do this dynamically even.
>

> If you are using cdrecord on the command line, you are by definition an
> advanced user and know how to find out where that writer is.
>
> --
> Jens Axboe

As a non-expert who just wants his boxes to work out of the box, I
feel that the above message best represents the issue at hand.

Joerg seems to be concerned with 2 things:
- the portability of his application across various platforms
- providing an end-to-end application for writing CDs from both the
command line and various 3rd party front ends.

Providing a cross platform way to reference the devices seems to help
him achieve that goal.

Linux developers seem to see the world in a different way. Their main
requirements:
- compliance with the linux way of doing things
- ultimately a front end should hide all the dirty details. That
doesn't mean a command line version has to hide them as well, nor that
cdrecord should be the interface to ask things the operating system
can provide

So it looks like:
- from a cdrecord point of view, Linux is broken.
- from a Linux developer's point of view, cdrecord is doing too much
and forces things up.

As a developer with absolutely no competence in this area, I wonder
about something: I don't see why the way a device is referred to affects the
ability to perform the functionality (writing to it).

Isn't it possible to reorganize the code in such a way that the
burning part can be independent of the way the devices are referred
to?

I downloaded the code and quickly looked at it. If I am looking at the
right version, it seems that the cdrecord code that relates to both cd
burning + the Linux specific part is not that big (and very readable,
thanks Joerg). So I really don't understand why this issue doesn't get
fixed.

</very naively>

As for the communication problems between Joerg and the Linux
developers: if someone stepped up to be the bridge between the 2
parties, couldn't that minimize the risk of flame wars?

cdrecord, how important is Linux to you?
Linux, how important is cdrecord to you?

If you 2 can't get along, just divorce! It's 2006 after all. And the
code is open.
Otherwise, talk together or use a counsellor and make this relationship work.

Jerome

Matthias Andree

Jan 25, 2006, 6:10:19 PM

Joerg Schilling wrote on 2006-01-25:

> > You are perfectly free to adjust your compatibility layer accordingly.
>
> The Linux Kernel folks unfortunately artificially hide information for the
> /dev/hd* interface making exactly this compatibility impossible.

What information is actually missing? You keep talking about phantoms,
without naming them. Again: device enumeration doesn't count, libscg
already does that.

> If you are against personal attacks, why didn't you intercede for the
> postings from Jens Axboe and Albert Cahalan?

Because ignoring attacks is more efficient.

--
Matthias Andree

Matthias Andree

Jan 25, 2006, 6:20:11 PM

(stripped Lee from the Cc: list)

Jens Axboe wrote on 2006-01-25:

> > Hm. sysfs, procfs, udev, hotplug, netlink (for IPv6) - this all looks rather
> > complicated and non-portable. I understand that applications that can just
> > open every device and send SCSI INQUIRY might want to do that on Linux, too.
>
> Certainly, I'm just suggesting a better way to do it on Linux.

Great. There's a better way, but it is not necessary. Let Linux-specific
applications use it for their benefit, but a portable application isn't
going that way because it's too much effort. If a simpler interface that
can be shared with half a dozen other systems exists, the portable
application will use that and ignore better interfaces.

--
Matthias Andree

Matthias Andree

Jan 25, 2006, 6:30:15 PM

Joerg Schilling wrote on 2006-01-25:

> Matthias Andree <matthia...@gmx.de> wrote:
>
> > I think we'd better call the whole discussion off.
>
> We could continue as long as people like Jens Axboe stay reasonable.

No. The deal was people stating their requirements, not mounting
personal attacks against others. I posted the same question (what's
lacking) several times, and your constant answer "device enumeration"
makes me assume that it's the only thing you believe is missing.

Since libscg scans all /dev/pg* and /dev/sg* (for transport indicator ""
which means plain sg) and all /dev/hd* (for transport indicator ATA:
which means /dev/hd*) and turns it into bus, host, lun anyways, there
does not appear to be a need to move this code into the kernel.

What *else* is missing?

--
Matthias Andree

Bodo Eggert

Jan 25, 2006, 7:20:05 PM

Joerg Schilling <schi...@fokus.fraunhofer.de> wrote:

> I could add this piece of code to the euid == 0 part of cdrecord:
>
> LOCAL void
> raise_memlock()
> {
> #ifdef RLIMIT_MEMLOCK
> struct rlimit rlim;
>
> rlim.rlim_cur = rlim.rlim_max = RLIM_INFINITY;

I think you should rather use the size you're going to mlock, or at least
the upper bound.
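
A minimal sketch of that suggestion, assuming the caller knows an upper bound on what it will lock: the limit is raised only as far as that bound (rounded up to whole pages) instead of to RLIM_INFINITY. The function names `raise_memlock_to` and `round_up_to_page` are hypothetical and this is not the code cdrecord ended up with. Raising the hard limit still needs privilege.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>
#include <unistd.h>

/* Round `bytes` up to a whole number of pages of size `page`. */
static size_t round_up_to_page(size_t bytes, size_t page)
{
    assert(page > 0);
    return (bytes + page - 1) / page * page;
}

/*
 * Raise RLIMIT_MEMLOCK just far enough to lock `bytes` of memory.
 * Returns 0 on success (or if nothing had to change), -1 on failure
 * with errno set by getrlimit()/setrlimit().
 */
static int raise_memlock_to(size_t bytes)
{
#ifdef RLIMIT_MEMLOCK
    struct rlimit rlim;
    long page = sysconf(_SC_PAGESIZE);
    rlim_t want;

    if (page <= 0 || getrlimit(RLIMIT_MEMLOCK, &rlim) != 0)
        return -1;
    want = (rlim_t)round_up_to_page(bytes, (size_t)page);

    if (rlim.rlim_cur == RLIM_INFINITY || rlim.rlim_cur >= want)
        return 0;               /* soft limit already suffices */

    rlim.rlim_cur = want;
    if (rlim.rlim_max != RLIM_INFINITY && rlim.rlim_max < want)
        rlim.rlim_max = want;   /* needs root or CAP_SYS_RESOURCE */
    return setrlimit(RLIMIT_MEMLOCK, &rlim);
#else
    (void)bytes;                /* no such limit on this platform */
    return 0;
#endif
}
```

The bound passed in would be the size of the buffers the program intends to mlock(), plus whatever else mlockall() would cover if that call is used.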
--
I thank GMX for sabotaging the use of my addresses by means of lies
spread via SPF.

Nix

Jan 25, 2006, 7:40:13 PM

On 25 Jan 2006, Matthias Andree prattled cheerily:

> Jens Axboe wrote:
>
>> In fact it would be a _lot_ easier to just scan sysfs and do an inquiry
>> on potentially useful devices.
>
> Hm. sysfs, procfs, udev, hotplug, netlink (for IPv6) - this all looks rather
> complicated and non-portable. I understand that applications that can just
> open every device and send SCSI INQUIRY might want to do that on Linux, too.

Applications (already) do this by asking HAL, which can be informed of
new devices in a variety of ways: the up-and-coming one is for the
kernel to notify udevd, following which a udev rule sends a dbus message
to HAL. Everything from the dbus message on up is cross-OS portable.
-scanbus is *totally* unnecessary.

(Furthermore, it fails to work in a quite laughable fashion in the
presence of hotpluggable storage media. udev handles giving hotpluggable
storage media consistent device names with extreme ease, and tells HAL
about them so that users see the new devices appear even if they don't
have a clue that /dev even exists.

The chance that J. Random Nontechnical User will ever run `cdrecord
-scanbus' is *nil*, and applications don't run it either because they
can't judge between all the devices that are listed to pick the one
which is a CD recorder (consider the consequences should they guess
wrong!). Instead, they invariably ask for a device name, or, in more
recent versions get the info from HAL. HAL knows if something is a CD
recorder because its backend, e.g. udev, told it.)

--
`Everyone has skeletons in the closet. The US has the skeletons
driving living folks into the closet.' --- Rebecca Ore

gru...@teleline.es

Jan 25, 2006, 8:20:09 PM

On Thu, 26 Jan 2006 00:14:22 +0100,
Matthias Andree <matthia...@gmx.de> wrote:

> Great. There's a better way, but it is not necessary. Let Linux-specific
> applications use it for their benefit, but a portable application isn't
> going that way because it's too much effort. If a simpler interface that
> can be shared with half a dozen other systems exists, the portable
> application will use that and ignore better interfaces.

It's "too much effort"? Basically, what linux is asking is that cdrecord
stop wasting effort trying to implement its own solution. Linux is
asking cdrecord to use SG_IO and leave device discovery and data transport
issues to the platform.

Linux doesn't even need -scanbus, for example. You could compile out that
code. Device discovery is done by the platform - I find it _scary_ that other
"modern" operating systems don't have a way of providing device discovery
services in 2006 and that an external app is needed, but hey, people are free
to design their operating systems as they like. Linux provides it and leaves
Schilling time to focus on other things. In my book, that's not "too much
effort", it's "less effort". If someone bugs you because SG_IO doesn't work,
just tell him to report the problem here - in fact cdrecord already has a
"friendly" warning about "linux-2.5 and newer". The cdrecord low level
scsi driver for SG_IO should be much simpler than all the others...

Albert Cahalan

Jan 25, 2006, 9:30:11 PM

On 1/25/06, Joerg Schilling <schi...@fokus.fraunhofer.de> wrote:
> Albert Cahalan <acah...@gmail.com> wrote:

> > BTW, before Joerg mentions portability, I'd like to remind
> > everyone that all modern OSes support the use of normal device
> > names for SCSI. The most awkward is FreeBSD, where you have
> > to do a syscall or two to translate the name to Joerg's very
> > non-hotplug non-iSCSI way of thinking. Windows, MacOS X, and
> > even Solaris all manage to handle device names just fine. In
> > numerous cases, not just Linux, cdrecord is inventing crap out
> > of thin air to satisfy a pre-hotplug worldview.
>
> Looks like you are badly informed, so I encourage you to get yourself informed
> properly before sending your next posting....
>
> libscg includes 22 different SCSI low level transport implementations.
>
> - Only 5 of them allow a /dev/hd* device name related access.
>
> - 11 of them use file descriptors as handles for sending SCSI
> commands but do not have a name <-> fs relation and thus
> _need_ a SCSI device naming scheme as libscg offers.
> This is because there is no 1:1 relation between SCSI addressing
> and a fd retrieved from a /dev/* entry.
>
> - 6 of them not even allow to get a file descriptors as handles for
> sending SCSI commands. These platforms of course need the SCSI device
> naming scheme as libscg offers.
>
> Conclusion:
>
> 17 Platforms _need_ the addressing scheme libscg offers
>
> 5 Platforms _may_ use a different access method too.
>
> NOTE: Amongst the 6 platforms that do not allow to even get a file descriptor
> there is a modern OS like MacOS X

You can't fool me, because I looked at the cdrecord source
code and at the documented APIs for various OSes.

It's misleading to say that MacOS doesn't allow a file
descriptor. MacOS has something similar to what Linux
has, but not in the normal filesystem namespace. You
specify a name to get a handle. Of course, on MacOS,
Joerg also uses -scanbus to create nonsense.

Names can be handled by Windows, FreeBSD, MacOS X,
Linux, OpenBSD, Solaris, HP-UX, AIX, and IRIX.
That's everything that isn't end-of-lifed.

The rest of your 22 platforms are dead and dying things
that are unlikely to be upgrading to the latest software or
hardware, assuming they survived the Y2K bug. It's old
stuff like the Amiga, the NeXT, etc.

Using numbers for CD burners is like trying to send email
to the IP address of the recipient, which half-way worked
until DHCP was invented. Wait, we could have all email
clients offer a -scannet option. :-)

Matthias Andree

Jan 26, 2006, 3:30:08 AM

On Thu, 26 Jan 2006, gru...@teleline.es wrote:

> It's "too much effort"? Basically, what linux is asking is that cdrecord
> stop wasting effort trying to implement its own solution. Linux is
> asking cdrecord to use SG_IO and leave device discovery and data transport
> issues to the platform.
>
> Linux doesn't even need -scanbus, for example. You could compile out that
> code. Device discovery is done by the platform - I find it _scary_ that other
> "modern" operating systems don't have a way of providing device discovery
> services in 2006 and that an external app is needed, but hey, people are free
> to design their operating systems as they like. Linux provides it and leaves
> Schilling time to focus on other things. In my book, that's not "too much
> effort", it's "less effort". If someone bugs you because SG_IO doesn't work,
> just tell him to report the problem here - in fact cdrecord already has a
> "friendly" warning about "linux-2.5 and newer". The cdrecord low level
> scsi driver for SG_IO should be much simpler than all the others...

Well, you need to implement 30 (or so) platform-specific ways to get a
list of devices, and portable applications aren't going to do that. To
make it explicit: no way. It is a maintenance nightmare, too: 30 poorly
tested pieces of code.

This sounds like a huge difference, but I don't believe it actually is.
Jörg is trying to fight the system rather than stop complaining to users
about their using /dev/hd*. The scanning code is there and can probably be
made to work with little effort.

--
Matthias Andree

Matthias Andree

Jan 26, 2006, 3:30:10 AM

On Wed, 25 Jan 2006, Albert Cahalan wrote:

> It's misleading to say that MacOS doesn't allow a file
> descriptor. MacOS has something similar to what Linux
> has, but not in the normal filesystem namespace. You
> specify a name to get a handle. Of course, on MacOS,
> Joerg also uses -scanbus to create nonsense.

OK, so Jörg created this "nonsense", i.e. a triple of stupid numbers,
and claims he's using them to provide device lists to applications.

What prevents any of these GUIs from treating the "name" as an opaque
string? How would ATA:4,0,0 be different from 2,2,0 or /dev/hdc?

And how is the phantom GUI application obtaining the data? It needs to
scan all buses anyhow, and run -scanbus for each and every "transport
identifier" to get a grip on all devices, because cdrecord-with-libscg
is too stupid to do that by itself and unlist inferior (in its view, or
in the public view) identifiers (such as the PIO-only ATAPI:).

Here's an idea:

Recognizing that cdrecord may be portable, I wonder whether it (or libscg,
for that matter) is extensible, or has made design decisions that let it
decouple its device naming from the GUI application. Declaring that the
device ID, however it looks today, is an opaque string not to be processed
by the GUI might be a first step towards gaining the necessary degrees of
freedom to change to ATA:/dev/hdc or just /dev/hdc (I don't mind which).
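
The opaque-string idea can be sketched like this. It is a hypothetical illustration, not libscg's real parser: the front end passes the device string through untouched, and only the back end decides whether it is a path (optionally with a transport prefix) or a bus,target,lun triple. The type `struct dev_spec` and the function `parse_dev_spec` are invented for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum spec_kind { SPEC_INVALID, SPEC_PATH, SPEC_BTL };

struct dev_spec {
    enum spec_kind kind;
    char transport[16];     /* e.g. "ATA"; empty when none was given */
    int bus, target, lun;   /* meaningful only for SPEC_BTL */
};

/*
 * Classify a device string such as "ATA:4,0,0", "2,2,0", "/dev/hdc"
 * or "ATA:/dev/hdc". For SPEC_PATH the caller keeps using the part of
 * the original string after the optional "TRANSPORT:" prefix.
 */
static struct dev_spec parse_dev_spec(const char *s)
{
    struct dev_spec d;
    const char *colon;
    char tail;

    assert(s != NULL);
    memset(&d, 0, sizeof(d));
    d.kind = SPEC_INVALID;

    /* An optional "TRANSPORT:" prefix; paths may contain ':' themselves. */
    colon = strchr(s, ':');
    if (colon != NULL && s[0] != '/') {
        size_t len = (size_t)(colon - s);

        if (len >= sizeof(d.transport))
            return d;
        memcpy(d.transport, s, len);    /* memset above terminates it */
        s = colon + 1;
    }

    if (s[0] == '/')
        d.kind = SPEC_PATH;
    else if (sscanf(s, "%d,%d,%d%c", &d.bus, &d.target, &d.lun, &tail) == 3)
        d.kind = SPEC_BTL;              /* exactly three numbers, no trailer */
    return d;
}
```

With something along these lines in the back end, a GUI never needs to know which of the forms it was handed by the device listing.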

> Using numbers for CD burners is like trying to send email
> to the IP address of the recipient, which half-way worked
> until DHCP was invented. Wait, we could have all email
> clients offer a -scannet option. :-)

Well, in PeeCees, the BIOS presents that list of primary/secondary
master/slave, so there's /some/ point in it. Once hotplug comes into
play, it's all in vain though.

(removing Lee from the Cc: list)

--
Matthias Andree

Joerg Schilling

Jan 26, 2006, 4:40:09 AM

Matthias Andree <matthia...@gmx.de> wrote:

> I think we'd better call the whole discussion off.

Let me come back to this again and make an important statement...

If this mailing list is not the place to
make architectural design decisions, then we really had better
stop this discussion immediately, as it would then be useless.

Please inform me about this in case you know more, as I really
don't have time to waste on useless discussions.

It also seems necessary to give some background information:

Without Matthias, I would never again have answered any mail from LKML,
as all previous experiences on this list have been a disaster. It usually
took less than an hour until someone from the list started personal attacks.
The last two times, the discussion was made impossible because Jens Axboe
started with personal attacks and his obviously false claims.

This time, it looked really promising until Jens Axboe again started with
personal attacks. I have to admit that it would have been better to
ignore him from the very beginning, but I had the false hope that he might
have changed.

Let me make a proposal: I will try to answer mail from people who send useful
contributions to the discussion and I will ignore anybody who starts with
personal attacks. I will try to reply to mails with incorrect claims if
they are not obviously incorrect, but I will stop replying to the same person
if he continues with things that are incorrect.

Jörg

--
EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
j...@cs.tu-berlin.de (uni)
schi...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
URL: http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily

Jens Axboe

Jan 26, 2006, 4:50:08 AM

What is this, kindergarten? What false claims have I made? I pointed out
several you made, to which you had no rebuttal. Then you start playing
"Jens made obviously false claims", huh?? I've had more mature
conversations with my 1 year old son.

I'm sorry if you feel that me refuting your false statements are
personal attacks. Clearly not a problem that can be fixed at my end.
Ignoring facts and continuing to write the same wrong claims over and
over again doesn't make them true in the end.

Please take me off the cc list, thanks.

--
Jens Axboe

Lee Revell

Jan 26, 2006, 4:50:09 AM

On Thu, 2006-01-26 at 10:38 +0100, Joerg Schilling wrote:
> <gru...@teleline.es> wrote:
>
> > On Wed, 25 Jan 2006 18:03:18 +0100,
> > Joerg Schilling <schi...@fokus.fraunhofer.de> wrote:
> >
> > > Guess why cdrecord -scanbus is needed.
> > >
> > > It serves the need of GUI programs for cdrecord and allows them to retrieve
> > > and list possible drives of interest in a platform independent way.
> >
> > But this is not necessary at all, since the Linux platform already provides ways to
> > retrieve and list possible drives....
>
> Interesting: You claim that the Linux platform provides ways to retrieve
> the needed information on FreeBSD, MS-WIN, ....?
>

What do FreeBSD and MS-WIN have to do with Linux?

Lee

Joerg Schilling

Jan 26, 2006, 4:50:11 AM

<gru...@teleline.es> wrote:

> On Wed, 25 Jan 2006 18:03:18 +0100,
> Joerg Schilling <schi...@fokus.fraunhofer.de> wrote:
>
> > Guess why cdrecord -scanbus is needed.
> >
> > It serves the need of GUI programs for cdrecord and allows them to retrieve
> > and list possible drives of interest in a platform independent way.
>
> But this is not necessary at all, since the Linux platform already provides ways to
> retrieve and list possible drives....

Interesting: You claim that the Linux platform provides ways to retrieve
the needed information on FreeBSD, MS-WIN, ....?

Jörg

--
EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
j...@cs.tu-berlin.de (uni)
schi...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
URL: http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily

Joerg Schilling

Jan 26, 2006, 5:00:13 AM

Matthias Andree <matthia...@gmx.de> wrote:

> Joerg Schilling wrote:
>
> >> So I'll repeat my question: is there anything that SG_IO to /dev/hd* (via
> >> ide-cd) cannot do that it can do via /dev/sg*? Device enumeration doesn't count.
> >
> > But device enumeration is the central point when implementing -scanbus.
>
> Again: Is there anything *besides* (<German>: außer) device enumeration that
> does not work with the current /dev/hd* SG_IO interface?

This is the main point.

People like to run cdrecord -scanbus in order to find a list of usable devices.
People like to see all SCSI devices in a single name space as they are all
using the same protocol for communication.

A sane way to send SCSI commands to _any_ type of devices would be to have a
SCSI generic transport layer that is independent from the high-level features
of the OS and that is independent from whether there is a high-level driver for
this device at all.

This is what I designed the scg driver interface for in 1986 and this is what
Adaptec did in 1988 with ASPI. This is of course also why the SCSI standard
committee made a proposal for the CAM SCSI interface.

http://www.t10.org/ftp/t10/drafts/cam/cam-r12b.pdf
http://www.t10.org/ftp/t10/drafts/cam3/cam3r03.pdf

Jörg

--
EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
j...@cs.tu-berlin.de (uni)
schi...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
URL: http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily
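
For readers following the SG_IO argument: the sketch below is not taken from cdrecord or from the kernel sources. It is a minimal illustration, assuming a Linux system that provides <scsi/sg.h>, of how a SCSI INQUIRY can be sent through the SG_IO ioctl to an already known device node and how the standard 36-byte reply can be decoded. The function names copy_field, parse_inquiry and inquiry_via_sg_io are hypothetical, and whether a read-only open is sufficient depends on the kernel version and on the device node.

```c
#include <assert.h>
#include <fcntl.h>
#include <scsi/sg.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

#define INQUIRY_OPCODE 0x12
#define INQUIRY_LEN    36   /* size of the standard INQUIRY reply */

/* Copy a fixed-width, space-padded field into a C string and trim
 * trailing blanks. dst must have room for len + 1 bytes. */
void copy_field(char *dst, const unsigned char *src, size_t len)
{
    memcpy(dst, src, len);
    dst[len] = '\0';
    while (len > 0 && (dst[len - 1] == ' ' || dst[len - 1] == '\0'))
        dst[--len] = '\0';
}

/* Decode standard INQUIRY data: peripheral device type in the low
 * 5 bits of byte 0, vendor in bytes 8-15, product in bytes 16-31.
 * Returns 0 on success, -1 if the buffer is too short. */
int parse_inquiry(const unsigned char *data, size_t len,
                  int *type, char vendor[9], char product[17])
{
    assert(data != NULL);
    if (len < INQUIRY_LEN)
        return -1;
    *type = data[0] & 0x1f;
    copy_field(vendor, data + 8, 8);
    copy_field(product, data + 16, 16);
    return 0;
}

/* Send INQUIRY to the device node at path via SG_IO.
 * Returns 0 on success, -1 on any error. */
int inquiry_via_sg_io(const char *path, unsigned char *data, size_t len)
{
    unsigned char cdb[6] = { INQUIRY_OPCODE, 0, 0, 0, 0, 0 };
    unsigned char sense[32];
    struct sg_io_hdr hdr;
    int fd, rc;

    if (len == 0 || len > 255)
        return -1;
    cdb[4] = (unsigned char)len;        /* allocation length */

    fd = open(path, O_RDONLY | O_NONBLOCK);
    if (fd < 0)
        return -1;

    memset(&hdr, 0, sizeof(hdr));
    hdr.interface_id = 'S';             /* required by the sg v3 interface */
    hdr.cmdp = cdb;
    hdr.cmd_len = sizeof(cdb);
    hdr.dxfer_direction = SG_DXFER_FROM_DEV;
    hdr.dxferp = data;
    hdr.dxfer_len = (unsigned int)len;
    hdr.sbp = sense;
    hdr.mx_sb_len = sizeof(sense);
    hdr.timeout = 5000;                 /* milliseconds */

    rc = ioctl(fd, SG_IO, &hdr);
    close(fd);
    if (rc < 0 || hdr.status != 0 || hdr.host_status != 0 ||
        hdr.driver_status != 0)
        return -1;
    return 0;
}
```

A caller would pass a device path it already knows, for example inquiry_via_sg_io("/dev/sg0", buf, INQUIRY_LEN) followed by parse_inquiry(buf, INQUIRY_LEN, &type, vendor, product); the path is only an example. The sketch does no enumeration at all, which is exactly the part the thread disagrees about.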

Joerg Schilling

Jan 26, 2006, 5:10:08 AM

Kyle Moffett <mrmac...@mac.com> wrote:

> On Jan 25, 2006, at 12:14:15, Joerg Schilling wrote:
> > Incorrect, sorry. Do you really make Linux incompatible to the rest
> > of the world?

..

> >> 1 platform (Linux) _requires_ /dev/* access
> > Your last line is wrong
>
> No, it is correct. We require /dev/* access. The fact that we
> included /dev/sg* devices for /dev/[sh]d* was a mistake, and should
> be fixed, but those are still /dev/* access.

Looks like you misunderstood /dev/* here.

Even with /dev/scg* on Solaris or with CAM on FreeBSD, you open a device.
But this is not a /dev/ entry for a high level device like a disk, it is
a SCSI nexus device that allows you to send SCSI commands on any SCSI transport.

Jörg

--
EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
j...@cs.tu-berlin.de (uni)
schi...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
URL: http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily

Matthias Andree

Jan 26, 2006, 5:20:10 AM

Jens Axboe wrote on 2006-01-26:

> What is this, kindergarten? What false claims have I made? I pointed out
> several you made, to which you had no rebuttal. Then you start playing
> "Jens made obviously false claims", huh?? I've had more mature
> conversations with my 1 year old son.
>
> I'm sorry if you feel that me refuting your false statements are
> personal attacks. Clearly not a problem that can be fixed at my end.
> Ignoring facts and continuing to write the same wrong claims over and
> over again doesn't make them true in the end.

The problem appears to be that Jörg hasn't really looked at Linux in
some time, and he's used to systems that don't change architecture in
patchlevel releases, while Linux 2.6.N.M should actually have been
numbered 2.(6+2*N).M...

Jörg, any chance you might be arguing on basis of really old 2.6.X
kernels?

--
Matthias Andree

Joerg Schilling

Jan 26, 2006, 5:20:10 AM

Matthias Andree <matthia...@gmx.de> wrote:

> Jens Axboe wrote:
>
> > In fact it would be a _lot_ easier to just scan sysfs and do an inquiry
> > on potentially useful devices.
>
> Hm. sysfs, procfs, udev, hotplug, netlink (for IPv6) - this all looks rather
> complicated and non-portable. I understand that applications that can just
> open every device and send SCSI INQUIRY might want to do that on Linux, too.

Another problem is that it is hard to tell whether a new feature in Linux will
still be present some time later.

If I tried to add support for every new feature immediately, I would have a
lot of dead code in my sources and would need to put a lot of effort into this
kind of coding. It seems to make sense to wait until all major Linux
distributions have made a new feature their default for some time.....

Jörg

--
EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
j...@cs.tu-berlin.de (uni)
schi...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
URL: http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily
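
To make the "scan sysfs" suggestion quoted above concrete, here is a minimal C sketch of what such a scan could look like. It is an illustration only, not code from any of the programs discussed. It assumes that each entry under the block directory (normally /sys/block) has a device/type file holding the SCSI peripheral device type, with 5 meaning a CD/DVD drive; that layout depends on the kernel version and driver, so treat it as an assumption. The names read_int_file and find_cdrom_candidates are hypothetical.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SCSI_TYPE_CDROM 5   /* peripheral device type of CD/DVD drives */
#define NAME_LEN 64

/* Read a decimal integer from a small sysfs-style text file.
 * Returns 0 on success, -1 if the file is missing or not numeric. */
int read_int_file(const char *path, int *value)
{
    char buf[32], *end;
    long v;
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;
    if (fgets(buf, sizeof(buf), f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    v = strtol(buf, &end, 10);
    if (end == buf)
        return -1;
    *value = (int)v;
    return 0;
}

/* Walk block_dir and collect the names of entries whose device/type
 * file reports a CD/DVD drive. Returns the number of names stored,
 * or -1 if block_dir cannot be opened. */
int find_cdrom_candidates(const char *block_dir,
                          char names[][NAME_LEN], int max_names)
{
    DIR *dir;
    struct dirent *ent;
    int count = 0;

    assert(block_dir != NULL && max_names >= 0);
    dir = opendir(block_dir);
    if (dir == NULL)
        return -1;

    while (count < max_names && (ent = readdir(dir)) != NULL) {
        char path[1024];
        int n, type;

        if (ent->d_name[0] == '.')
            continue;               /* skip ".", ".." and hidden entries */
        n = snprintf(path, sizeof(path), "%s/%s/device/type",
                     block_dir, ent->d_name);
        if (n < 0 || (size_t)n >= sizeof(path))
            continue;               /* path too long, skip this entry */
        if (read_int_file(path, &type) == 0 && type == SCSI_TYPE_CDROM) {
            snprintf(names[count], NAME_LEN, "%s", ent->d_name);
            count++;
        }
    }
    closedir(dir);
    return count;
}
```

A caller would typically use find_cdrom_candidates("/sys/block", names, 16) and then send an INQUIRY to each /dev/<name> it gets back. Taking the directory as a parameter keeps the sketch testable against a fake tree; it also shows the portability concern raised in this message, since none of this exists outside Linux.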

Joerg Schilling

Jan 26, 2006, 5:30:15 AM

Tomasz Torcz <zdz...@irc.pl> wrote:

> > > need to use /dev/hda, but /dev/cdrecorder or whatever. A real user would
> > > likely be using k3b or something graphical though, and just click on his
> > > Hitachi/Plextor/whatever burner. Perhaps some fancy udev rules could
> > > help do this dynamically even.
> >
> > Guess why cdrecord -scanbus is needed.
> >
> > It serves the need of GUI programs for cdrecord and allows them to retrieve
> > and list possible drives of interest in a platform independent way.
>
> GUI programs tend to retrieve this kind of info from HAL
> (http://freedesktop.org/wiki/Software_2fhal)

I am not sure what you are trying to say with this.

Programs that depend on specific Linux behavior tend to be non-portable (see
e.g. nautilus on GNOME). Nautilus tries to get e.g. the drive write speeds
by reading /proc/scsi/******. Besides the fact that this is not available
elsewhere, it gives incorrect results because there are a lot of DVD writers
with broken firmware.

Cdrecord implements workarounds for this kind of problem and, for this reason,
the most portable solution for a GUI is to use cdrecord to retrieve the
information.

jerome lacoste

Jan 26, 2006, 5:40:26 AM

On 1/26/06, Joerg Schilling <schi...@fokus.fraunhofer.de> wrote:
> Matthias Andree <matthia...@gmx.de> wrote:

[...]

> People like to run cdrecord -scanbus in order to find a list of usable devices.
> People like to see all SCSI devices in a single name space as they are all
> using the same protocol for communication.

If by people you mean developer, I might agree. If by people you mean
user, I disagree.

As a Linux user, the only reason I run cdrecord -scanbus is to comply
with the cdrecord way of doing things. I don't personally like it.

I'd rather use /dev/cdrw, in a machine independent way, as in:

ssh user@host cdrecord dev=/dev/cdrw /path/to/file.iso

Jerome

Matthias Andree

Jan 26, 2006, 5:50:26 AM

Joerg Schilling wrote on 2006-01-26:

> Even with /dev/scg* on Solaris or with CAM on FreeBSD, you open a device.
> But this is not a /dev/ entry for a high level device like a disk, it is
> a SCSI nexus device that allows you to send SCSI commands on any SCSI
> transport.

As long as the device you open allows you to send SCSI commands on any
suitable (not just SCSI) transport, why bother?

--
Matthias Andree

Tomasz Torcz

Jan 26, 2006, 6:00:39 AM

On Thu, Jan 26, 2006 at 11:25:49AM +0100, Joerg Schilling wrote:
> Tomasz Torcz <zdz...@irc.pl> wrote:
>
> > > > need to use /dev/hda, but /dev/cdrecorder or whatever. A real user would
> > > > likely be using k3b or something graphical though, and just click on his
> > > > Hitachi/Plextor/whatever burner. Perhaps some fancy udev rules could
> > > > help do this dynamically even.
> > >
> > > Guess why cdrecord -scanbus is needed.
> > >
> > > It serves the need of GUI programs for cdrecord and allows them to retrieve
> > > and list possible drives of interest in a platform independent way.
> >
> > GUI programs tend to retrieve this kind of info from HAL
> > (http://freedesktop.org/wiki/Software_2fhal)
>
> I am not sure what you are trying to say with this.
>
> Programs that depend on specific Linux behavior tend to be non-portable (see
> e.g. nautilus on GNOME). Nautilus tries to get e.g. the drive write speeds
> by reading /proc/scsi/******. Besides the fact that this is not available
> elsewhere, it gives incorrect results because there are a lot of DVD writers
> with broken firmware.

This is a fallback if HAL isn't available. Normally this is done by:

    drive->max_speed_write = libhal_device_get_property_int
                                     (ctx, device_names [i],
                                      "storage.cdrom.write_speed",
                                      NULL)
                             / CD_ROM_SPEED;

(nautilus-burn-drive.c:1368 from version 2.12.0).

> Cdrecord implements workarounds for this kind of problem and, for this reason,
> the most portable solution for a GUI is to use cdrecord to retrieve the
> information.

Yeah, sure.

    /* FIXME we don't have any way to guess the real device
     * from the info we get from CDRecord */

(the only FIXME in the above-mentioned file).

--
Tomasz Torcz 72->| 80->|
zdzichu@irc.-nie.spam-.pl 72->| 80->|
