Note that this patch does not actually enable the threaded probe for any
busses, as that's very dangerous at this point in time, without the
different bus authors trying it out and verifying that it does work
properly.
I did enable this for both USB and PCI and shaved 0.4 seconds off the
boot time of my tiny little single-processor laptop. The savings on my
4-way workstation are much greater, but things start to happen so fast
that we miss the root disk, as init starts before the disks have
finished initializing. I have some hacks to work around this right now,
but I'll hold off on posting them until I make sure they work properly
(breaking booting of people's machines isn't the best way to get them
to accept new features...)
Anyway, have fun playing around with this if you want. I'll be adding
this to the next -mm, but you will have to enable the bit on your own if
you want to see any speedups.
thanks,
greg k-h
------------
From: Greg Kroah-Hartman <gre...@suse.de>
Subject: Driver Core: add ability for drivers to do a threaded probe
This adds the infrastructure for drivers to do a threaded probe.
A new kernel thread will be created when the probe() function for the
driver is called, if the multithread_probe bit is set in the driver
saying it can support this kind of operation.
I have tested this with USB and PCI, and it works, and shaves off a lot
of time in the boot process, but there are issues with finding root boot
disks, and some USB drivers assume that this can never happen, so it is
currently not enabled for any bus type. Individual drivers can enable
this right now if they wish, and bus authors can selectively turn it on
as well, once they determine that their subsystem will work properly
with it.
Signed-off-by: Greg Kroah-Hartman <gre...@suse.de>
---
drivers/base/dd.c | 91 +++++++++++++++++++++++++++++++++----------------
include/linux/device.h | 2 +
2 files changed, 65 insertions(+), 28 deletions(-)
--- gregkh-2.6.orig/drivers/base/dd.c
+++ gregkh-2.6/drivers/base/dd.c
@@ -17,6 +17,7 @@
#include <linux/device.h>
#include <linux/module.h>
+#include <linux/kthread.h>
#include "base.h"
#include "power/power.h"
@@ -51,53 +52,41 @@ void device_bind_driver(struct device *
sysfs_create_link(&dev->kobj, &dev->driver->kobj, "driver");
}
-/**
- * driver_probe_device - attempt to bind device & driver.
- * @drv: driver.
- * @dev: device.
- *
- * First, we call the bus's match function, if one present, which
- * should compare the device IDs the driver supports with the
- * device IDs of the device. Note we don't do this ourselves
- * because we don't know the format of the ID structures, nor what
- * is to be considered a match and what is not.
- *
- * This function returns 1 if a match is found, an error if one
- * occurs (that is not -ENODEV or -ENXIO), and 0 otherwise.
- *
- * This function must be called with @dev->sem held. When called
- * for a USB interface, @dev->parent->sem must be held as well.
- */
-int driver_probe_device(struct device_driver * drv, struct device * dev)
+struct stupid_thread_structure {
+ struct device_driver *drv;
+ struct device *dev;
+};
+
+static int really_probe(void *void_data)
{
+ struct stupid_thread_structure *data = void_data;
+ struct device_driver *drv = data->drv;
+ struct device *dev = data->dev;
int ret = 0;
- if (drv->bus->match && !drv->bus->match(dev, drv))
- goto Done;
-
- pr_debug("%s: Matched Device %s with Driver %s\n",
- drv->bus->name, dev->bus_id, drv->name);
+ pr_debug("%s: Probing driver %s with device %s\n",
+ drv->bus->name, drv->name, dev->bus_id);
dev->driver = drv;
if (dev->bus->probe) {
ret = dev->bus->probe(dev);
if (ret) {
dev->driver = NULL;
- goto ProbeFailed;
+ goto probe_failed;
}
} else if (drv->probe) {
ret = drv->probe(dev);
if (ret) {
dev->driver = NULL;
- goto ProbeFailed;
+ goto probe_failed;
}
}
device_bind_driver(dev);
ret = 1;
pr_debug("%s: Bound Device %s to Driver %s\n",
drv->bus->name, dev->bus_id, drv->name);
- goto Done;
+ goto done;
- ProbeFailed:
+probe_failed:
if (ret == -ENODEV || ret == -ENXIO) {
/* Driver matched, but didn't support device
* or device not found.
@@ -110,7 +99,53 @@ int driver_probe_device(struct device_dr
"%s: probe of %s failed with error %d\n",
drv->name, dev->bus_id, ret);
}
- Done:
+done:
+ kfree(data);
+ return ret;
+}
+
+/**
+ * driver_probe_device - attempt to bind device & driver together
+ * @drv: driver to bind a device to
+ * @dev: device to try to bind to the driver
+ *
+ * First, we call the bus's match function, if one present, which should
+ * compare the device IDs the driver supports with the device IDs of the
+ * device. Note we don't do this ourselves because we don't know the
+ * format of the ID structures, nor what is to be considered a match and
+ * what is not.
+ *
+ * This function returns 1 if a match is found, an error if one occurs
+ * (that is not -ENODEV or -ENXIO), and 0 otherwise.
+ *
+ * This function must be called with @dev->sem held. When called for a
+ * USB interface, @dev->parent->sem must be held as well.
+ */
+int driver_probe_device(struct device_driver * drv, struct device * dev)
+{
+ struct stupid_thread_structure *data;
+ struct task_struct *probe_task;
+ int ret = 0;
+
+ if (drv->bus->match && !drv->bus->match(dev, drv))
+ goto done;
+
+ pr_debug("%s: Matched Device %s with Driver %s\n",
+ drv->bus->name, dev->bus_id, drv->name);
+
+ data = kmalloc(sizeof(*data), GFP_KERNEL);
+ data->drv = drv;
+ data->dev = dev;
+
+ if (drv->multithread_probe) {
+ probe_task = kthread_run(really_probe, data,
+ "probe-%s", dev->bus_id);
+ if (IS_ERR(probe_task))
+ ret = PTR_ERR(probe_task);
+ } else
+ ret = really_probe(data);
+
+done:
return ret;
}
--- gregkh-2.6.orig/include/linux/device.h
+++ gregkh-2.6/include/linux/device.h
@@ -105,6 +105,8 @@ struct device_driver {
void (*shutdown) (struct device * dev);
int (*suspend) (struct device * dev, pm_message_t state);
int (*resume) (struct device * dev);
+
+ unsigned int multithread_probe:1;
};
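
For readers who want to experiment with the control flow outside the kernel, here is a minimal user-space sketch of the same dispatch: run the probe inline, or hand it to a thread when the driver's bit is set. It uses pthreads rather than kthread_run(), and every name in it (fake_driver, probe_args, probe_device, run_probe_demo) is hypothetical and not part of the patch. Unlike the hunk above, which uses the kmalloc() result without checking it and does not free the data when kthread_run() fails, the sketch handles both cases.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical user-space stand-ins for the kernel structures. */
struct fake_driver {
	int (*probe)(int dev_id);	/* returns 0 on success */
	unsigned int multithread_probe:1;
};

struct probe_args {
	struct fake_driver *drv;
	int dev_id;
	int *result;			/* where the probe status is stored */
};

/* Runs the probe, records its status and frees the argument block. */
static void *really_probe(void *void_data)
{
	struct probe_args *args = void_data;

	*args->result = args->drv->probe(args->dev_id);
	free(args);
	return NULL;
}

/*
 * Returns 0 if the probe ran inline or its thread was started, -1 if the
 * allocation or the thread creation failed. *started tells the caller
 * whether there is a thread to join.
 */
static int probe_device(struct fake_driver *drv, int dev_id, int *result,
			pthread_t *thread, int *started)
{
	struct probe_args *args = malloc(sizeof(*args));

	*started = 0;
	if (!args)
		return -1;
	args->drv = drv;
	args->dev_id = dev_id;
	args->result = result;

	if (drv->multithread_probe) {
		if (pthread_create(thread, NULL, really_probe, args) != 0) {
			free(args);	/* the thread never ran, so free here */
			return -1;
		}
		*started = 1;
	} else {
		really_probe(args);
	}
	return 0;
}

/* Toy probe: positive device ids succeed. */
static int demo_probe(int dev_id)
{
	return dev_id > 0 ? 0 : -1;
}

/* Probes device 1 inline (threaded == 0) or in a thread, returns its status. */
static int run_probe_demo(int threaded)
{
	struct fake_driver drv = { .probe = demo_probe };
	pthread_t thread;
	int started, result = -99;

	drv.multithread_probe = threaded ? 1 : 0;
	if (probe_device(&drv, 1, &result, &thread, &started) != 0)
		return -99;
	if (started)
		pthread_join(thread, NULL);
	return result;
}
```

Note that in the threaded case the caller only learns whether the thread started; the probe's own status arrives later, which is the same limitation the patch has.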
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
> - goto ProbeFailed;
> + goto probe_failed;
> }
> } else if (drv->probe) {
> ret = drv->probe(dev);
> if (ret) {
> dev->driver = NULL;
> - goto ProbeFailed;
> + goto probe_failed;
> }
> }
> device_bind_driver(dev);
> ret = 1;
> pr_debug("%s: Bound Device %s to Driver %s\n",
> drv->bus->name, dev->bus_id, drv->name);
> - goto Done;
> + goto done;
>
> - ProbeFailed:
> +probe_failed:
> if (ret == -ENODEV || ret == -ENXIO) {
> /* Driver matched, but didn't support device
> * or device not found.
> @@ -110,7 +99,53 @@ int driver_probe_device(struct device_dr
> "%s: probe of %s failed with error %d\n",
> drv->name, dev->bus_id, ret);
> }
> - Done:
> +done:
Removing these changes would make this patch smaller and keep it to one thing. ;-)
This is very interesting in the context of a few discussions I had at
OLS about klibc; there are people who would like initramfs to be
accessible *before* device probing is done, so that drivers can access
firmware and possibly hotplug from the initramfs during the driver
initialization. We could even invoke (k)init at this point; this would
require a) having a system call or device that would allow kinit to
block until device probing was complete, and b) a way to handle
/dev/console -- there are several different ways to deal with it; it's
mostly a matter of picking a good one.
Note that we don't need device drivers for userspace -- we only need VM,
VFS and scheduler initialization.
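
Point a) above, a way for kinit to block until device probing is complete, can be pictured with a pending-probe counter and a condition variable. This is only a user-space sketch with pthreads under that assumption; probe_begin(), probe_end(), wait_for_probes() and run_wait_demo() are hypothetical names, not an existing kernel or klibc interface.

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t probe_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t probe_idle = PTHREAD_COND_INITIALIZER;
static int probes_pending;	/* probes started but not yet finished */
static int probes_finished;	/* probes that ran to completion */

/* Called before a probe thread is started. */
static void probe_begin(void)
{
	pthread_mutex_lock(&probe_lock);
	probes_pending++;
	pthread_mutex_unlock(&probe_lock);
}

/* Called when a probe is over; wakes waiters once nothing is pending. */
static void probe_end(void)
{
	pthread_mutex_lock(&probe_lock);
	if (--probes_pending == 0)
		pthread_cond_broadcast(&probe_idle);
	pthread_mutex_unlock(&probe_lock);
}

/* What a "wait until probing is done" call would do for init. */
static void wait_for_probes(void)
{
	pthread_mutex_lock(&probe_lock);
	while (probes_pending > 0)
		pthread_cond_wait(&probe_idle, &probe_lock);
	pthread_mutex_unlock(&probe_lock);
}

/* Stand-in for a driver probe running in its own thread. */
static void *fake_probe(void *unused)
{
	(void)unused;
	pthread_mutex_lock(&probe_lock);
	probes_finished++;
	pthread_mutex_unlock(&probe_lock);
	probe_end();
	return NULL;
}

/* Starts up to 16 fake probes, waits for them, returns how many finished. */
static int run_wait_demo(int nr_probes)
{
	pthread_t threads[16];
	int i, started = 0;

	probes_finished = 0;
	if (nr_probes > 16)
		nr_probes = 16;
	for (i = 0; i < nr_probes; i++) {
		probe_begin();	/* count first so a waiter cannot miss it */
		if (pthread_create(&threads[started], NULL, fake_probe, NULL) != 0)
			probe_end();	/* thread never ran: undo the count */
		else
			started++;
	}
	wait_for_probes();	/* the point where init could safely proceed */
	for (i = 0; i < started; i++)
		pthread_join(threads[i], NULL);
	return probes_finished;
}
```

Counting the probe before its thread starts is the design choice that matters: a waiter that arrives early still sees a non-zero count and blocks.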
Multithreaded device initialization is a great idea, especially since
many devices require delays during initialization, sometimes on the
order of seconds.
-hpa
Heh, yeah, it would, but cleaning up coding style violations at the same
time that I move code around is usually safe :)
thanks,
greg k-h
Another option would be to keep probing serialized within a bus but
have it serviced by a separate thread. The thread can die after, let's
say, a 1-minute inactivity timeout and be respawned if needed.
--
Dmitry
What happens about the logging?
Surely one would want the output from one probe to be output into the
log as a block, and not mix the output from multiple simultaneous probes.
James
Also, I think providing a cmdline option to override the default
multithreaded probe behaviour would be good. Something like the above
would be useful while debugging boot issues.
Cheers ,
Anil S Keshavamurthy
Use single-line printks where possible, or mutex-protected multiline
blocks where you really can't do without multiple lines of printks that
really cannot be separated. (Don't perform time-consuming operations
while holding those mutexes; that would defeat the multithreaded probing...)
Adjusting printks is only the beginning of what needs to be done to adapt
single-threaded bus probes to multithreaded ones. There may be hidden
assumptions that rely on single-threaded execution.
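
The mutex-protected multiline block mentioned above might look like this in a user-space sketch (pthreads and printf standing in for kernel mutexes and printk; log_line(), probe_thread() and run_log_demo() are hypothetical names). Each thread does its slow work first and takes the lock only to emit its lines, so blocks from different probes cannot interleave.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

#define NR_THREADS 4
#define LINES_PER_BLOCK 3

static pthread_mutex_t log_lock = PTHREAD_MUTEX_INITIALIZER;
static int log_owner[NR_THREADS * LINES_PER_BLOCK];	/* writer of each line */
static int log_len;

/* Emits one line and records who wrote it; caller must hold log_lock. */
static void log_line(int id, const char *msg)
{
	printf("probe-%d: %s\n", id, msg);
	log_owner[log_len++] = id;
}

static void *probe_thread(void *arg)
{
	int id = *(int *)arg;

	/* Slow work (hardware delays etc.) belongs here, before the lock. */
	pthread_mutex_lock(&log_lock);
	log_line(id, "found device");
	log_line(id, "firmware revision read");
	log_line(id, "device ready");
	pthread_mutex_unlock(&log_lock);
	return NULL;
}

/* Returns 1 if every thread's three lines ended up adjacent in the log. */
static int run_log_demo(void)
{
	pthread_t threads[NR_THREADS];
	int ids[NR_THREADS], started = 0, i;

	log_len = 0;
	for (i = 0; i < NR_THREADS; i++) {
		ids[i] = i;
		if (pthread_create(&threads[started], NULL, probe_thread,
				   &ids[i]) != 0)
			break;
		started++;
	}
	for (i = 0; i < started; i++)
		pthread_join(threads[i], NULL);
	if (log_len != started * LINES_PER_BLOCK)
		return 0;
	for (i = 0; i < log_len; i++)
		if (log_owner[i] != log_owner[i - i % LINES_PER_BLOCK])
			return 0;
	return 1;
}
```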
--
Stefan Richter
-=====-=-==- -=== ==-=-
http://arcgraph.de/sr/
Just FYI:
1.) SCSI:
There is a patch circulating at linux-scsi which adds parallelized bus
scanning to the SCSI subsystem. I believe this cannot be built upon
parallelization by driver core. But I am not too familiar with the
subsystem facilities which this patch expands on. The patch is from
Matthew Wilcox, titled "Asynchronous target discovery".
http://marc.theaimsgroup.com/?t=115349750400001
2.) IEEE 1394:
There was brief preliminary discussion of parallelized probing for the
ieee1394 subsystem at linux1394-devel. Using driver core's
parallelization would achieve about 1/3rd of what would be desirable.
Background: After each bus reset, the 1394 core (nodemgr) has to
download part or all of the configuration ROM of attached devices to
determine their identity and capabilities. After that, either a protocol
driver probe (generic device hook), a protocol driver remove or suspend
routine (generic device hook), or a protocol driver update routine
(extra 1394 subsystem hook) is executed; depending on whether nodes were
added, removed, or in-use nodes were rediscovered. --- I.e. we better
have these subthreads provided by ieee1394/nodemgr itself.
--
Stefan Richter
-=====-=-==- -=== ==-=-
http://arcgraph.de/sr/
As this is going to be a bus-specific option, one would think that the
individual busses would provide such a switch themselves, if they want one.
thanks,
greg k-h
Nothing, it works just fine. A little messy perhaps, but it's all
there. I don't see the problem here...
> Surely one would want the output from one probe to be output into the
> log as a block, and not mix the output from multiple simultaneous probes.
Why not? Most subsystems already use dev_printk() for their logging
messages, so it's easy to figure out what is going on.
thanks,
greg k-h
Yes, you can do that right now if you wish, no need to mess with the
driver core. But for busses that don't want to do something like that
(like USB and PCI probably will not), this option is now available.
thanks,
greg k-h
Yeah, some drivers really don't like it. The ata_piix driver, for example,
had to be changed to keep it from thinking it was really being hotplugged
instead of going through the initial probe sequence. Odds are there are
lots of other driver-specific issues like this everywhere. That's why it's
a driver-specific flag: when the authors of the driver say it's ready, then
it will be enabled for that driver.
thanks,
greg k-h
I don't know enough about SCSI to say if this driver core patch will
help them out or not. At first glance it does, but the device order
gets all messed up from what users are traditionally used to, so perhaps
the scsi core will just have to stick with their own changes.
> 2.) IEEE 1394:
> There was brief preliminary discussion of parallelized probing for the
> ieee1394 subsystem at linux1394-devel. Using driver core's
> parallelization would achieve about 1/3rd of what would be desirable.
> Background: After each bus reset, the 1394 core (nodemgr) has to
> download part or all of the configuration ROM of attached devices to
> determine their identity and capabilities. After that, either a protocol
> driver probe (generic device hook), a protocol driver remove or suspend
> routine (generic device hook), or a protocol driver update routine
> (extra 1394 subsystem hook) is executed; depending on whether nodes were
> added, removed, or in-use nodes were rediscovered. --- I.e. we better
> have these subthreads provided by ieee1394/nodemgr itself.
That's fine, nothing in this patch precludes you from doing that at all.
I'm not thinking that all busses will enable this, rather that some
will, and some will rely on their bus code to do threaded stuff if it
can.
thanks,
greg k-h
Right. Networking is in the same boat ... unless they're using udev
or some other tool which renames network interfaces. I'm not entirely
comfortable with the kernel forcing you to use some other tool in order
to maintain stable device names on a static setup. Perhaps we need
either a CONFIG option or a boot option to decide whether to do parallel
pci probes.
I still think we need a method of renaming block devices, but haven't
looked into it in enough detail yet.
I agree.
However, almost all distros now use persistent names for network devices
due to the PCI Hotplug issue, so it probably isn't as bad as you might
think.
> Perhaps we need either a CONFIG option or a boot option to decide
> whether to do parallel pci probes.
Oh yeah, it will probably be both of them :)
> I still think we need a method of renaming block devices, but haven't
> looked into it in enough detail yet.
That could get "interesting"...
But now that we all are using /dev/disk/ and it has persistent device
names for block devices, I really don't think it's that big of a deal.
thanks,
greg k-h
Oh, for people using a distro, I'm sure it's no problem at all. It's
the homebrew people I'm worried about ;-)
> > I still think we need a method of renaming block devices, but haven't
> > looked into it in enough detail yet.
>
> That could get "interesting"...
>
> But now that we all are using /dev/disk/ and it has persistent device
> names for block devices, I really don't think it's that big of a deal.
Actually, that's exactly why it's a big deal. The kernel spits out
messages like:
printk(KERN_DEBUG "%s: Mode Sense: %02x %02x %02x %02x\n",
diskname, buffer[0], buffer[1], buffer[2], buffer[3]);
where diskname is something like sda. Now the user has to figure out
what sda means in terms of /dev/disk/ and in terms of scsi h:c:t:l and
in terms of which sticky label is on which drive. If we let userspace
change the gendev's disk_name, that printk can be meaningful to the user
in at least one of those senses.
No, this comes up all the time. Userspace has at least 3 different
mappings to /dev/sda in /dev/disk right now. Which one do you want the
kernel to use?
$ tree /dev/disk/ | grep sda1
| |-- scsi-SATA_Maxtor_7L250S0_L59FRPQH_L59FRPQH-part1 -> ../../sda1
| |-- boot -> ../../sda1
`-- 9c0ef40c-6de9-46f6-ac79-32296c667cf1 -> ../../sda1
Userspace should be doing the reverse mapping if it wants to; the kernel
should not care about this at all.
thanks,
greg k-h
Why use a bit field here? It ends up consuming sizeof(long) anyway
and causes more complex code, with no obvious benefit.
Arnd <><
Because we don't yet have a boolean type :)
Honestly, I don't really care, I can make it a char if people really
care (but due to padding, it would take up the same size as unsigned
long anyway...)
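
The size argument is easy to check with a small sketch. The two structs below are hypothetical stand-ins for the tail of struct device_driver, one ending in a 1-bit field and one ending in a char; on common ABIs both are padded out to the pointer's alignment, so they come out the same size. Layout is ABI-dependent, so treat the comparison as an observation rather than a guarantee.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins for the end of struct device_driver. */
struct with_bitfield {
	void *resume;
	unsigned int multithread_probe:1;
};

struct with_char {
	void *resume;
	char multithread_probe;
};

/* Prints both sizes and returns 1 if they are equal on this ABI. */
static int flag_sizes_match(void)
{
	printf("bitfield: %zu bytes, char: %zu bytes\n",
	       sizeof(struct with_bitfield), sizeof(struct with_char));
	return sizeof(struct with_bitfield) == sizeof(struct with_char);
}
```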
thanks,
greg k-h