[PATCH 22/71] ncr5380: Eliminate selecting state

Finn Thain

unread,

Nov 18, 2015, 4:07:01 AM11/18/15

to James E.J. Bottomley, Michael Schmitz, linux...@vger.kernel.org, linux...@vger.kernel.org, linux-...@vger.kernel.org

ncr5380-eliminate-selecting-state

Ondrej Zary

unread,

Nov 24, 2015, 2:08:55 PM11/24/15

to Finn Thain, James E.J. Bottomley, Michael Schmitz, linux...@vger.kernel.org, linux...@vger.kernel.org, linux-...@vger.kernel.org

On Wednesday 18 November 2015 09:35:17 Finn Thain wrote:
> Linux v2.1.105 changed the algorithm for polling for the BSY signal
> in NCR5380_select() and NCR5380_main().
>
> Presently, this code has a bug. Back then, NCR5380_set_timer(hostdata, 1)
> meant reschedule main() after sleeping for 10 ms. Repeated 25 times this
> provided the recommended 250 ms selection time-out delay. This got broken
> when HZ became configurable.
>
> We could fix this but there's no need to reschedule the main loop. This
> BSY polling presently happens when the NCR5380_main() work queue item
> calls NCR5380_select(), which in turn schedules NCR5380_main(), which
> calls NCR5380_select() again, and so on.
>
> This algorithm is a deviation from the simpler one in atari_NCR5380.c.
> The extra complexity and state is pointless. There's no reason to
> stop selection half-way and return to to the main loop when the main
> loop can do nothing useful until selection completes.
>
> So just poll for BSY. We can sleep while polling now that we have a
> suitable workqueue.

Bisecting slow module initialization pointed to this commit.

Before this commit (2 seconds):
[ 60.317374] scsi host2: Generic NCR5380/NCR53C400 SCSI, io_port 0x0, n_io_port 0, base 0xd8000, irq 0, can_queue 16, cmd_per_lun 2, sg_tablesize 128, this_id 7, flags { NCR53C400 }, USLEEP_POLL 3, USLEEP_SLEEP 50, options { AUTOPROBE_IRQ PSEUDO_DMA }
[ 60.780715] scsi 2:0:1:0: Direct-Access QUANTUM LP240S GM240S01X 4.6 PQ: 0 ANSI: 2 CCS
[ 62.606260] sd 2:0:1:0: Attached scsi generic sg1 type 0

After this commit (22 seconds):
[ 137.511711] scsi host2: Generic NCR5380/NCR53C400 SCSI, io_port 0x0, n_io_port 0, base 0xd8000, irq 0, can_queue 16, cmd_per_lun 2, sg_tablesize 128, this_id 7, flags { NCR53C400 }, USLEEP_POLL 3, USLEEP_SLEEP 50, options { AUTOPROBE_IRQ PSEUDO_DMA }
[ 145.028532] clocksource: timekeeping watchdog: Marking clocksource 'tsc' as unstable because the skew is too large:
[ 145.029767] clocksource: 'acpi_pm' wd_now: a49738 wd_last: f4da04 mask: ffffff
[ 145.029828] clocksource: 'tsc' cs_now: 2ea624698e cs_last: 2c710aa17f mask: ffffffffffffffff
[ 145.032733] clocksource: Switched to clocksource acpi_pm
[ 145.236951] scsi 2:0:1:0: Direct-Access QUANTUM LP240S GM240S01X 4.6 PQ: 0 ANSI: 2 CCS
[ 159.959308] sd 2:0:1:0: Attached scsi generic sg1 type 0

--
Ondrej Zary
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Finn Thain

unread,

Nov 24, 2015, 10:18:04 PM11/24/15

to Ondrej Zary, James E.J. Bottomley, Michael Schmitz, linux...@vger.kernel.org, linux...@vger.kernel.org, linux-...@vger.kernel.org

On Tue, 24 Nov 2015, Ondrej Zary wrote:

> On Wednesday 18 November 2015 09:35:17 Finn Thain wrote:
> > Linux v2.1.105 changed the algorithm for polling for the BSY signal
> > in NCR5380_select() and NCR5380_main().
> >
> > Presently, this code has a bug. Back then, NCR5380_set_timer(hostdata, 1)
> > meant reschedule main() after sleeping for 10 ms. Repeated 25 times this
> > provided the recommended 250 ms selection time-out delay. This got broken
> > when HZ became configurable.
> >
> > We could fix this but there's no need to reschedule the main loop. This
> > BSY polling presently happens when the NCR5380_main() work queue item
> > calls NCR5380_select(), which in turn schedules NCR5380_main(), which
> > calls NCR5380_select() again, and so on.
> >
> > This algorithm is a deviation from the simpler one in atari_NCR5380.c.
> > The extra complexity and state is pointless. There's no reason to
> > stop selection half-way and return to to the main loop when the main
> > loop can do nothing useful until selection completes.
> >
> > So just poll for BSY. We can sleep while polling now that we have a
> > suitable workqueue.
>
> Bisecting slow module initialization pointed to this commit.

That's disappointing. This patch removed some nasty code. Anyway, thanks
for taking the trouble to bisect.

>
> Before this commit (2 seconds):
> [ 60.317374] scsi host2: Generic NCR5380/NCR53C400 SCSI, io_port 0x0, n_io_port 0, base 0xd8000, irq 0, can_queue 16, cmd_per_lun 2, sg_tablesize 128, this_id 7, flags { NCR53C400 }, USLEEP_POLL 3, USLEEP_SLEEP 50, options { AUTOPROBE_IRQ PSEUDO_DMA }
> [ 60.780715] scsi 2:0:1:0: Direct-Access QUANTUM LP240S GM240S01X 4.6 PQ: 0 ANSI: 2 CCS
> [ 62.606260] sd 2:0:1:0: Attached scsi generic sg1 type 0
>
>
> After this commit (22 seconds):
> [ 137.511711] scsi host2: Generic NCR5380/NCR53C400 SCSI, io_port 0x0, n_io_port 0, base 0xd8000, irq 0, can_queue 16, cmd_per_lun 2, sg_tablesize 128, this_id 7, flags { NCR53C400 }, USLEEP_POLL 3, USLEEP_SLEEP 50, options { AUTOPROBE_IRQ PSEUDO_DMA }
> [ 145.028532] clocksource: timekeeping watchdog: Marking clocksource 'tsc' as unstable because the skew is too large:
> [ 145.029767] clocksource: 'acpi_pm' wd_now: a49738 wd_last: f4da04 mask: ffffff
> [ 145.029828] clocksource: 'tsc' cs_now: 2ea624698e cs_last: 2c710aa17f mask: ffffffffffffffff
> [ 145.032733] clocksource: Switched to clocksource acpi_pm

I figured that it was okay to sleep from an unbound CPU-intensive
workqueue but doing so seems to cause problems. (See also patch 66/71
"Fix soft lockups".)

Perhaps a kthread is needed instead of a workqueue? (This workqueue
already has it's own kthread, but top shows that it doesn't accrue CPU
time.)

> [ 145.236951] scsi 2:0:1:0: Direct-Access QUANTUM LP240S GM240S01X 4.6 PQ: 0 ANSI: 2 CCS
> [ 159.959308] sd 2:0:1:0: Attached scsi generic sg1 type 0
>
>

This problem doesn't show up on my hardware, and I'd like to know where
those 22 seconds are being spent. Would you please apply the entire series
and add,
#define NDEBUG (NDEBUG_ARBITRATION | NDEBUG_SELECTION | NDEBUG_MAIN)
to the top of g_NCR5380.c and send me the messages logged during modprobe?

--

Ondrej Zary

unread,

Nov 25, 2015, 4:59:54 PM11/25/15

to Finn Thain, James E.J. Bottomley, Michael Schmitz, linux...@vger.kernel.org, linux...@vger.kernel.org, linux-...@vger.kernel.org

[ 397.014581] scsi host2: Generic NCR5380/NCR53C400 SCSI, io_port 0x0, n_io_port 0, base 0xd8000, irq 0, can_queue 16, cmd_per_lun 2, sg_tablesize 128, this_id 7, flags { NO_DMA_FIXUP }, options { AUTOPROBE_IRQ PSEUDO_DMA }
[ 412.099695] STATUS_REG: 00
BASR: 08
ICR: 00
MODE: 00
[ 412.103625] STATUS_REG: 00
BASR: 08
ICR: 00
MODE: 00
[ 412.110503] scsi 2:0:1:0: Direct-Access QUANTUM LP240S GM240S01X 4.6 PQ: 0 ANSI: 2 CCS
[ 412.110892] STATUS_REG: 00
BASR: 08
ICR: 00
MODE: 00
[ 412.114154] STATUS_REG: 00
BASR: 08
ICR: 00
MODE: 00
[ 412.119733] STATUS_REG: 00
BASR: 08
ICR: 00
MODE: 00
[ 427.198108] STATUS_REG: 00
BASR: 08
ICR: 00
MODE: 00
[ 442.276586] STATUS_REG: 00
BASR: 08
ICR: 00
MODE: 00
[ 457.354592] STATUS_REG: 00
BASR: 08
ICR: 00
MODE: 00
[ 472.432999] STATUS_REG: 00
BASR: 08
ICR: 00
MODE: 00
[ 487.513027] sd 2:0:1:0: Attached scsi generic sg1 type 0

--
Ondrej Zary

Finn Thain

unread,

Nov 26, 2015, 5:24:06 AM11/26/15

to Ondrej Zary, James E.J. Bottomley, Michael Schmitz, linux...@vger.kernel.org, linux...@vger.kernel.org, linux-...@vger.kernel.org

On Wed, 25 Nov 2015, Ondrej Zary wrote:

> On Wednesday 25 November 2015 04:17:09 Finn Thain wrote:
> >
> > On Tue, 24 Nov 2015, Ondrej Zary wrote:
> >
> > >
> > > Bisecting slow module initialization pointed to this commit.
> > >

> > > Before this commit (2 seconds):
> > > [ 60.317374] scsi host2: Generic NCR5380/NCR53C400 SCSI, io_port 0x0, n_io_port 0, base 0xd8000, irq 0, can_queue 16, cmd_per_lun 2, sg_tablesize 128, this_id 7, flags { NCR53C400 }, USLEEP_POLL 3, USLEEP_SLEEP 50, options { AUTOPROBE_IRQ PSEUDO_DMA }
> > > [ 60.780715] scsi 2:0:1:0: Direct-Access QUANTUM LP240S GM240S01X 4.6 PQ: 0 ANSI: 2 CCS
> > > [ 62.606260] sd 2:0:1:0: Attached scsi generic sg1 type 0
> > >
> > >
> > > After this commit (22 seconds):
> > > [ 137.511711] scsi host2: Generic NCR5380/NCR53C400 SCSI, io_port 0x0, n_io_port 0, base 0xd8000, irq 0, can_queue 16, cmd_per_lun 2, sg_tablesize 128, this_id 7, flags { NCR53C400 }, USLEEP_POLL 3, USLEEP_SLEEP 50, options { AUTOPROBE_IRQ PSEUDO_DMA }
> > > [ 145.028532] clocksource: timekeeping watchdog: Marking clocksource 'tsc' as unstable because the skew is too large:
> > > [ 145.029767] clocksource: 'acpi_pm' wd_now: a49738 wd_last: f4da04 mask: ffffff
> > > [ 145.029828] clocksource: 'tsc' cs_now: 2ea624698e cs_last: 2c710aa17f mask: ffffffffffffffff
> > > [ 145.032733] clocksource: Switched to clocksource acpi_pm

I see that the delay increased from 22 seconds to 90 seconds. Perhaps that
is because the remaining patches were applied.

Anyway, it seems to confirm that this delay comes from scanning the six
unused SCSI bus IDs. Each one takes 15 seconds. Could be a five second
timeout and three retries. It's hard to tell because the dsprintk()
logging didn't show up. Can you try modprobe after,
# echo 1 > /sys/module/printk/parameters/ignore_loglevel
And maybe also,
# scsi_logging_level -sS7

It seems like the 53C400 chip doesn't arbitrate like a 53C80, but that
doesn't make sense given it contains a 53C80 logic block. Perhaps some
extra initialization is needed. Which card(s) are affected?

Aside from trying to figure out why the chip behaves oddly, we could also
try to figure out whether the driver and its main loop is causing problems
for other tasks i.e. timekeeping watchdog. Does this patch make any
difference?

Index: linux/drivers/scsi/atari_NCR5380.c
===================================================================
--- linux.orig/drivers/scsi/atari_NCR5380.c 2015-11-26 15:42:01.000000000 +1100
+++ linux/drivers/scsi/atari_NCR5380.c 2015-11-26 15:42:04.000000000 +1100
@@ -925,6 +925,7 @@ static void NCR5380_main(struct work_str
container_of(work, struct NCR5380_hostdata, main_task);
struct Scsi_Host *instance = hostdata->host;
struct scsi_cmnd *cmd;
+ unsigned long deadline = jiffies + msecs_to_jiffies(10);
int done;

/*
@@ -984,9 +985,12 @@ static void NCR5380_main(struct work_str
done = 0;
}
spin_unlock_irq(&hostdata->lock);
- if (!done)
- cond_resched();
- } while (!done);
+ if (done || time_is_before_jiffies(deadline))
+ break;
+ cond_resched();
+ } while (1);
+ if (!done)
+ queue_work(hostdata->work_q, &hostdata->main_task);
}

Index: linux/drivers/scsi/NCR5380.c
===================================================================
--- linux.orig/drivers/scsi/NCR5380.c 2015-11-26 15:42:01.000000000 +1100
+++ linux/drivers/scsi/NCR5380.c 2015-11-26 15:42:04.000000000 +1100
@@ -817,6 +817,7 @@ static void NCR5380_main(struct work_str
container_of(work, struct NCR5380_hostdata, main_task);
struct Scsi_Host *instance = hostdata->host;
struct scsi_cmnd *cmd;
+ unsigned long deadline = jiffies + msecs_to_jiffies(10);
int done;

do {
@@ -860,9 +861,12 @@ static void NCR5380_main(struct work_str
done = 0;
}
spin_unlock_irq(&hostdata->lock);
- if (!done)
- cond_resched();
- } while (!done);
+ if (done || time_is_before_jiffies(deadline))
+ break;
+ cond_resched();
+ } while (1);
+ if (!done)
+ queue_work(hostdata->work_q, &hostdata->main_task);
}

#ifndef DONT_USE_INTR

Ondrej Zary

unread,

Nov 26, 2015, 2:36:04 PM11/26/15

to Finn Thain, James E.J. Bottomley, Michael Schmitz, linux...@vger.kernel.org, linux...@vger.kernel.org, linux-...@vger.kernel.org

The remaining patches were not applied. The delay varies, probably depending
on the state of the external drive.

> Anyway, it seems to confirm that this delay comes from scanning the six
> unused SCSI bus IDs. Each one takes 15 seconds. Could be a five second
> timeout and three retries. It's hard to tell because the dsprintk()
> logging didn't show up. Can you try modprobe after,
> # echo 1 > /sys/module/printk/parameters/ignore_loglevel
> And maybe also,
> # scsi_logging_level -sS7

[ 156.014234] scsi host2: Generic NCR5380/NCR53C400 SCSI, io_port 0x0, n_io_port 0, base 0xd8000, irq 0, can_queue 16, cmd_per_lun 2, sg_tablesize 128, this_id 7, flags { NO_DMA_FIXUP }, options { AUTOPROBE_IRQ PSEUDO_DMA }
[ 156.020369] scsi host2: scsi_scan_host_selected: <4294967295:4294967295:18446744073709551615>
[ 156.020777] scsi 2:0:0:0: scsi scan: INQUIRY pass 1 length 36
[ 156.022046] scsi host2: main: dequeued de045600
[ 156.022109] STATUS_REG: 00
BASR: 00
ICR: 00
MODE: 00
[ 156.022451] scsi host2: starting arbitration, id = 7
[ 156.022740] scsi host2: won arbitration
[ 156.022806] scsi host2: selecting target 0
[ 171.097450] scsi host2: target did not respond within 250ms
[ 171.097508] scsi host2: main: select complete
[ 171.099554] scsi 2:0:0:0: scsi scan: INQUIRY failed with code 0x40000
[ 171.103200] scsi 2:0:1:0: scsi scan: INQUIRY pass 1 length 36
[ 171.105246] scsi host2: main: dequeued df6b9f00
[ 171.105350] STATUS_REG: 00

BASR: 08
ICR: 00
MODE: 00

[ 171.105664] scsi host2: starting arbitration, id = 7
[ 171.105752] scsi host2: won arbitration
[ 171.105822] scsi host2: selecting target 1
[ 171.105977] scsi host2: target 1 selected, going into MESSAGE OUT phase.
[ 171.106042] scsi host2: nexus established.
[ 171.106093] scsi host2: main: select complete
[ 171.106145] scsi host2: main: performing information transfer
[ 171.108531] scsi 2:0:1:0: scsi scan: INQUIRY successful with code 0x0
[ 171.108639] scsi 2:0:1:0: scsi scan: INQUIRY pass 2 length 130
[ 171.109401] scsi host2: main: dequeued df6b9f00
[ 171.109485] STATUS_REG: 00

BASR: 08
ICR: 00
MODE: 00

[ 171.109779] scsi host2: starting arbitration, id = 7
[ 171.109878] scsi host2: won arbitration
[ 171.109954] scsi host2: selecting target 1
[ 171.110112] scsi host2: target 1 selected, going into MESSAGE OUT phase.
[ 171.110181] scsi host2: nexus established.
[ 171.110236] scsi host2: main: select complete
[ 171.110286] scsi host2: main: performing information transfer
[ 171.114811] scsi 2:0:1:0: scsi scan: INQUIRY successful with code 0x0
[ 171.114928] scsi 2:0:1:0: Direct-Access QUANTUM LP240S GM240S01X 4.6 PQ: 0 ANSI: 2 CCS
[ 171.115011] scsi target2:0:1: scsi scan: Sequential scan
[ 171.115856] scsi 2:0:1:1: scsi scan: INQUIRY pass 1 length 36
[ 171.116008] scsi host2: main: dequeued df6b9800
[ 171.116039] STATUS_REG: 00

BASR: 08
ICR: 00
MODE: 00

[ 171.116736] scsi host2: starting arbitration, id = 7
[ 171.116843] scsi host2: won arbitration
[ 171.116898] scsi host2: selecting target 1
[ 171.117117] scsi host2: target 1 selected, going into MESSAGE OUT phase.
[ 171.117182] scsi host2: nexus established.
[ 171.117234] scsi host2: main: select complete
[ 171.117287] scsi host2: main: performing information transfer
[ 171.119674] scsi 2:0:1:1: scsi scan: INQUIRY successful with code 0x0
[ 171.119781] scsi 2:0:1:1: scsi scan: INQUIRY pass 2 length 130
[ 171.119898] scsi host2: main: dequeued df6b9800
[ 171.119962] STATUS_REG: 00

BASR: 08
ICR: 00
MODE: 00

[ 171.121426] scsi host2: starting arbitration, id = 7
[ 171.121537] scsi host2: won arbitration
[ 171.121592] scsi host2: selecting target 1
[ 171.121766] scsi host2: target 1 selected, going into MESSAGE OUT phase.
[ 171.121834] scsi host2: nexus established.
[ 171.121889] scsi host2: main: select complete
[ 171.121940] scsi host2: main: performing information transfer
[ 171.126361] scsi 2:0:1:1: scsi scan: INQUIRY successful with code 0x0
[ 171.126470] scsi 2:0:1:1: scsi scan: peripheral qualifier of 3, device not added
[ 171.127735] scsi 2:0:2:0: scsi scan: INQUIRY pass 1 length 36
[ 171.127920] scsi host2: main: dequeued df6b9800
[ 171.127982] STATUS_REG: 00

BASR: 08
ICR: 00
MODE: 00

[ 171.128648] scsi host2: starting arbitration, id = 7
[ 171.128754] scsi host2: won arbitration
[ 171.128811] scsi host2: selecting target 2
[ 186.203734] scsi host2: target did not respond within 250ms
[ 186.203794] scsi host2: main: select complete
[ 186.205355] scsi 2:0:2:0: scsi scan: INQUIRY failed with code 0x40000
[ 186.208989] scsi 2:0:3:0: scsi scan: INQUIRY pass 1 length 36
[ 186.209178] scsi host2: main: dequeued df6b9f00
[ 186.209242] STATUS_REG: 00

BASR: 08
ICR: 00
MODE: 00

[ 186.209570] scsi host2: starting arbitration, id = 7
[ 186.209660] scsi host2: won arbitration
[ 186.209731] scsi host2: selecting target 3
[ 201.284666] scsi host2: target did not respond within 250ms
[ 201.284726] scsi host2: main: select complete
[ 201.286193] scsi 2:0:3:0: scsi scan: INQUIRY failed with code 0x40000
[ 201.287455] scsi 2:0:4:0: scsi scan: INQUIRY pass 1 length 36
[ 201.287609] scsi host2: main: dequeued df6b9900
[ 201.287673] STATUS_REG: 00

BASR: 08
ICR: 00
MODE: 00

[ 201.287973] scsi host2: starting arbitration, id = 7
[ 201.288766] scsi host2: won arbitration
[ 201.288855] scsi host2: selecting target 4
[ 216.363793] scsi host2: target did not respond within 250ms
[ 216.363853] scsi host2: main: select complete
[ 216.363947] scsi 2:0:4:0: scsi scan: INQUIRY failed with code 0x40000
[ 216.366469] scsi 2:0:5:0: scsi scan: INQUIRY pass 1 length 36
[ 216.366633] scsi host2: main: dequeued df6b9f00
[ 216.366696] STATUS_REG: 00

BASR: 08
ICR: 00
MODE: 00

[ 216.366997] scsi host2: starting arbitration, id = 7
[ 216.367153] scsi host2: won arbitration
[ 216.367208] scsi host2: selecting target 5
[ 231.442079] scsi host2: target did not respond within 250ms
[ 231.442136] scsi host2: main: select complete
[ 231.442214] scsi 2:0:5:0: scsi scan: INQUIRY failed with code 0x40000
[ 231.445050] scsi 2:0:6:0: scsi scan: INQUIRY pass 1 length 36
[ 231.445329] scsi host2: main: dequeued df6b9c00
[ 231.445392] STATUS_REG: 00

BASR: 08
ICR: 00
MODE: 00

[ 231.445682] scsi host2: starting arbitration, id = 7
[ 231.445784] scsi host2: won arbitration
[ 231.445855] scsi host2: selecting target 6
[ 246.521110] scsi host2: target did not respond within 250ms
[ 246.521171] scsi host2: main: select complete
[ 246.521253] scsi 2:0:6:0: scsi scan: INQUIRY failed with code 0x40000
[ 246.525098] sd 2:0:1:0: Attached scsi generic sg1 type 0

> It seems like the 53C400 chip doesn't arbitrate like a 53C80, but that
> doesn't make sense given it contains a 53C80 logic block. Perhaps some
> extra initialization is needed. Which card(s) are affected?

All of them: HP C2502 (53C400A), Canon FG2-5202 (53C400) and DTCT-436P.

> Aside from trying to figure out why the chip behaves oddly, we could also
> try to figure out whether the driver and its main loop is causing problems
> for other tasks i.e. timekeeping watchdog. Does this patch make any
> difference?

The timekeeping warning does not appear when all 71 patches are applied.
Reverse-bisected it - the warning disappears after:
[PATCH 50/71] ncr5380: Change instance->host_lock to hostdata->lock

--
Ondrej Zary

Finn Thain

unread,

Nov 26, 2015, 5:33:40 PM11/26/15

to Ondrej Zary, James E.J. Bottomley, Michael Schmitz, linux...@vger.kernel.org, linux...@vger.kernel.org, linux-...@vger.kernel.org

On Thu, 26 Nov 2015, Ondrej Zary wrote:

>
> [ 156.014234] scsi host2: Generic NCR5380/NCR53C400 SCSI, io_port 0x0, n_io_port 0, base 0xd8000, irq 0, can_queue 16, cmd_per_lun 2, sg_tablesize 128, this_id 7, flags { NO_DMA_FIXUP }, options { AUTOPROBE_IRQ PSEUDO_DMA }
> [ 156.020369] scsi host2: scsi_scan_host_selected: <4294967295:4294967295:18446744073709551615>
> [ 156.020777] scsi 2:0:0:0: scsi scan: INQUIRY pass 1 length 36
> [ 156.022046] scsi host2: main: dequeued de045600
> [ 156.022109] STATUS_REG: 00
> BASR: 00
> ICR: 00
> MODE: 00
> [ 156.022451] scsi host2: starting arbitration, id = 7
> [ 156.022740] scsi host2: won arbitration
> [ 156.022806] scsi host2: selecting target 0
> [ 171.097450] scsi host2: target did not respond within 250ms
> [ 171.097508] scsi host2: main: select complete
> [ 171.099554] scsi 2:0:0:0: scsi scan: INQUIRY failed with code 0x40000

This is very helpful, thanks.

The problem has to be this:

err = NCR5380_poll_politely(instance, STATUS_REG, SR_BSY, SR_BSY,
msecs_to_jiffies(250));

which would suggest a problem with the busy wait loop. It seems that 5380
register accesses are much more expensive on PC than the platforms I've
tested.

> > [...] Which card(s) are affected?

>
> All of them: HP C2502 (53C400A), Canon FG2-5202 (53C400) and DTCT-436P.
>

OK.

> > Aside from trying to figure out why the chip behaves oddly, we could also
> > try to figure out whether the driver and its main loop is causing problems
> > for other tasks i.e. timekeeping watchdog. Does this patch make any
> > difference?
>
> The timekeeping warning does not appear when all 71 patches are applied.
> Reverse-bisected it - the warning disappears after:
> [PATCH 50/71] ncr5380: Change instance->host_lock to hostdata->lock
>

Makes sense. I think that this should solve the problem:

Index: linux/drivers/scsi/NCR5380.c
===================================================================
--- linux.orig/drivers/scsi/NCR5380.c 2015-11-27 09:21:40.000000000 +1100
+++ linux/drivers/scsi/NCR5380.c 2015-11-27 09:25:36.000000000 +1100
@@ -230,7 +230,7 @@ static int NCR5380_poll_politely2(struct
unsigned long n;

/* Busy-wait for up to 1 jiffy */
- n = loops_per_jiffy;
+ n = 1 + loops_per_jiffy / 10;
do {
if ((NCR5380_read(reg1) & bit1) == val1)
return 0;

--

Ondrej Zary

unread,

Nov 28, 2015, 6:13:25 PM11/28/15

to Finn Thain, James E.J. Bottomley, Michael Schmitz, linux...@vger.kernel.org, linux...@vger.kernel.org, linux-...@vger.kernel.org

Yes, it does!

[ 48.119367] scsi host2: Generic NCR5380/NCR53C400 SCSI, io_port 0x0, n_io_port 0, base 0xd8000, irq 0, can_queue 16, cmd_per_lun 2, sg_tablesize 128, this_id 7, flags { NO_DMA_FIXUP }, options { AUTOPROBE_IRQ PSEUDO_DMA }
[ 49.715388] scsi 2:0:1:0: Direct-Access QUANTUM LP240S GM240S01X 4.6 PQ: 0 ANSI: 2 CCS
[ 57.653737] sd 2:0:1:0: Attached scsi generic sg1 type 0
[ 57.672524] sd 2:0:1:0: [sdb] 479350 512-byte logical blocks: (245 MB/234 MiB)
[ 57.675632] sd 2:0:1:0: [sdb] Write Protect is off
[ 57.679534] sd 2:0:1:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 57.724617] sdb: sdb1
[ 57.741620] sd 2:0:1:0: [sdb] Attached SCSI disk

--
Ondrej Zary

Finn Thain

unread,

Nov 28, 2015, 7:47:08 PM11/28/15

to Ondrej Zary, James E.J. Bottomley, Michael Schmitz, linux...@vger.kernel.org, linux...@vger.kernel.org, linux-...@vger.kernel.org

On Sun, 29 Nov 2015, Ondrej Zary wrote:

> > [...] I think that this should solve the problem:

>
> Yes, it does!
>
> [ 48.119367] scsi host2: Generic NCR5380/NCR53C400 SCSI, io_port 0x0, n_io_port 0, base 0xd8000, irq 0, can_queue 16, cmd_per_lun 2, sg_tablesize 128, this_id 7, flags { NO_DMA_FIXUP }, options { AUTOPROBE_IRQ PSEUDO_DMA }
> [ 49.715388] scsi 2:0:1:0: Direct-Access QUANTUM LP240S GM240S01X 4.6 PQ: 0 ANSI: 2 CCS

That still takes about 1.6 seconds to scan a vacant bus ID. It should be
more like 0.25 seconds. Did you have the entire patch series applied?

The code presently in mainline spins for 500 iterations in the relevant
busy-wait loop, with the comment, "8uS a cycle for the cpu access". That
doesn't help me much. Does that refer to an ISA bus cycle? Or one
iteration of the loop? Electrical conductance?

If we knew how many loops_per_jiffy your machine gets, and your CONFIG_HZ
value, we could figure out how long a chip register access takes (which
might be 8 us).

I'll try to figure out a similar timing for my Domex PCI card to see if
there is some sort of compromise to be found.

--

Ondrej Zary

unread,

Nov 29, 2015, 4:10:31 AM11/29/15

to Finn Thain, James E.J. Bottomley, Michael Schmitz, linux...@vger.kernel.org, linux...@vger.kernel.org, linux-...@vger.kernel.org

On Sunday 29 November 2015 01:46:03 Finn Thain wrote:
>
> On Sun, 29 Nov 2015, Ondrej Zary wrote:
>
> > > [...] I think that this should solve the problem:
> >
> > Yes, it does!
> >
> > [ 48.119367] scsi host2: Generic NCR5380/NCR53C400 SCSI, io_port 0x0, n_io_port 0, base 0xd8000, irq 0, can_queue 16, cmd_per_lun 2, sg_tablesize 128, this_id 7, flags { NO_DMA_FIXUP }, options { AUTOPROBE_IRQ PSEUDO_DMA }
> > [ 49.715388] scsi 2:0:1:0: Direct-Access QUANTUM LP240S GM240S01X 4.6 PQ: 0 ANSI: 2 CCS
>
> That still takes about 1.6 seconds to scan a vacant bus ID. It should be
> more like 0.25 seconds. Did you have the entire patch series applied?
>
> The code presently in mainline spins for 500 iterations in the relevant
> busy-wait loop, with the comment, "8uS a cycle for the cpu access". That
> doesn't help me much. Does that refer to an ISA bus cycle? Or one
> iteration of the loop? Electrical conductance?
>
> If we knew how many loops_per_jiffy your machine gets, and your CONFIG_HZ
> value, we could figure out how long a chip register access takes (which
> might be 8 us).

loops_per_jiffy=4797252
CONFIG_HZ=250

# cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 11
model name : Intel(R) Celeron(TM) CPU 1200MHz
stepping : 1
microcode : 0x1c
cpu MHz : 1199.313
cache size : 256 KB
physical id : 0
siblings : 1
core id : 0
cpu cores : 1
apicid : 0
initial apicid : 0
fdiv_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov pse36 mmx fxsr sse
bugs :
bogomips : 2398.62
clflush size : 32
cache_alignment : 32
address sizes : 36 bits physical, 32 bits virtual
power management:

# cat iotest.c
#include <stdio.h>
#include <sys/io.h>

int main(void) {
if (iopl(3)) {
perror("iopl");
return 1;
}
for (int i = 0; i < 10000000; i++)
inb(0x280);
return 0;
}

# cc --std=c99 iotest.c -o iotest
# time ./iotest

real 0m16.938s
user 0m16.932s
sys 0m0.000s

outb() instead of inb():
# time ./iotest

real 0m17.210s
user 0m17.204s
sys 0m0.000s

> I'll try to figure out a similar timing for my Domex PCI card to see if
> there is some sort of compromise to be found.

--

Ondrej Zary

Geert Uytterhoeven

unread,

Nov 29, 2015, 4:28:28 AM11/29/15

to Finn Thain, Ondrej Zary, James E.J. Bottomley, Michael Schmitz, Linux/m68k, scsi, linux-...@vger.kernel.org

Hi Finn,

On Thu, Nov 26, 2015 at 11:32 PM, Finn Thain <fth...@telegraphics.com.au> wrote:
>> The timekeeping warning does not appear when all 71 patches are applied.
>> Reverse-bisected it - the warning disappears after:
>> [PATCH 50/71] ncr5380: Change instance->host_lock to hostdata->lock
>>
>
> Makes sense. I think that this should solve the problem:
>
> Index: linux/drivers/scsi/NCR5380.c
> ===================================================================
> --- linux.orig/drivers/scsi/NCR5380.c 2015-11-27 09:21:40.000000000 +1100
> +++ linux/drivers/scsi/NCR5380.c 2015-11-27 09:25:36.000000000 +1100
> @@ -230,7 +230,7 @@ static int NCR5380_poll_politely2(struct
> unsigned long n;
>
> /* Busy-wait for up to 1 jiffy */
> - n = loops_per_jiffy;
> + n = 1 + loops_per_jiffy / 10;
> do {
> if ((NCR5380_read(reg1) & bit1) == val1)
> return 0;

This still heavily depends on the processing time spent in NCR5380_read().
You should never use a value derived from loops_per_jiffy for a non-empty loop,
as it may take much longer. Always compare with an maximum end time instead.

E.g.:

end = jiffies + 2; /* 1 jiffie + 1 safeguard */

do {
if ((NCR5380_read(reg1) & bit1) == val1)
return 0;

cpu_relax();
} while (time_before(jiffies, end);

And a similar loop for "Busy-wait for up to 20 ms".

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

Finn Thain

unread,

Nov 29, 2015, 5:26:55 AM11/29/15

to Geert Uytterhoeven, Ondrej Zary, James E.J. Bottomley, Michael Schmitz, Linux/m68k, scsi, linux-...@vger.kernel.org

On Sun, 29 Nov 2015, Geert Uytterhoeven wrote:

> This still heavily depends on the processing time spent in
> NCR5380_read(). You should never use a value derived from
> loops_per_jiffy for a non-empty loop,

Sure but the time-out condition isn't supposed to be precise.
Plus/minus a jiffy is no problem. Plus/minus a second is no good.

> as it may take much longer. Always compare with an maximum end time
> instead.

That can't work with interrupts disabled, which was the problem I was
trying to solve by use of loops_per_jiffy.

NCR5380_poll_politely() in mainline doesn't work with interrupts disabled
either, hence patch 21.

>
> E.g.:
>
> end = jiffies + 2; /* 1 jiffie + 1 safeguard */
> do {
> if ((NCR5380_read(reg1) & bit1) == val1)
> return 0;
> cpu_relax();
> } while (time_before(jiffies, end);
>
> And a similar loop for "Busy-wait for up to 20 ms".

Interrupts may be disabled during that loop also. Please refer to (and/or
respond to) patch 21, "ncr5380: Sleep when polling, if possible", in which
these changes were made.

--

>
> Gr{oetje,eeting}s,
>
> Geert
>
--

Geert Uytterhoeven

unread,

Nov 29, 2015, 8:51:11 AM11/29/15

to Finn Thain, Ondrej Zary, James E.J. Bottomley, Michael Schmitz, Linux/m68k, scsi, linux-...@vger.kernel.org

Hi Finn,

On Sun, Nov 29, 2015 at 11:25 AM, Finn Thain <fth...@telegraphics.com.au> wrote:
> On Sun, 29 Nov 2015, Geert Uytterhoeven wrote:
>> This still heavily depends on the processing time spent in
>> NCR5380_read(). You should never use a value derived from
>> loops_per_jiffy for a non-empty loop,
>
> Sure but the time-out condition isn't supposed to be precise.
> Plus/minus a jiffy is no problem. Plus/minus a second is no good.

If an ISA access takes 8 µs, while the CPU runs at 1 GHz, i.e. 500M loops/s,
the difference will be huge.

>> as it may take much longer. Always compare with an maximum end time
>> instead.
>
> That can't work with interrupts disabled, which was the problem I was
> trying to solve by use of loops_per_jiffy.

Then you indeed can't use jiffies.
Perhaps you can calibrate an NCR5380_read() loop at driver init time,
and use the calibration value later?

> NCR5380_poll_politely() in mainline doesn't work with interrupts disabled
> either, hence patch 21.
>
>> E.g.:
>>
>> end = jiffies + 2; /* 1 jiffie + 1 safeguard */
>> do {
>> if ((NCR5380_read(reg1) & bit1) == val1)
>> return 0;
>> cpu_relax();
>> } while (time_before(jiffies, end);
>>
>> And a similar loop for "Busy-wait for up to 20 ms".
>
> Interrupts may be disabled during that loop also. Please refer to (and/or
> respond to) patch 21, "ncr5380: Sleep when polling, if possible", in which
> these changes were made.

So the above loop may never terminate. Oops...

Gr{oetje,eeting}s,

Geert

--

Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

Finn Thain

unread,

Nov 29, 2015, 9:13:01 PM11/29/15

to Geert Uytterhoeven, Ondrej Zary, James E.J. Bottomley, Michael Schmitz, linux-m68k, linux-scsi, linux-kernel

On Sun, 29 Nov 2015, Geert Uytterhoeven wrote:

> If an ISA access takes 8 us, while the CPU runs at 1 GHz, i.e. 500M

> loops/s, the difference will be huge.

Ondrej showed that an ISA access can take about 1.6 us. I don't know what
to make of the "8 uS" comment in the mainline driver. Maybe it was an even
slower ISA card.

Anyway, I made a measurement on my hardware and confirmed that lpj is a
very bad proxy for device register access throughput. The "loops per
access" gap is several orders of magnitude:

lpj HZ access time (us) lpa
------------------------------------------------------------------------
Ondrej's 5380 ISA card (PC): 4797252 250 1.6 1.9k
My DMX3191D PCI card (PowerMac): 167079 100 0.42 7.0

>
> Perhaps you can calibrate an NCR5380_read() loop at driver init time,
> and use the calibration value later?

I had the same idea but I didn't think that the complexity was justified
by the low precision requirement. But now that I have some timings I have
to agree.

--